Robert Scoble's Rules for Successfully Scaling Startups

Robert Scoble in an often poignant FriendFeed thread commiserating PodTech's unfortunate end, shared what he learned about creating a successful startup. Here's a summary of a Robert's rules and why Machiavelli just may agree with them:

  • Have a story.
  • Have everyone on board with that story.
  • If anyone goes off of that story, make sure they get on board immediately or fire them.
  • Make sure people are judged by the revenues they bring in. Those that bring in revenues should get to run the place. People who don't bring in revenues should get fewer and fewer responsibilities, not more and more.
  • Work ONLY for a leader who will make the tough decisions.
  • Build a place where excellence is expected, allowed, and is enabled.
  • Fire idiots quickly.
  • If your engineering team can't give a media team good measurements, the entire company is in trouble. Only things that are measured ever get improved.
  • When your stars aren't listened to the company is in trouble.
  • Getting rid of the CEO, even if it's all his fault, won't help unless you replace him/her with someone who is visionary and who can fix the other problems. An excellent list that meshes with much of my experience, which is why I thought it worth sharing :-) My take-away from Robert's rules can be summarized in one word: focus. Focus is the often over looked glue binding groups together so they can achieve great things... When Robert says have "a story" that to me is because a story provides a sort of "decision box" giving a group its organizing principle for determining everything that comes later. Have a decision? Look at your story for guidance. Have a problem? Look at your story for guidance. Following your stories' guidance is another matter completely. Without a management strong enough act in accordance with the story, centripetal forces tear an organization apart. It takes a lot of will to keep all the forces contained inside the box. Which is why I think Robert demands a "focused" leadership. Machiavelli calls this idea "virtue." Princes must exhibit virtue if they are to keep their land. Machiavelli doesn't mean virtue in the modern sense of be good and eat your peas, but in the ancient sense of manliness (sorry ladies, this was long ago). Virtue shares the same root as virility. So to act virtuously is to be bold, to act, to take risks, be aggressive, and make the hard unpopular decisions. For Machiavelli that's the only way to reach your goals in accordance with how the world really works, not how it ought to work. Any Prince who acts otherwise will lose their realm. Firing people (yes, I've been fired) not contributing to your story is by Machiavelli's definition a virtuous act. It is a messy ugly business nobody likes doing. It requires admitting a mistake was made, a lot of paperwork, and looking like the bad guy. And that's why people are often not fired as Robert suggests. The easy way is to just ignore the problem, but that's not being virtuous. Addition by subtraction is such a powerful force precisely because it maintains group focus on excellence and purpose. It's a statement that the story really matters. Keeping people who aren't helping is a vampire on a group's energy. It slowly drains away all vivacity until only a pale corpse remains. Robert's rules may seem excessively ruthless and cruel to many. Decidedly unmodern. But in true Machiavellian fashion let's ask what is preferable: a strong secure long-lived state ruled by virtue or a state ruled according to how the world ought to work that is constantly at the mercy of every invader? If you are still hungry for more starter advice, Gordon Ramsay has some unintentionally delicious thoughts on developing software as well. Serve yourself at: Gordon Ramsay On Software/04.html#a225"> and . Gordon Ramsay's Lessons for Software Take Two. Kevin Burton share's seven deadly sins startups should avoid and makes an inspiring case how his company stronger and better able to compete by not taking VC funds. Really interesting. Diary of a Failed Startup says solve a problem, not a platform to solve a class of problems. Truer words were never spoken.

    Click to read more ...

  • Wednesday

    The Mother of All Database Normalization Debates on Coding Horror

    Jeff Atwood started a barn burner of a conversation in Maybe Normalizing Isn't Normal on how to create a fast scalable tagging system. Jeff eventually asks that terrible question: which is better -- a normalized database, or a denormalized database? And all hell breaks loose. I know, it's hard to imagine database debates becoming contentious, but it does happen :-) It's lucky developers don't have temporal power or rivers of blood would flow. Here are a few of the pithier points (summarized):

  • Normalization is not magical fairy dust you sprinkle over your database to cure all ills; it often creates as many problems as it solves. (Jeff)
  • Normalize until it hurts, denormalize until it works. (Jeff)
  • Use materialized views which are tables created and maintained by your RDBMS. So a materialized view will act exactly like a de-normalized table would - except you keep you original normalized structure and any change to original data will propagate to the view automatically. (Goran)
  • According to Codd and Date table names should be singular, but what did they know. (Pablo, LOL)
  • Denormalization is something that should only be attempted as an optimization when EVERYTHING else has failed. Denormalization brings with it it's own set of problems. You have to deal with the increased set of writes to the system (which increases your I/O costs), you have to make changes in multiple places when data changes (which means either taking giant locks - ugh or accepting that there might be temporary or permanent data integrity issues) and so on. (Dare Obasanjo)
  • What happens, is that people see "Normalisation = Slow", that makes them assume that normalisation isn't needed. "My data retrieval needs to be fast, therefore I am not going to normalise!" (Tubs)
  • You can read fast and store slow or you can store fast and read slow. The biggest performance killer is so called physical read. Finding and accessing data on disk is the slowest operation. Unless child table is clustered indexed and you're using the cluster index in the join you will be making lots of small random access reads on the disk to find and access the child table data. This will be slow. (Goran)
  • The biggest scalability problems I face are with human processes, not computer processes. (John)
  • Don't forget that the fastest database query is the one that doesn't happen, i.e. caching is your friend. (Chris)
  • Normalization is about design, denormalization is about optimization. (Peter Becker)
  • You're just another knucklehead. (BuggyFunBunny)
  • Lets unroll our loops next. RDBMS is about shared *transactional data*. If you really don't care about keeping the data right all the time, then how you store it doesn't matter.(Christog)
  • Jeff, are you awake? (wiggle)
  • Denormalization may be all well and good, when you need the performance and your system is STABLE enough to support it. Doing this in a business environment is a recipe for disaster, ask anyone who has spent weeks digging through thousands of lines of legacy code, making sure to support the new and absolutely required affiliation_4. Then do the whole thing over again 3 months later when some crazy customer has five affiliations. (Sean)
  • Do you sex a cat, or do you gender it? (Simon)
  • This is why this article is wrong, Jeff. This is why you're an idiot, in case the first statement wasn't clear enough. You just gave an excuse to be lazy to someone who doesn't have a clue. (Marcel)
  • This is precisely why you never optimize until *after* you profile (find objectively where the bottlenecks are). (TED)
  • Another great way to speed things up is to do more processing outside of the database. Instead of doing 6 joins in the database, have a good caching plan and do some simple joining in your application. (superjason)
  • Lastly - No one seems to have mentioned that a decently normalized db greatly reduces application refactoring time. As new requirements come along, you don't have have to keep pulling stuff apart and putting back in new configurations of the db, (Ed)
  • Keep a de-normalized replica for expensive operations (e.g. reports), Cache Results for repeat queries (Memcache), Partition the database for scalability (vertical or horizontal) (Gareth)
  • Speaking from long experience, if you don't normalize, you will have duplicates. If you don't have data constraints, you will have invalid data. If you don't have database relational integrity, you will have orphan "child" records, etc. Everybody says "we rely on the application to maintain that", and it never, never does. (A. Lloyd Flanagan)
  • I don't think you can make any blanket statements on normal vs. non-normal form. Like everything else in programming, it all depends on requirements and intended goals. (Wayne)
  • De-normalization is for reporting, not for OLTP. (Eric)
  • Your six-way join is only needed because you used surrogate instead of natural keys. When you use natural keys, you will discover you need much fewer joins because often data you need is already present as foreign keys. (Leandro)
  • What I think is funny is the number of people who think that because they use LINQ or Hibernate they aren't affected by these issues. (Sam)
  • You miss the point of normalization entirely. Normalization is about optimizing large numbers of small CrUD operations, and retrieving small sets of data in order to support those crud operations. Think about people modifying their profiles, recording cash registers operations, recording bank deposits and withdrawals. Denormalization is about optimizing retrieval of large sets of data. Choosing an efficient database design is about understanding which of those operations is more important. (RevMike)
  • Multiple queries will hurt performance much less than the multi-join monstrosity above that will return indistinct and useless data. (Chris)
  • Cache the generated view pages first. Then cache the data. You have to think about your content- very infrequently will anyone be updating it, it's all inserts. So you don't have to worry about normalization too much. (Matt)
  • I wonder if one factor at play here is that it's very easy to write queries for de-normalized data, but it's relatively hard to write a query that scales well to a large data set. (Thomi)
  • Denormalization is the last possible step to take, yet the first to be suggested by fools. (Jeremy)
  • There is a simple alternative to denormalisation here -- to ensure that the values for a particular user_id are physically clustered. (David)
  • The whole issue is pretty simple 99% of the time - normalized databases are write optimized by nature. Since writing is slower then reading and most database are for CRUD then normalizing makes sense. Unless you are doing *a lot* more reading then writing. Then if all else fails (indexes, etc.) then create de-normalized tables (in addition to the normalized ones). (Rob)
  • Don't fear normalization. Embrace it. (Charles)
  • I read on and discovered all the loons and morons who think they know a lot more then they do about databases. (Paul)
  • Put all the indexable stuff into the users table, including zipcode, email_address, even mobile_phone -- hey, this is the future!; Put the rest of the info into a TEXT variable, like "extra_info", in JSON format. This could be educational history, or anything else that you would never search by; If you have specific applications (think facebook), create separate tables for them and join them onto the user table whenever you need. (Greg)
  • How is the data being used? Rapid inserts like Twitter? New user registration? Heavy reporting? How one stores data vs. how one uses data vs. how one collects data vs. how timely must new data be visible to the world vs. should be put OLTP data into a OLAP cube each night? etc. are all factors that matter. (Steve)
  • It might be possible to overdo it, but trust me, I have had 20 times the problems with denormalized data than with normalized. (PRMAN)
  • Your LOGICAL model should *always* be fully normalized. After all, it is the engine that you derive everything else from. Your PHYSICAL model may be denormalized or use system specific tools (materialized views, cache tables, etc) to improve performance, but such things should be done *after* the application level solutions are exhausted (page caching, data caches, etc.) (Jonn)
  • For very large scale applications I have found that application partitioning can go a long way to giving levels of scalability that monolithic systems fail to provide: each partition is specialized for the function it provides and can be optimized heavily, and when you need the parts to co-operate you bind the partitions together in higher level code. (John)
  • People don't care what you put into a database. They care what you can get out of it. They want freedom and flexibility to grow their business beyond "3 affiliations". (PRMan)
  • People don't care what you put into a database. They care what you can get out of it. They want freedom and flexibility to grow their business beyond "3 affiliations". (Steve)
  • I normalise, then have distinct (conceptually transient) denormalised cache tables which are hammered by the front-end. I let my model deal with the nitty-gritty of the fix-ups where appropriate (handy beginUpdate/endUpdate-type methods mean the denormalised views don't get rebuilt more than necessary). (Mo)
  • Stop playing with mySQL. (Jonathan)
  • What a horrible, cowboy attitude to DB design. I hope you don't design any real databases. (Dave)
  • IOW, scalability is not a problem, until it is. Strip away the scatalogical reference, and all you have is a boring truism. (Yawn)
  • Is my Site OLTP? If the answer is yes then Normalize. Is my site OLAP? If the answer is yes then De-Normalize! (WeAreJimbo)
  • This is a dangerous article, or perhaps you just haven't seen the number of horrific "denormalised" databases I have. People use these posts as excuses to build some truly horrific crimes against sanity. (AbGenFac)
  • Be careful not to confuse a denormalised database with a non-normalised database. The former exists because a previously normailsed database needed to be 'optimised' in some way. The latter exists because it was 'designed' that way from scratch. The difference is subtle, but important. (Bob) OK, more than a few quotes. There's certainly no lack of passion on the issue! One thing I would add is to organize your application around an application level service layer rather than allowing applications to access the database directly at a low level. Amazon is a good example of this approach. Many of the denormalization comments have to do with the problems of data inconsistency, which is of course true because that's why normalization exists. Many of these problems can be reduced if there's a single service access point over which to get data. It's when data access is spread throughout an application that we see serious problems.

    Related Articles

  • Denormalization Patterns by Kenneth Downs
  • When Databases Lie: Consistency vs. Availability in Distributed Systems by Dare Obasanjo
  • Stored procedure reporting & scalability by Jason Young
  • When Not to Normalize your SQL Database by Dare Obasanjo

    Click to read more ...

  • Tuesday

    ZooKeeper - A Reliable, Scalable Distributed Coordination System 

    ZooKeeper is a high available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates key configuration information. ZooKeeper can be used for leader election, group membership, and configuration maintenance. In addition ZooKeeper can be used for event notification, locking, and as a priority queue mechanism. It's a sort of central nervous system for distributed systems where the role of the brain is played by the coordination service, axons are the network, processes are the monitored and controlled body parts, and events are the hormones and neurotransmitters used for messaging. Every complex distributed application needs a coordination and orchestration system of some sort, so the ZooKeeper folks at Yahoo decide to build a good one and open source it for everyone to use. The target market for ZooKeeper are multi-host, multi-process C and Java based systems that operate in a data center. ZooKeeper works using distributed processes to coordinate with each other through a shared hierarchical name space that is modeled after a file system. Data is kept in memory and is backed up to a log for reliability. By using memory ZooKeeper is very fast and can handle the high loads typically seen in chatty coordination protocols across huge numbers of processes. Using a memory based system also mean you are limited to the amount of data that can fit in memory, so it's not useful as a general data store. It's meant to store small bits of configuration information rather than large blobs. Replication is used for scalability and reliability which means it prefers applications that are heavily read based. Typical of hierarchical systems you can add nodes at any point of a tree, get a list of entries in a tree, get the value associated with an entry, and get notification of when an entry changes or goes away. Using these primitives and a little elbow grease you can construct the higher level services mentioned above. Why would you ever need a distribute coordination system? It sounds kind of weird. That's more the question I'll be addressing in this post rather than how it works because the slides and the video do a decent job explaining at a high level what ZooKeeper can do. The low level details could use another paper however. Reportedly it uses a version of the famous Paxos Algorithm to keep replicas consistent in the face of the failures most daunting. What's really missing is a motivation showing how you can use a coordination service in your own system and that's what I hope to provide... Kevin Burton wants to use ZooKeeper to to configure external monitoring systems like Munin and Ganglia for his Spinn3r blog indexing web service. He proposes each service register its presence in a cluster with ZooKeeper under the tree "/services/www." A Munin configuration program will add a ZooKeeper Watch on that node so it will be notified when the list of services under /services/www changes. When the Munin configuration program is notified of a change it reads the service list and automatically regenerates a munin.conf file for the service. Why not simply use a database? Because of the guarantees ZooKeeper makes about its service:

  • Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensures that everything is dispatched in order.
  • A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.
  • The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service. You can't get these guarantees from an event system plopped on top of a database and these are the sort of guarantees you need in a complex distributed system where connections drop, nodes, fail, retransmits happen, and chaos rules the day. What rules the night is too terrible to contemplate. For example, it's important that a service-up event is seen after the service-down or you may unnecessarily drop revenue producing work because of an event out-of-order issue. Not that I would know anything about this mind you :-) A weakness of ZooKeeper is the fact that changes happened are dropped: Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper. Be prepared to handle the case where the znode changes multiple times between getting the event and setting the watch again. (You may not care, but at least realize it may happen.) This means that ZooKeeper is a state based system more than an event system. Watches are set as a side-effect of getting data so you'll always have a valid initial state and on any subsequent change events you'll refresh to get new values. If you want to use events to log when and how something changed, for example, then you can't do that. You would have to include change history in the data itself. Let's take a look at another example of where ZooKeeper could be useful. Picture a complex backend system running on, let's say, a 100 nodes (maybe a lot less, maybe a lot more) in a data center. For example purposes assume the system is an ad system for serving advertisements to web sites. Ad systems are complex beasts that require a fair bit of coordination. Imagine all the subsystems needing to run on those 100 nodes: database, monitoring, fraud detectors, beacon servers, web server event log processors, failover servers, customer dashboards, targeting engines, campaign planners, campaign scenario testers, upgrades, installs, media managers, and so on. There's a lot going on. Now imagine the power in the data center flips and all the machines power on. How do all the processes across all the hosts know what to do? Now imagine everything is up and a few machines go down. How do all the processes know what to do in this situation? This is where a coordination service comes in. A coordination service acts as the backplane over which all these subsystems figure out what they should do relative to all the other subsystems in a product. How, for example, do the ad servers know which database to use? Clearly there are many options for solving this problem. Using standard DNS naming conventions is one. Configuration files is another. Hard coding is still a favorite. Using a bootstrap service locator service is yet another (assuming you can bootstrap the bootstrap service). Ideally any solution must work just as well during unit testing, system testing, and deployment. In this scenario ZooKeeper acts as the service locator. Each process goes to ZooKeeper and finds out which is the primary database. If a new primary is elected, say because a host fails, then ZooKeeper sends an event that allows everyone dependent on the database to react by getting the new primary database. Having complicated retry logic in application code to fail over to another database server is simply a disaster as every programmer will mess it up in their own way. Using a coordination service nicely deals with the problem of services locating other services in all scenarios. Of course, using a proxy like MySQL Proxy would remove even more application level complexity in dealing with failover and request routing. How did the database servers decide which role they'll would play in the first place? All the database servers boot up and say "I'm a database server brave and strong, what's my role in life? Am I a primary or a secondary server? Or if I'm a shard what key range do I serve?" If 10 servers are database servers negotiating roles can be a very complicated and error prone process. A declarative approach through configuration files that specify a failover ring in configuration files is a good approach, but it's a pain to get to work in local development and test environments as the machines are always changing. It's easier to let the database servers come up and self organize themselves on initial role election and in failure scenarios. The advantage of this system is that it can run locally on one machine or on a dozen machines in the data center with very little effort. ZooKeeper supports this type of coordination behavior. Now let's say I want to change the configuration of ad targeting state machines currently running in 40 processes on 40 different hosts. How do I do that? The first approach is no approach. Most systems make it so a new code release has to happen, which is very sloooow. Another approach is a configuration file. Configuration is put in a distribution package and pushed to all nodes. Each process then periodically checks to see if the configuration file has changed and if it has then the new configuration read. That's the basics. Infinite variations can follow. You can have configuration for different subsystems. There's complexity because you have to know what packages are running on which nodes. You have to deal with rollback in case all packages don't push correctly. You have to change the configuration, make a package, test it, then push it to the data center operations team which may take a while to perform the upgrade. It's a slow process. And when you get into changes that impact multiple subsystems it gets even more complicated. Another approach I've taken is to embed a web server in each process so you can see the metrics and change the configuration for each process on the fly. While powerful for a single process it's harder to manipulate sets of processes across a data center using this approach. Using ZooKeeper I can store my state machine definition as a node which is loaded from the static configuration collected from every distribution package in a product. Every process dependent on that node can register as a watcher when they initially read the state machine. When the state machine is updated all dependent entities will get an event that causes them reload the state machine into the process. Simple and straightforward. All processes will eventually get the change and any rebooting processes will pick up the new state machine on initialization. A very cool way to reliably and centrally control a large distributed application. One caveat is I don't see much activity on the ZooKeeper forum. And the questions that do get asked largely go unanswered. Not a good sign when considering such a key piece of infrastructure. Another caveat that may not be obvious on first reading is that your application state machine using ZooKeeper will have to be intimately tied to ZooKeeper's state machine. When a ZooKeeper server dies, for example, your application must process that event and reestablish all your watches on a new server. When a watch event comes your application must handle the event and set new watches. The algorithms to carry out higher level operations like locks and queues are driven by multi-step state machines that must be correctly managed by your application. And as ZooKeeper deals with state that is probably stored in your application it's important to worry about thread safety. Callbacks from the ZooKeeper thread could access shared data structures. An Actor model where you dump ZooKeeper events into your own Actor queues could be a very useful application architecture here for synthesizing different state machines in a thread safe manner.

    Some Fast Facts

  • How data are partitioned across multiple machines? Complete replication in memory. (yes this is limiting)
  • How does update happen (interaction across machines)? All updates flow through the master and are considered complete when a quorum confirms the update.
  • How does read happen (is getting a stale copy possible) ? Reads go to any member of the cluster. Yes, stale copies can be returned. Typically, these are very fresh, however.
  • What is the responsibility of a leader? To assign serial id's to all updates and confirm that a quorum has received the update.
  • There are several limitations that stand out in this architecture: - complete replication limits the total size of data that can be managed using Zookeeper. This is acceptable in some applications, not acceptable in others. In the original domain of Zookeeper (management of configuration and status), it isn't an issue, but zookeeper is good enough to encourage creative misuse where this can become a problem. - serializing all updates through a single leader can be a performance bottleneck. On the other hand, it is possible to push 50K updates per second through a leader and the associated quorum so this limit is pretty high. - the data storage model is completely non relational. These answers were provided by Ted Dunning on the Cloud Computing group.

    Related Articles

  • An Introduction to ZooKeeper Video (Hadoop and Distributed Computing at Yahoo!) (PDF)
  • ZooKeeper Home, Email List, and Recipes (which has some odd conotations for a Zoo).
  • The Chubby Lock Service for Loosely-Coupled Distributed Systems from Google
  • Paxos Made Live – An Engineering Perspective by Tushar Chandra, Robert Griesemer, and Joshua Redstone from Google.
  • Updates on Open Source Distributed Consensus by Kevin Burton
  • Using ZooKeeper to configure External Monitoring Systems by Kevin Burton
  • Paxos Made Simple by Leslie Lamport
  • Hyperspace - API description of Hyperspace, a Chubby-like service
  • Notes on ZooKeeper at the Hadoop Summit by James Hamilton.

    Click to read more ...

  • Thursday

    Can cloud computing smite down evil zombie botnet armies?

    In the more cool stuff I've never heard of before department is something called Self Cleansing Intrusion Tolerance (SCIT). Botnets are created when vulnerable computers live long enough to become infected with the will to do the evil bidding of their evil masters. Security is almost always about removing vulnerabilities (a process which to outside observers often looks like a dog chasing its tail). SCIT takes a different approach, it works on the availability angle. Something I never thought of before, but which makes a great deal of sense once I thought about it. With SCIT you stop and restart VM instances every minute (or whatever depending in your desired window vulnerability).... This short exposure window means worms and viri do not have long enough to fully infect a machine and carry out a coordinated attack. A machine is up for a while. Does work. And then is torn down again only to be reborn as a clean VM with no possibility of infection (unless of course the VM mechanisms become infected). It's like curing cancer by constantly moving your consciousness to new blemish free bodies. Hmmm... SCIT is really a genius approach to scalable (I have to work in scalability somewhere) security and and fits perfectly with cloud computing and swarm (cloud of clouds) computing. Clouds provide plenty of VMs so there is a constant ready supply of new hosts. From a software design perspective EC2 has been training us to expect failures and build Crash Only Software. We've gone stateless where we can so load balancing to a new VM is not problem. Where we can't go stateless we use work queues and clusters so again, reincarnating to new VMs is not a problem. So purposefully restarting VMs to starve zombie networks was born for cloud computing. If a wider move could be made to cloud backed thin clients the internet might be a safer place to live, play, and work. Imagine being free(er) from spam blasts and DDOS attacks. Oh what a wonderful world it would be...

    Click to read more ...


    Federation at Flickr: Doing Billions of Queries Per Day

    Flickr's lone database guy Dathan Pattishall made his excellent presentation available on how on how Flickr scales its backend to handle tremendous loads. Some of this information is available in Flickr Architecture, but the paper is so good it's worth another read. If you want to see sharding done right, at scale, take a look.

    Click to read more ...


    Five Ways to Stop Framework Fixation from Crashing Your Scaling Strategy

    If you've wondered why I haven't been posting lately it's because I've been on an amazing Beach's motorcycle tour of the Alps (and, and, and, and, and, and, and, and). My wife (Linda) and I rode two-up on a BMW 1200 GS through the alps in Germany, Austria, Switzerland, Italy, Slovenia, and Lichtenstein. The trip was more beautiful than I ever imagined. We rode challenging mountain pass after mountain pass, froze in the rain, baked in the heat, woke up on excellent Italian coffee, ate slice after slice of tasty apple strudel, drank dazzling local wines, smelled the fresh cut grass as the Swiss en masse cut hay for the winter feeding of their dairy cows, rode the amazing Munich train system, listened as cow bells tinkled like wind chimes throughout small valleys, drank water from a pure alpine spring on a blisteringly hot hike, watched local German folk dancers represent their regions, and had fun in the company of fellow riders. Magical. They say you'll ride more twists and turns on this trip than all the rest of your days riding put together. I almost believe that now. It wasn't uncommon at all to have 40 hairpin turns up one side of the pass and another 40 on the way down. And you could easily ride over 5 passes a day. Take a look at the above picture for one of the easier examples. Which leads me to the subject of this post. It's required by the Official Blogger Handbook after a vacation to conjure some deep insight tying the vacation experience to the topic of blog. I got nada. Really. As you might imagine motorcycling and scalability aren't deeply explicable of each other. Except perhaps for one idea that I pondered a bit while riding through hills that were alive with music: target fixation. Target fixation is the simple notion that the bike goes where you look. Focus on an obstacle and you'll hit the obstacle, even though you are trying to avoid it. The brain focuses so intently on an object that you end up colliding with it. So the number one rule of riding is: look where you want to go. Or in true self-help speak: focus on the solution instead of the problem. Here's a great YouTube video showing what can happen. And here's another... It may be hard to believe target fixation exists as a serious risk. But it's frustratingly true and it's a problem across all human endeavors. If you've ever driven a car and have managed to hit the one pot hole in the road that you couldn't take your eyes off--that's target fixation. Paragliders who want to avoid the lone tree in a large barren field can still mange to hit that tree because they become fixated on it. Fighter pilots would tragically concentrate on their gun sights so completely they would fly straight into the ground. Skiers who look at trees instead of the spaces in between slam into a cold piny embrace. Mountain bikers who focus on the one big rock will watch that rock as they tumble after. But target fixation isn't just about physical calamity. People can mentally stick to a plan that is failing because all they can see is the plan and they ignore the ground rushing up to meet them. This is where the framework fixation that we'll talk about a little later comes in. But for now pretend to be a motorcycle rider for a second. Imagine you are in one of those hairpin turns in the above picture. You are zooming along. You just masterfully passed a doubledecker tour bus and you are carrying a lot of speed into the turn. The corner gets closer and closer. Even closer. Stress levels jump. Corners are scary. Your brain suddenly jumps to a shiny thing off to the side of the road. The shiny thing is all you can see in your mind even though you know the corner looms and you must act. The shiny thing can be anything. In honor of Joey Chestnut's heroic defeat of Kobayashi at Nathan’s Famous Hot Dog Eating Competition, I inserted a giant hot dog as a possible distraction in the photo. But maybe it's a cow with a particularly fine bell. Or a really cool castle ruin. A picture perfect waterfall. Or maybe it's the fact that there's no guardrail and the fall is a 4000 feet drop and a really big truck is coming into your lane. Whatever the distraction, when you focus on that shiny thing you'll drive to it and fly off the corner. That's target fixation. Your brain will guide you to what you are focused on, not where you want to go. I've done it. Even really good riders do it. Maybe we've all done it. In true Ninja fashion we can turn target fixation to our advantage. On entering a turn pick a line, scrub off speed before beginning the turn, and turn your head to look up the road where you want to go. You will end up making a perfect turn with no conscious effort. Your body will automatically make all the adjustments needed to carry out the turn because you are looking where you want to go, which is the stretch of road after the turn. This even works in really tight obstacle courses where you need to literally turn on a dime. Now at first you don't believe this. You think you must consciously control your every movement at all times or the world fall into a chaotic mess. But that's not so. If you want to screw up someone's golf game ask them to explain their swing to you. Once they consciously start thinking about their swing they won't be able to do it anymore. This is because about half the 100 billion neurons in your brain are dedicated to learned unconscious motor movement. There's a lot of physical hardware in your brain dedicated to help you throw a rock to take down a deer for dinner. Once your clumsy conscious mind interferes all that hard won expertise looks like a 1960s AI experiment gone terribly wrong. Frameworks can also cause a sort of target fixation. As an example, let's say you are building a microblogging product and you pick a framework that makes creating an ORM based system easy, clean, and beautiful. This approach works fine for a while. Then you take off and grow at an enviable rate. But you are having a problem scaling to meet the new demand. So you keep working and reworking the ORM framework trying to get it to scale. It's not working. But the ORM tool is so shiny it's hard to consider another possibly more appropriate scaling architecture. You end up missing the corner and flying off the side of the road, wondering what the heck happened. That's the downside of framework fixation. You spend so much time trying frame your problem in terms of the framework that you lose sight of where you are trying to go. In the microblogging case the ORM framework is completely irrelevant to the microblogging product, yet most of the effort goes into making the ORM scale instead of stepping back and implementing an approach that will let you just turn your head and let all the other unconscious processes make the turn for you.

    Framework Fixation Solutions

    How can you avoid the framework fixation crash?
  • Realize framework fixation exists. Be mindful when hitting a tough problem that you may be focusing on a shiny distraction rather than solving a problem.
  • Focus on where you want to go. In whitewater river rafting they teach you not to point to the danger, but instead point to a safe route to avoid the danger. Let's say there's a big hole or a strainer you should know about. Your first reaction is to point to the danger. But that sets up a target fixation problem. You are more likely to hit what is being pointed to than avoid it. So you are taught to point to the safe route to take rather than dangerous route to avoid. This cuts down on a lot of possible mistakes. It's also a good strategy for frameworks. Have a framework in which you do the right thing naturally rather than use a framework in which you can succeed if you manage to navigate the dozens of hidden dangers. Don't be afraid to devote half your neurons to solving this problem.
  • Use your brain to pick the right target. It really sucks to pick a wrong target and crash anyway.
  • Keep your thinking processes simple. Information overload can lead to framework fixation. As situations become more and more complicated it becomes easier and easier to freeze up. Find a way to solve a problem and the right abstraction level.
  • Build up experience through practice. Looking away from a shiny thing is one of the most difficult things in the world to do. Until you experience it it's hard to believe how difficult it can be. Looking away take a lot of conscious effort. Looking away is a sort of muscle built through the experience of looking where you should be going. The more you practice the more you can control the dangerous impulse to look at shiny things. This problem exists at every level of development, it's not just limited to frameworks.

    Related Articles

  • Target Fixation for Paragliders by Joe Bosworth.
  • Driving Review: Target Fixation ... Something Worth Looking At! by Mick Farmer

    Click to read more ...

  • Saturday

    ID generation schemes

    Hi, Generating unique ids is a common requirements in many projects. Generally, this responsibility is given to Database layer. By using sequences or some other technique. This is a problem for horizontal scalability. What are the Guid generation schemes used in high scalable web sites generally? I have seen use java's SecureRandom class to generate Guid. What are the other methods generally used? Thanks Unmesh

    Click to read more ...


    Pyshards aspires to build sharding toolkit for Python

    I've been interested in sharding concepts since first hearing the term "shard" a few years back. My interest had been piqued earlier, the first time I read about Google's original approach to distributed search. It was described as a hashtable-like system in which independent physical machines play the role of the buckets. More recently, I needed the capacity and performance of a Sharded system, but did not find helpful libraries or toolkits which would assist with the configuration for my language of preference these days, which is Python. And, since I had a few weeks on my hands, I decided I would begin the work of creating these tools. The result of my initial work the Pyshards project, a still-incomplete python and MySQL based horizontal partitioning and sharding toolkit. readers will already know that horizontal partitioning is a data segmenting pattern in which distinct groups of physical row-based datasets are distributed across multiple partitions. When the partitions exist as independent databases and when they exist within a shared-nothing architecture they are known as shards. (Google apparently coined the term shard for such database partitions, and pyshards has adopted it.) The goal is to provide big opportunities for database scalability while maintaining good performance. Sharded datasets can be queried individually (one shard) or collectively (aggregate of all shards). In the spirit of The Zen of Python, Pyshards focuses on one obvious way to accomplish horizontal partitioning, and that is by using a hash/modulo based algorithm. Pyshards provides the ability to reasonably add polynomial capacity (number of original shards squared) without re-balancing (re-sharding). Pyshards is designed with re-sharding in mind (because the time will come when you must re-balance) and provides re-sharding algorithms and tools. Finally, Pyshards aspires to provide a web-based shard monitoring tool so that you can keep an eye on resource capacity. So why publish an incomplete open source project? I'd really prefer to work with others who are interested in this topic instead of working in a vacuum. If you are curious, or think you might want to get involved, come visit the project page, join a mailing list, or add a comment on the WIKI. Devin

    Click to read more ...


    Apple's iPhone to Use a Centralized Push Based Notification Architecture

    Update 2: Hank Williams says iPhone Background Processing: Not Fixed But Halfway There. Excellent analysis of all the reasons you need real background processing. Hey, you can't even build an alarm clock! Hard to believe some commenters say it's not so.. Update: Josh Lowensohn of Webware tells us Why users should be scared of Apple's new notification system. A big item on the iPhone developer iWishlist has been background processing. If you can't write an app to poll for new data in the background how will you keep your even more important non-foreground app in sync? Live from the Apple developer conference we learn the solution is a centralized push based architecture. Here's the relevant MacRumorsLive transcript:

  • Thanking the developers for their hard work. Now talking about how the #1 request has been background support. Apple wants to solve this problem.
  • The wrong solution would be to allow for background processes -- bad for battery life and performance. Poking fun at Windows Mobile's task manager.
  • Apple has come up with a far better solution -- a push notification service available for all developers.
  • When the user quits the application, Apple will push updates from their servers to the iPhone. The developer's servers push the notifications to Apple. These updates can include badges, sounds, and custom messages. This requires just one persistent connection and is extremely scalable.
  • Apple has come up with a far better solution -- a push notification service available for all developers. A poll based architecture is good when system capabilities are relatively symmetric. Clearly mobile phones are restricted along a number of dimensions, the most important being battery power. Having a large number of apps constantly polling for updates sucks down battery power faster than vampires at phlebotomist convention. So Apple's logic is sound. Keep a single connection over which data is pushed and work on the phone is minimized. You also maximize battery life and maximize bandwidth usage because data can be aggregated on the server side and be sent in large chunks rather than a random distribution of small packets. The mechanics of how this works isn't clear. Must all apps needing to push data to a phone become part of Apple's private iPhone cloud? Smart for Apple as it gives them complete control. For sculptors of the ultimate user experience you want total control. Not so good for developers as it's just another garden with a very high wall protecting it.

    Click to read more ...

  • Monday

    FaceStat's Rousing Tale of Scaling Woe and Wisdom Won

    Lukas Biewald shares a fascinating slam by slam recount of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo's main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper ready experience. Theo Schlossnagle eerily prophesized about such events in The Implications of Punctuated Scalabilium for Website Architecture: massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat's situation. This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load. How did our fine FaceStat fellowship fair against Yahoo’s onslaught? Not a lot of details of FaceStat’s architecture are available, so it’s not that kind of post. What interested me is that it’s a timely example of Theo’s traffic spike phenomena and I was also taken with how well the team handled the challenge. Few could do better so quickly. In fact, let’s apply Theo’s rubric for how to handle these situations to FaceStat:
  • Be Alert: build automated systems to detect and pinpoint the cause of these issues quickly (in less than 60 seconds). None initially, but they are building in more monitoring to better handle future situations. Better monitoring would have alerted them to the problems long before they actually were alerted. Perhaps many more potential customers might have been converted to actual customers. You can never have enough monitoring!
  • Be Prepared: understand the bottlenecks of your service systemically. As the system was relatively simple, new, and quickly changed, my assumption is they were fully aware of their system’s shortcomings, they were just busy with adding features rather than worrying about performance and scalability.
  • Perform Triage: understand the importance of the various services that make up your site. Definitely. They “started ripping out every database intensive feature” in response to the load.
  • Be Calm: any action that is not analytically driven is a waste of time and energy. They stayed amazingly calm as can be seen from the following quote: “It’s one thing to code scalably and grow slowly under increasing load, but it’s been a blast to crazily rearchitect a live site like FaceStat in a day or two.” I’m not sure how analytically driven they were however  All-in-all an impressive response to the Great Eye’s undivided attention. But not everyone was impressed as I. A commenter named Bernard said: Sorry, but this is a really dumb story. Given how dirt cheap things like slicehost and linode are, it is crazy that you launched a web app and had not already prepared a redundant, highly-scalable architecture… I’d say you were damn lucky that the disappointed users came back at all. Commenter Will thought it was a “Nice problem to be having!” Which it is, of course, being noticed is better than being ignored. But Lukas was spot on when he lamented about being noticed too soon has a downside: After working so hard to get users to come to your site, it’s amazingly frustrating to see hundreds of thousands of people suddenly locked out. Clearly we still don’t have the ability for developers to create scalable systems as simply as they create exploratory systems. Ed from Rackspace posted that they could help with their Auto Scale of Arrays feature. And Rackspace would be an excellent solution, but the cost would be $500/month and a $2500 setup fee. No “let’s put on a show” startup can afford those costs. The mode FaceStat was in is typical: We find that a Rails-like platform is invaluable for rapidly prototyping a new site, especially since we started FaceStat as a pure experiment with no idea whether people would like it or not, and with a very different feature set in mind compared to what it later became. A pay as you grow model is essential for scalability because that’s the only way you can bake scalability in from the start. And even with all the impressive advances in the industry we still don’t have the software infrastructure to make scaling second nature.

    Information Sources

  • Scaling Fast by Lukas Biewald
  • FaceStat scales! on Dlores BLog


  • Merb. Ruby based MVC framework that is ORM-agnostic.
  • Thin. A fast and very simple Ruby web server.
  • Slicehost. Hosting service. Able to quickly provision servers as needed.
  • Amazon’s S3. Image server. Latency is high but it handles the load.
  • Capistrano. Automated deployment.
  • Git with github. Source code control system. Supports efficient simultaneous development, quick merging and deployment.
  • God. Server monitoring and management.
  • Memcached. Application caching layer.
  • PostgreSQL

    The Stats

  • Six app servers.
  • One big database machine.

    The Architecture

  • FaceStat is a write heavy application and performs involved calculations on data.
  • S3 is used to offload the responsibility for storing images. This freed them from the massive bandwidth requirements and complexity of managing their own images.
  • Memcached offloads reads from the database to allow the database to have more time for writes.

    Lessons Learned

  • Monitor the site. The sooner you know about a problem the faster it can be fixed. Don't rely on user email or email from exception handlers or you'll never get ahead of problems.
  • Communicate with your users with an error page. A meaningful error pages shows you care and that you are working on the problem. That's enough for a second chance with most people.
  • Use a cached statically generated homepage. Hard to beat that for performance.
  • Big sites might want to give a heads up when they mention smaller sites. Just a short polite email saying how your world will soon turn upside down would do.
  • High-level platform really doesn’t matter compared to overall architecture. How you handle writes, reads, caching, deployment, monitoring, etc are relatively framework independent and it's how you solve those problems that matter.
  • Ruby and Merb supported rapid prototyping to experiment and create a radically different system form the one they intended.

    Click to read more ...