A Scalability Lament

In Scalability issues for dummies Alex Barrera talks movingly about the challenges he faces trying to scale his startup inkzee as the lone developer. Inkzee is an online news reader that automatically groups similar topics. This is a cool problem and is one you know right away is going to have some killer scalability problems as the number of feeds and the number of users increase. And these problems lead to the point of the post, to explain here what are scalability problems and how deep the repercussions are for a small company, which Alex does admirably.

Some takeaways:

  • Sites are composed of a frontend and backend. The backend isn't visible to users, but it does all the work and this is where the scalability problems show up.
  • As more and more users use a site it becomes slow because more users reveal bottlenecks in the system that weren't visible before.
  • There can be many many reasons for these bottlenecks. They are often very hard to find because the backend systems are very complex and have a lot of complex interactions.
  • It takes a while to find the bottlenecks and create fixes for them. You are never quite sure if the fixes will really work. Many of these problems are unique to the problem space so pre-canned solutions aren't always available. And because you don't want to destroy your production servers it takes a while to put fixes into the system. This means your release cycle is slow which means progress on your site is slow.
  • The process of redesign sucks up all your resources so progress on the site stalls almost completely, especially for a small development group. This stops growth as you can't handle new customers and your existing customers become disenchanted.
  • The process is unfortunately iterative. Solving one problem just puts you in the queue for the next problem. There's a reason it took Twitter a while to get on their feet.
  • Scalability problems aren’t something you can discard as being ONLY technical, it’s roots might be technical but its effects will shake the whole company.

    I found Alex's commentary quite touching and familiar. As I imagine many of you do too. It's the modern equivalent of an explorer following a dream. Going alone into uncharted territory where the Dragons live and trying to survive when everything seems against you. For every great returning hero there are 10 who do not make it back. And that's hard to deal with.

    Everyone will certainly have their ideas on how to "fix" the problem, as that's what engineers do. But it also doesn't hurt to use our Venus brain for a moment and simply recognize the toll this process can take. It can be dispiriting. The continual stream of problems and lack of positive feedback can wear you down after a while. To stick with it takes a bit of craziness in the heart.

    Switching back to being Mars brained I might suggest:
  • Find a partner. There's always a Jerry to go along with Ben. A Martin to go along with Lewis. And Rambo is just a movie. Going it alone is hard hard hard. Partners can pick each other up and give each other a breather when necessary.
  • Use the Cloud. Whatever your opinions of the economics of the cloud, it makes testing of the type Alex was needing a breeze. Architect for the cloud and you can spin up a system in parallel, run a load generator, and never touch your production system at all. And do it all for cheap. This is one of the biggest wins for the cloud and would seem to solve a lot of problems.

    Related Articles

  • Scalable Web Architectures and Application State by Marton Trencseni
  • Scalability for dummies by Royans Tharakan
  • Friday

    Against all the odds

    This article not about Mariah Carey, or its song. It's about Storing System, Database.

    First let's describe what means by odds: In my social network, I found 93% of the mainstream developers sanctify the database, or at least consider it in any data persistence challenge as the ultimate, superhero, and undefeatable solution.

    I think this problem come from the education, personally, and some companies also I think it's involved in this.

    To start to fix this bad thinking, we all should agree in the following points:

    • Every challenge have its own solutions, so whatever you want to save/persistent, there are always many solutions. For example the Web search engines, such as: Google, Kngine, Yahoo, Bing don't use database at all instead we use Indexes (Index file) for better performance.
    • The Database in general whatever the vendor it's slow compared with other solutions such as: Key-Value storing system, Index file, DHT.
    • The Database currently employ Relation Data model, or Object relational data model, so don't convince yourself to save non-relation data into relation data model store system such as: Database.
    • The Database system architecture didn't changed very much in last 30 years, and it's content a lot of limits, and fails in its performance, scalability character. If you don't believe me check out this papers:
    1. The End of an Architectural Era (It's Time for a Complete Rewrite)

    2. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

    I hope if you agreed with me in the previous points. So the question do we really need Database in every application?

    There are many scenario shouldn't use Database resisters, such as: Web search engine, Caching, File sharing system, DNS system, etc. In the other hand there many of scenarios should use Database, such as: Customer database, Address book, ERP, etc.

    Tiny URL services for example, shouldn't use Database at all because it's require very simple needs, just map a small/tiny URL to the real/big URL. If you start agreed with me, you likely want ask: But what we can use beside or instead of Databases?

    There are a lot of tools that fallowing CAP, BASE model, instead of ACID model. But first let's describe ACID:

    • Atomicity: A transaction is all or nothing
    • Consistency: Only valid data is written to the database
    • Isolation: Pretend all transactions are happening serially and the data is correct
    • Durability: what you write is what you get
    1. The problem with ACID is that it gives you too much; it trips you up when you are trying to scale a system across multiple nodes.

    2. Down time is unacceptable. So your system needs to be reliable. Reliability requires multiple nodes to handle machine failures.

    3. To make scalable systems that can handle lots and lots of reads and writes you need many more nodes.

    4. Once you try to scale ACID across many machines you hit problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed.

    In other hand CAP model is about:

    • Consistency: Your data is correct all the time. What you write is what you read.

    • Availability: You can read and write and write your data all the time.

    • Partition Tolerance: If one or more nodes fails the system still works and becomes consistent when the system comes on-line.
    1. CAP is easy to scale, distribute. CAP is scalable by nature.

    2. Everyone who builds big applications builds them on CAP. Who use CAP: Google, Yahoo, Facebook, Kngine, Amazon, eBay, etc.

    For example in any in-memory or in-disk caching system you will never need all the Database features. You just need CAP like system. Today there are a lot of: column oriented, and key-value oriented systems. But first let's describe Column oriented:

    A column-oriented is a database management system (DBMS) which stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items. This approach is contrasted with row-oriented databases and with correlation databases, which use a value-based storage structure. For more information check Wikipedia page.

    Distributed key-value stores:

    Distributed column stores (Bigtable-like systems):

    Something a little different:



    Scaling Traffic: People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar

    Update 17: Are Wireless Road Trains the Cure for Traffic Congestion? BY ADDY DUGDALE. The concept of road trains--up to eight vehicles zooming down the road together--has long been considered a faster, safer, and greener way of traveling long distances by car

    Update 16: The first electric vehicle in the country powered completely by ultracapacitors. The minibus can be fully recharged in fifteen minutes, unlike battery vehicles, which typically takes hours to recharge.

    Update 15: How to Make UAVs Fully Autonomous. The Sense-and-Avoid system uses a four-megapixel camera on a pan tilt to detect obstacles from the ground. It puts red boxes around planes and birds, and blue boxes around movement that it determines is not an obstacle (e.g., dust on the lens).
    Update 14: ATNMBL is a concept vehicle for 2040 that represents the end of driving and an alternative approach to car design. Upon entering ATNMBL, you are presented with a simple question: "Where can I take you?" There is no steering wheel, brake pedal or driver's seat. ATNMBL drives for you. Electric powered plus solar assist, with wrap-around seating for seven, ATNMBL offers living and/or working comfort, views, conversations, entertainment, and social connectedness.

    Update 13: The Next Node on the Net? Your Car!. A new radio system developed in Australia is transforming the vehicles on the street into nodes on a network.
    Update 12: United Arab Emirates building network of driverless electric taxis. When the system's fully built, planners say the podcars will be able to deliver riders within 100 meters of any location in the city. The whole network of tracks for the cars will be two stories beneath street level.

    Update 11: Self-driving cars set to cut fuel consumption. Large-scale test seeks to put humans in the back seat. NEDO says it will start testing several key technologies that allow for autonomous driving between 2010 and 2012.
    Update 10: Fighting Traffic Jams With Data. Researchers from different universities are working on ways for cars to better communicate with each other and relay crucial driver information such as traffic speed, weather and road conditions.
    Update 9: Accident Ahead? New Software Will Enable Cars To Make Coordinated Avoidance Maneuvers. In dangerous situations, the cars can independently perform coordinated maneuvers without their drivers having to intervene. In this way, they can quickly and safely avoid one another.
    Update 8: Great article in Wired on Better Place's proposal for a new electric car distribution system. The idea is to blanket the country with "smart" charge spots. You buy your car from them and purchase a recharge plan. Profit come from selling electricity.
    Update 7: Capturing solar energy from asphalt pavements. An interesting way to make the system self-sufficient.
    Update 6: Why We Drive the Way We Do Unlocks How to Unclog Traffic. Vanderbilt says: The fundamental problem is that you've got drivers who make user-optimal rather than system-optimal decisions. Josh McHugh replies: Make the packets (cars) dumb and able to take marching orders from traffic routing nodes.
    Update 5: Traffic jams are not caused by flaws in road design but by flaws in human nature. Nearly 80 percent of crashes involve drivers not paying attention for up to three seconds. The both good and scary thing about computers is they always pay attention.
    Update 4: Volvo Says It Will Have An Injury Proof Car By 2020.
    Update 3: Map Reading For Dummies. Europe (again) is developing a system that will read satellite navigation maps and warn the driver of upcoming hazards – sharp bends, dips and accident black spots – which may be invisible to the driver. Even better, the system can update the geographic database. Another key capability of the People Pod system.
    Update 2: Road Safety: The Uncrashable Car?. A European research project basic could lead to a car that is virtually uncrashable. An uncrashable car would definitely ease people's concerns over computerized navigation.
    Update: Shockwave traffic jam recreated for first time - "Pinpointing the causes of shockwave jams is an exercise in psychology more than anything else. 'If they had set up an experiment with robots driving in a perfect circle, flow breakdown would not have occurred. Human error is needed to cause the fluctuations in behaviour.'"

    Traffic in the San Francisco Bay area is like Dolly Parton, 10 pounds in a 5 pound sack. Mass transit has been our unseen traffic woe savior for a while. But the ring of political fire circling the bay has prevented any meaningful region wide transportation solution. As everyone scrambles to live anywhere they can afford, we really need a region wide solution rather than the local fixes that can never go quite far enough. The solution: create a People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar.

    Commuters are Satisfied Not Carpooling

    You might think we would car pool more. But people of the bay don't like carpools and they don't much like mass transit either. In the Metro, a local weekly, they published a wonderful article Fueling the Fire, on how we need to cure our car addiction using the same marginalization techniques used to "stop" smoking.

    A telling quote shows how difficult going cold turkey off our cars will be:

    Mitch Baer, a public policy and environment graduate student at George Mason University in Virginia, recently surveyed more than 2,000 commuters in the Washington, D.C., area. He found that people who drove to work alone were more emotionally satisfied with their commute than those who rode public transportation or carpooled with others.

    Even stuck in traffic jams, those commuters said they felt they had more control over their arrival and departure times as well as commuting route, radio stations and air conditioning levels.

    Commuters said that driving alone was both quicker and more affordable, according to the study.

    "They will have a tougher time moving people out of their cars," Baer said. "It's easier for most people to drive than take mass transit."

    The key phrase to me is: people who drove to work alone were more emotionally satisfied. How can people jostled in the great pinball machine that are our roadways be emotionally satisfied? That's crazy talk. Shouldn't we feel less satisfied?

    In Our Cars We Feel Good Because We Are in Control

    Solving the mystery of why we feel satisfied while stuck in traffic turns on an important psychological clue: the more we perceive ourselves in control of a situation the less stress we feel. Robert Sapolsky talks about this surprising insight into human nature in Why Zebras Don't Get Ulcers.

    Notice we simply need more "perceived" control. Take control of a situation in your mind and stress goes down. You don't actually need to be in more control of a situation to feel less stress. If you have diabetes, facing your possibly bleak future can be less stressful if you try to control your blood sugars. If you are a speed demon, buying a radar detector can make you feel more in control and less stressed as you zoom along the seldom empty highways. If you are bullied, figuring out ways to avoid your torturer puts you more in control and therefor less stressed.

    Figure out a way to control and an out of control situation and you'll feel happier. That's what I think we are accomplishing by driving alone in cars. In our car we have complete control. Cars are our castles with a 2 inch air moat cushion. Most cars are plusher than any room in your average house. Fine leather, a rad sound system, perfect temperature control, and a nice beverage of choice within easy reaching distance. In our cars we've created a second womb. The result is we feel more control, less stress, and more satisfaction, even when outside, across the moat, a tempestuous sea of stressors await.

    Our Mass Transit System Must Supply Perceived Control

    Given the warm inner glow we feel from being wrapped in the cold steel of our cars, if you want people to get out of their cars and onto mass transit you must provide the same level of perceived control. None of our mass transit options do that now. Buses are on fixed schedules that don't go where I want to go when I want to go. Neither do trains, BART, or light rail. So the car it is. Unless a system could be devised that provided the benefits of mass transit plus the pleasing characteristics of control our cars give us.

    With Recent Technological Advances We Can Create a New Type of Mass Transit System

    New technologies are being developed the will allow us to create a mass transit system that matches our psychological and physical needs. Just berating people and telling them they should take mass transit to save the planet won't work. The pain is too near and the benefits are too far for the mental cost-benefit calculation to go the way of mass transit.

    The technologies I am talking about are:

  • Inexpensive solar with $1/watt solar panels. Our mass transit must of course be green and cost effective.
  • Breakthrough battery could boost electric cars. Toshiba promises 'energy solution' with nearly full recharge in 5 minutes.
  • Personal transportation pods. A reusable vehicle that can take anyone anywhere they want to go.
  • Self driving vehicles. We are making great strides in creating robot cars that can drive themselves in traffic. Already they drive better than most humans can drive (low bar, I know).

    Mix these all together and you get a completely different type of mass transit system. A mashup, if you will.

    Create a People Pod Pool of On Demand Autonomous Self Driving Robotic Cars that Automatically Refuel from Cheap Solar

    Many company campuses offer a pool of bicycles so workers can ride between buildings and make short trips. Some cities even make bikes available to their citizens. The idea is to do the same for cars, but with a twist or two.

    The cars (people pods) can be stored close to demand points and you can call for one anytime you wish. The cars are self driving. You don't actually drive them and are free to work or play during transit. Different kinds would be available depending on your purpose. Just one person on a shopping trip would receive a different car than a family. The pods would autonomously search out and find energy sources as needed to recharge.There's no reason to assume a centralized charging and storage facility. When repair was needed they could drive themselves to a repair depot or wait for the people pod ambulance service.

    The advantages of such a system are:
  • Perceived control. You have a personal "car" you control the destination for, the interior environment of, and your own actions inside. This gets over the biggest hurdle with current mass transit options.
  • Better regional traffic flow. The autonomous cars could drive cooperatively to smooth out traffic jams. Traffic jams are largely caused by people speeding up and slowing down which causes ripples of slowness up and down the road. And automated system could prevent that.
  • Go where you want to go. It would be used because people can go to exactly where they need to go and be picked up exactly where they need to leave from at exactly the time they wish. None of these are characteristic of current systems.
  • Leverage existing road ways. Creating light rail and trains is expensive and wasteful (except for the high speed point-point variety). They don't extend to where people live and they don't go where people go. So it creates a multi-hop mess out of every trip. We already have an expansive road system that goes where everyone wants to go. Using the road infrastructure more efficiently makes a lot more sense than creating hugely expensive partial solutions. And since these cars would be eco-friendly, most arguments against using cars fall away.
  • Cheaper delivery. One force keeping truly distributed manufacturing and retailing from blossoming is high delivery costs. A $2 item is simply too expensive to buy remotely and ship because shipping costs more than the product. An automated transportation system would make this model more affordable.
  • Live where you want to live. Most mass transit systems are based on trying to socially reengineer our current suburbian and exurbian living pattern into a high density live-work pattern. While this should be an option, most mass transit proposals assume this pattern as a given and can't deal with current realities. For the foreseeable future people will not give up their houses or their lifestyles. The People Pod approach solves the mass transit problem and the "difficulties" of having to change a whole populace to behave in a completely different way for less than compelling reasons.
  • Still can own your own car. This isn't a replacement for the current car culture. It's leveraging the car culture. You can still own and drive your own car. Nobody is trying to steal your car away from you.
  • Cleaner and safer. Mass transit is disliked by many because it is perceived as dirty and unsafe. The pods would be safe and clean.
  • Road safety. Our new robot overloads will make our lives safer. Hopefully, possibly, maybe...


  • Current transportation budgets. Money could be redeployed from existing less than successful approaches.
  • Advertising. The outside of vehicles could contain advertising as could the inside, especially from the internal search system. Imagine wanting a new place to eat and asking the pod to suggest one. That's prime targeted marketing. Social networks and massive multi-player games could also be created between pods.
  • In-flight services. Movies on demand and so on.
  • Efficiencies. The plug-in cars are electric and efficient and low maintenance. That will save money.
  • Up sells. Individuals could buy their own pods and trick them out. Also, people could pay for a higher class of pod from the pod pool.

  • Licensing. Technology used in making the pods could be sold to other manufacturers. Create a standardized market so competition and cooperation can erupt.
  • Sponsorship. Companies could buy rights to play music, stock the food locker, use their equipment, etc.
  • Naming rights. The rights to name parts of the system could be sold.


  • Challenge prize. Maybe someone with a vision and a dream can put up a $50 million prize to get it going. Something like the Xprize.
  • Government funding. Don't laugh, it might happen.
  • Startup. I'm available if interested :-) With a large enough challenge prize this is a viable model.

    It's a Usable System so People Would Use It

    After a lot of reading on the topic and a lot of self-examination on why I am such a horrible person that I don't use mass transit more, this is the type of system I could really see myself using. It doesn't try to change the world, it uses what we got, and gives people what they want. It just might work.
  • Thursday

    Scalable Web Architectures and Application State

    In this article we follow a hypothetical programmer, Damian, on his quest to make his web application scalable.

    Read the full article on Bytepawn


    SPHiveDB: A mixture of the Key/Value Store and the Relational Database.

    The Key/Value Store becames more and more popular. When we use the Key/Value Store to store objects, we need to serialize/deserialize the objects as binary buffer. We have many ways to serialize/deserialize objects. A possible way is to use the Relational Database. Every value we store in the Key/Value Store is a SQLite instance. We can use the power of the Relational Database to manipulate the value. The SQL is very powerful for processing query request.

    SPHiveDB = TokyoCabinet + SQLite

    SPHiveDB is a server for sqlite database. It use JSON-RPC over HTTP to expose a network interface to use SQLite database. It supports combining multiple SQLite databases into one file ( through tokyo cabinet ). It also supports the use of multiple files.


    No to SQL? Anti-database movement gains steam – My Take

    In this post i wrote my view on the anti SQL database movement and where the alternative approach fits in:

    - SQL databases are not going away anytime soon.
    - The current "one size fit it all" databases thinking was and is wrong.
    - There is definitely a place for a more a more specialized data management solutions alongside traditional SQL databases.

    In addition to the options that was mentioned on the original article i pointed out the the in-memory alternative approach and how that fits into the puzzle. I used a real life scenario: scalable Social network based eCommerce site where i outlined how in-memory approach was the only option they could scale and meet their application performance and response time requirements.


    Servers Component - How to choice and build perfect server

    There are a lot of questions about how the server components, and how to build perfect server with consider the power consumption. Today I will discuss the Server components, and how we can choice better server components with consider the power consumption, efficacy, performance, and price.

    Key points:

    • What kind of components the servers needs?
    • The Green Computing and the Servers components
    • How much power the server consume
    • Choice the right components:
      • Processor
      • Hard Disk Drive
      • Memory
      • Operating system
    • Build Server, or buy?

    Art of Parallelism presentation

    This presentation about parallel computing, and it’s discover the following topic:

    • What is parallelism?

    • Why now?

    • How it’s works?

    • What is the current options

    • Parallel Runtime Library. (for more information go there)

    Note: All of my presentation is open source, so feel free to copy it, use it, and re-distribute it.


    It Must be Crap on Relational Dabases Week 

    It's hard to be a relational database lately. After years of faithful service everywhere you look the world is turning against you:

  • Recently at the NoSQL conference 150 revolutionaries met with their new anti-RDBMS arms suppliers. And you know what happens when revolutionaries are motivated, educated, funded, and well armed.
  • The revolution has gone mainstream when Computerworld writes No to SQL? Anti-database movement gains steam. It's not just whispers anymore, it's everywhere.
  • And perennial revolutionary Michael Stonebraker runs from blog to blog shouting the The End of a DBMS Era (Might be Upon Us). Relational vendors are selling legacy software, are 50x slower than other alternatives, and that can not stand.
  • The Greek Chorus on Hacker News sings of anger and lies.

    Certainly some say stick with the past. It's your fault, you aren't doing it right, give us another chance and all will be as it ever was. Some smirk saying this is nothing but a return to a more ancient time when IBM was King.

    But it's in the air. It's in the code. A revolution is coming. To what? That is what is not yet clear.

    See also:
  • NoSQL? by Curt Monash.
  • CouchDB says BigTable clones are too complex.
  • Yahoo! Developer Network Blog. Very nice summary of the different talks.
  • Thursday

    Product: Hbase

    Update 3: Presentation from the NoSQL Conference: slides, video.
    Update 2: Jim Wilson helps with the Understanding HBase and BigTable by explaining them from a "conceptual standpoint."
    Update: InfoQ interview: HBase Leads Discuss Hadoop, BigTable and Distributed Databases. "MapReduce (both Google's and Hadoop's) is ideal for processing huge amounts of data with sizes that would not fit in a traditional database. Neither is appropriate for transaction/single request processing."

    Hbase is the open source answer to BigTable, Google's highly scalable distributed database. It is built on top of Hadoop (product), which implements functionality similar to Google's GFS and Map/Reduce systems. 

    Both Google's GFS and Hadoop's HDFS provide a mechanism to reliably store large amounts of data. However, there is not really a mechanism for organizing the data and accessing only the parts that are of interest to a particular application.

    Bigtable (and Hbase) provide a means for organizing and efficiently accessing these large data sets.

    Hbase is still not ready for production, but it's a glimpse into the power that will soon be available to your average website builder.

    Google is of course still way ahead of the game. They have huge core competencies in data center roll out and they will continually improve their stack.

    It will be interesting to see how these sorts of tools along with Software as a Service can be leveraged to create the next generation of systems.