Paper: Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors

One stumbling block of the the great march towards virtualization is the relatively poor performance of resource hungry applications like databases. We are told to develop and test using VMs, but deploy without them. Which kind of sucks IMHO. Maybe better virtualization technology can remove this split. This paper talks about a different approach to virtualization called "container-based" virtualization that can reportedly double the performance of traditional hypervisor systems like Xen. It does this by trading isolation for efficiency. Rather than maintaining complete isolation between VMs the container approach shares resources between VMs and thus gives higher performance while still guaranteeing strong fault, resource, and security isolation. It's yet another battle in computing's endless war of creating and destroying abstraction layers. I learned a lot from from this paper because of how it compared and contrasted traditional hypervisor and container based virtualization strategies. Good job.

Click to read more ...


ID generator

Hi, I would like feed back on a ID generator I just made. What positive and negative effects do you see with this. It's programmed in Java, but could just as easily be programmed in any other typical language. It's thread safe and does not use any synchronization. When testing it on my laptop, I was able to generate 10 million IDs within about 15 seconds, so it should be more than fast enough. Take a look at the attachment.. (had to rename it from to IdGen.txt to attach it)

Click to read more ...


scaling drupal - an open-source infrastructure for high-traffic drupal sites

the authors of drupal have paid considerable attention to performance and scalability. consequently even a default install running on modest hardware can easily handle the demands a small website. if you are lucky, eventually the time comes when you need to service more users than your system can handle. at some point, you'll start looking at your hardware and network deployment.

read more.

Click to read more ...


What CDN would you recommend?

Hi all, a I run a site that after a complete redesign have gotten a lot more traffic. The site provides free flash games, so the biggest traffic share goes to serving flash files (from about 100K and up to several megabytes in size each.) I currently host the entire site on a hosting provider that have no traffic limits. But since they are very cheap (yet have served me very well all the time with at least 99,9% uptime), I don't trust them in allowing me to continue consuming more and more bandwidth. I just guess I'm going to reach some internal limit they have on day, so I'm looking into moving all the flash content over to a content delivery network of some sort. Some recent traffic stats: August: 12 GB September: 22 GB October: 55 GB November: Currently 2,3 GB pr day on average, but it's rising.. I've been looking into Amazon S3, but have not decided on anything yet. So therefor I'm asking if there are any other provides I should consider, that operates within the same price range as Amazon does (or lower)? Best regards, Christian Felde

Click to read more ...


Product: ChironFS

If you are trying to create highly available file systems, especially across data centers, then ChironFS is one potential solution. It's relatively new, so there aren't lots of experience reports, but it looks worth considering. What is ChironFS and how does it work? Adapted from the ChironFS website: The Chiron Filesystem is a Fuse based filesystem that frees you from single points of failure. It's main purpose is to guarantee filesystem availability using replication. But it isn't a RAID implementation. RAID replicates DEVICES not FILESYSTEMS. Why not just use RAID over some network block device? Because it is a block device and if one server mounts that device in RW mode, no other server will be able to mount it in RW mode. Any real network may have many servers and offer a variety of services. Keeping everything running can become a real nightmare!

Click to read more ...


Quick question about efficiently implementing Facebook 'news feed' like functionality

Im sure most are familiar with Facebooks 'news feed'. If not, the 'news feed' basically lists recent activity of all of your friends. I dont see how you can get this information efficiently from a DB: * Im assuming all user activity is inserted in a "actions" table. * first get a list of all your friends * then query the actions table to return recent activity where the activity belongs to someone on your friends list This can't be efficient especially considering some people have 200+ friends. So what am I missing? How do you think Facebook is implementing their "news feed". Im not asking for any specific details, just a general point in the right direction, as I cant see how they are implementing the 'news feed efficiently. Thanks.

Click to read more ...


Strategy: Diagonal Scaling - Don't Forget to Scale Out AND Up

All the cool kids advocate scaling out as the secret sauce of scaling. And it is, but don't forget to serve some tasty "scaling up" as a side dish. Scaling up doesn't have to mean buying a jet propelled, liquid cooled, 128 core monster super computer. Scaling up can just mean buying at the high end of the commodity buffet by buying more cores, more memory and using a shared nothing architecture to take advantage of all that power without adding complexity. Scale out when you need to, but big beefy boxes can absorb a lot of load before it's necessary to hit up your data center for more rack space. Here are a few examples of scaling out and up:

  • John Allspaw, Flickr's operations manager, coined the term diagonal scaling for this strategy. In Making a site faster by removing machines (and a comment on this post) John told how Flickr replaced 67 dual-cpu boxes with 18 dual quad-core machines and recovered almost 4x rack space and reduced costs by about 50 percent.
  • Fotolog's strategy is to scale up and out. By adding more cache, more RAM, more CPUs, and more efficient CPUs they were able to handle many millions more users with the same number of machines. This was a conscious choice on their part and it worked beautifully.
  • Wikimedia says scaling out doesn't require using cheap hardware. Wikipedia's database servers these days are 16GB dual or quad core boxes with 6 15,000 RPM SCSI drives in a RAID 0 setup.
  • Kevin Burton in his Distributed Computing Fallacy #9 post also says scaling out doesn't mean cheap:
    We’re seeing machines with eight cores and 32G of memory. If we were to buy eight disks for these boxes it’s really like buying 8 machines with 4G each and one disk. This partially goes into the horizontal vs vertical scale discussion. Is it better to buy one $10k box or 10 $1k boxes? I think it’s neither. Buy 4 $2.5k boxes. The new multicore stuff is super cheap.
  • Jeremy Cole in Scaling out AND up, a compromise asks for compromise:
    Scaling out doesn’t mean using crappy hardware. I think people take the “scale out” model (that they’ve often only read about from outdated conference presentations) to quite an extreme. They think scaling out means using desktop-class, bad hardware, and just buying a ton of them. That model doesn’t work, and it’s hell to maintain in the long term. Use commodity hardware. You often hear the term “commodity hardware” in reference to scale out. While crappy hardware is also commodity, what this means is that instead of getting stuck on the low-end $40k machine, with thoughts of upgrading to the $250k machine, and maybe later the $1M machine, you use data partitioning and any number of let’s say $5k machines. That doesn’t mean a $1k single-disk crappy machine as said above. What does it mean for the machine to be “commodity”? It means that the components are standardized, common, and the price is set by the market, not by a single corporation. Use commodity machines configured with a good balance of price vs. performance.

    Click to read more ...

  • Friday

    How Tracks 300 Servers Handling 10 Million Pageviews hosts 300 servers in 5 different data centers. It's always useful to learn how large installations manage all their unruly children: Currently we Nagios for server health monitoring, Munin for graphing various server metrics, and a wiki to keep track of all the server hardware specs, IPs, vendor IDs, etc. All of these tools have suited us well up until now, but there have been some scaling issues. The post covers how these different tools are working for them and the comment section has some interesting discussions too.

    Click to read more ...


    Paper: Dynamo: Amazon’s Highly Available Key-value Store

    Update 2: Read/WriteWeb has a good article talking about the scalability issues of relational databases and how Dynamo solves them: Amazon Dynamo: The Next Generation Of Virtual Distributed Storage. But since Dynamo is just another frustrating walled garden protected by barbed wire and guard dogs, its relevance is somewhat overstated. Update: Greg Linden has a take on the paper where he questions some of Amazon's design choices: emphasizing write availability over fast reads, a lack of indexing support, use of random distribution for load balancing, and punting on some scalability issues. Werner Vogels, Amazon's avuncular CTO, just announced a new paper on the internal database technology Amazon uses to handle tens of millions customers. I'll dive into more details later, but I thought you'd want to read it hot off the blog. The bad news is it won't be a service. They are keeping this tech not so secret, but very safe. Happily, it's another real-life example to learn from. As many top websites use a highly tuned key-value database at their core instead of a RDBMS, it's an important technology to understand. From the abstract you can get a feel for what the paper is about:

    Reliability at massive scale is one of the biggest challenges we face at, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
    My first impressions after reading the paper:
  • Wow. But crap, I'll never be able to build anything like that. This is really competition through better infrastructure. Take that Google :-)
  • Their purposeful embracing of probability and manged centers of uncertainty must be dizzying for those from a RDBMS background. In a RDBMS it's all right angles. You write something and it's assumed consistent, correct, and durable. Now, how do you do this at scale across multiple data centers under failure conditions? There's the rub. So Amazon says writes must go through and we will deal with the complexities that model generates. They version objects and merge them later. Who does that? I love it, because when delve into these problems you realize you need this type of functionality, but it's too complex, so you back away and continue trying to force a square peg in a round whole. To have no fear to go where your requirements leads you is real engineering.
  • Can you imagine finding a problem in that system? I'd love to be a fly in those debugging sessions. But infrastructure takes on self-consciousness of its own when dealing with complex problems, so you just have to deal with knowing you don't know anymore. A lot of this thinking is driven by the CAP conjecture which states it's impossible for a web service to simultaneously guarantee consistency, availability, and partition-tolerance. When you get over your initial "that can't be true" reaction and embrace it, you get something like Dynamo. I'd really love to hear what you guys think about Dynamo.

    Click to read more ...

  • Tuesday

    Feedblendr Architecture - Using EC2 to Scale

    A man had a dream. His dream was to blend a bunch of RSS/Atom/RDF feeds into a single feed. The man is Beau Lebens of Feedville and like most dreamers he was a little short on coin. So he took refuge in the home of a cheap hosting provider and Beau realized his dream, creating FEEDblendr. But FEEDblendr chewed up so much CPU creating blended feeds that the cheap hosting provider ordered Beau to find another home. Where was Beau to go? He eventually found a new home in the virtual machine room of Amazon's EC2. This is the story of how Beau was finally able to create his one feeds safe within the cradle of affordable CPU cycles. Site:

    The Platform

  • EC2 (Fedora Core 6 Lite distro)
  • S3
  • Apache
  • PHP
  • MySQL
  • DynDNS (for round robin DNS)

    The Stats

  • Beau is a developer with some sysadmin skills, not a web server admin, so a lot of learning was involved in creating FEEDblendr.
  • FEEDblendr uses 2 EC2 instances. The same Amazon Instance (AMI) is used for both instances.
  • Over 10,000 blends have been created, containing over 45,000 source feeds.
  • Approx 30 blends created per day. Processors on the 2 instances are actually pegged pretty high (load averages at ~ 10 - 20 most of the time).

    The Architecture

  • Round robin DNS is used to load balance between instances. -The DNS is updated by hand as an instance is validited to work correctly before the DNS is updated. -Instances seem to be more stable now than they were in the past, but you must still assume they can be lost at any time and no data will be persisted between reboots.
  • The database is still hosted on an external service because EC2 does not have a decent persistent storage system.
  • The AMI is kept as minimal as possible. It is a clean instance with some auto-deployment code to load the application off of S3. This means you don't have to create new instances for every software release.
  • The deployment process is: - Software is developed on a laptop and stored in subversion. - A makefile is used to get a revision, fix permissions etc, package and push to S3. - When the AMI launches it runs a script to grab the software package from S3. - The package is unpacked and a specific script inside is executed to continue the installation process. - Configuration files for Apache, PHP, etc are updated. - Server-specific permissions, symlinks etc are fixed up. - Apache is restarted and email is sent with the IP of that machine. Then the DNS is updated by hand with the new IP address.
  • Feeds are intelligently cached independely on each instance. This is to reduce the costly polling for feeds as much as possible. S3 was tried as a common feed cache for both instances, but it was too slow. Perhaps feeds could be written to each instance so they would be cached on each machine?

    Lesson Learned

  • A low budget startup can effectively bootstrap using EC2 and S3.
  • For the budget conscious the free ZoneEdit service might work just as well as the $50/year DynDNS service (which works fine).
  • Round robin load balancing is slow and unreliable. Even with a short TTL for the DNS some systems hold on to the IP addressed for a long time, so new machines are not load balanced to.
  • Many problems exist with RSS implementations that keep feeds from being effectively blended. A lot of CPU is spent reading and blending feeds unecessarily because there's no reliable cross implementation way to tell when a feed has really changed or not.
  • It's really a big mindset change to consider that your instances can go away at any time. You have to change your architecture and design to live with this fact. But once you internalize this model, most problems can be solved.
  • EC2's poor load balancing and persistence capabilities make development and deployment a lot harder than it should be.
  • Use the AMI's ability to be passed a parameter to select which configuration to load from S3. This allows you to test different configurations without moving/deleting the current active one.
  • Create an automated test system to validate an instance as it boots. Then automatically update the DNS if the tests pass. This makes it easy create new instances and takes the slow human out of the loop.
  • Always load software from S3. The last thing you want happening is your instance loading, and for some reason not being able to contact your SVN server, and thus failing to load properly. Putting it in S3 virtually eliminates the chances of this occurring, because it's on the same network.

    Related Articles

  • What is a 'River of News' style aggregator?
  • Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services

    Click to read more ...