Entries in Data Center (5)


Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase most of us are used to hearing, and many still believe it holds true for the speed, scale and cost of processing data. High-speed data processing requires more memory, and that raises cost, because memory is, on average, more expensive than commodity disk drives. The notion that a data system cannot reliably give you both ample memory and fast access, let alone at the right cost, has long been debated, and it was cemented by computer scientist Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems



Who Has the Most Web Servers?

An interesting post on DataCenterKnowledge!

  • 1&1 Internet: 55,000 servers
  • Rackspace: 50,038 servers
  • The Planet: 48,500 servers
  • Akamai Technologies: 48,000 servers
  • OVH: 40,000 servers
  • SBC Communications: 29,193 servers
  • Verizon: 25,788 servers
  • Time Warner Cable: 24,817 servers
  • SoftLayer: 21,000 servers
  • AT&T: 20,268 servers
  • iWeb: 10,000 servers
  • How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and, of course, here on this site!



RAD Lab is Creating a Datacenter Operating System

The RAD Lab (Reliable Adaptive Distributed Systems Laboratory) wants to leapfrog the Big Switch and create The Next Big Switch, skipping the cloud/utility evolutionary stage altogether. This hyper-evolutionary niche buster develops technology so advanced that the cloud disperses and you can go back to building your own personal datacenters. Where Google took years to create its datacenters, with a prefab Datacenter Operating System you might create your own over a long holiday weekend. Not St. Patrick's, of course. Their vision: Enable one person to invent and run the next revolutionary IT service, operationally expressing a new business idea as a multi-million-user service over the course of a long weekend. By doing so we hope to enable an Internet "Fortune 1 million". How? By wizardry in the form of a “datacenter operating system” created from a pinch of "statistical machine learning (SML)" and a tincture of "recent insights from networking and distributed systems." But like most magic, it's not so outlandish once you understand it:

  • Virtual machines provide the OS mechanism.
  • SML enforces the overarching policy.
  • Tools collect sensor data from all the hardware and software components.
  • Actuators shutdown, reboot, or migrate services inside the datacenter.
  • Workload generators and application simulators record the behavior of proprietary systems and then recreate it in a research environment.
  • Ruby (with Ruby on Rails) is the likely programming language.
  • Chubby and MapReduce are the libraries.
  • Storage is via services like BigTable, Google File System, and Amazon’s Simple Storage Service.
  • Crash-only software design.
  • CAP (consistency, availability, partition-tolerance) based design strategies.
  • Improve the efficiency of power delivery and usage.

The only new part would be the SML. All the rest is fairly standard by now, even if it isn't yet available in a nice gift box at a discount store. I am highly skeptical when people draw a big circle around the really tricky, complex bits and say we'll solve all that with "statistical machine learning," but the idea is intriguing.

The dramatic rise of cloud/utility computing makes the personal datacenter idea less appealing than it otherwise would have been. When datacenters were built from scratch by hardy settlers with nothing but flint knives and bear skins, a Datacenter OS would have been very exciting. But now, isn't leveraging multiple clouds a better strategy? After all, the DC OS really just packages best practices. It won't innovate for you, so you gain neither a competitive advantage nor a lower cost structure. And if that's the case, wouldn't I rather have someone else do all of the work? Then again, I have high hopes of owning a personal power plant in the near future. Maybe one of the things it will power is my own personal datacenter!
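The sensor, policy, and actuator bullets in the list above describe a closed control loop: collect readings, decide, then shut down, reboot, or migrate. The sketch below is a toy version of that loop under stated assumptions, not RAD Lab's design. A fixed threshold rule stands in for statistical machine learning, and the `Reading` fields, host names, and the 0.9 CPU cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sensor reading for one service instance; the field names are
# illustrative assumptions, not part of any RAD Lab API.
@dataclass
class Reading:
    service: str
    host: str
    cpu: float          # fraction of CPU in use, 0.0-1.0
    responding: bool    # did the health check succeed?

# A fixed threshold rule stands in for the SML policy; 0.9 is an arbitrary,
# hypothetical cutoff.
CPU_LIMIT = 0.9

def decide(reading: Reading) -> str:
    """Map one sensor reading to an actuator command."""
    if not reading.responding:
        return "reboot"     # crash-only style: recover by restarting
    if reading.cpu > CPU_LIMIT:
        return "migrate"    # move the service to a less loaded host
    return "none"

def control_loop(readings: list[Reading]) -> list[tuple[str, str, str]]:
    """One pass of sense -> decide -> act; returns the commands issued."""
    commands = []
    for r in readings:
        action = decide(r)
        if action != "none":
            commands.append((action, r.service, r.host))
    return commands

if __name__ == "__main__":
    sensed = [
        Reading("search", "rack1-a", 0.95, True),
        Reading("login", "rack1-b", 0.30, False),
        Reading("feed", "rack2-a", 0.40, True),
    ]
    for cmd in control_loop(sensed):
        print(cmd)
    # ('migrate', 'search', 'rack1-a')
    # ('reboot', 'login', 'rack1-b')
```

In a real system the `decide` step is exactly the part the SML is supposed to replace; the loop around it is the "fairly standard" plumbing.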

    Related Articles

  • Home Page for RAD Lab - Reliable Adaptive Distributed Systems Laboratory
  • RADLab Technical Vision (2005)
  • CS 294-23, Software as a Service (Patterson/Fox/Sobel)
  • Internet-scale Computing: The Berkeley RADLab Perspective
  • CS 294-14: Architecture of Internet Datacenters. This is a course at Berkeley, and many of the classes have lecture notes. Very cool. PS: Is it "datacenter" or "data center"? Both are used and it drives me crazy.


  • Wednesday

    How many machines do you need to run your site?

    Amazingly, TechCrunch runs its website on one web server and one database server, according to the fascinating survey What the Web’s most popular sites are running on by Pingdom, a provider of uptime and response time monitoring. Earlier we learned that PlentyOfFish catches and releases many millions of hits a day on just one web server and three database servers. Google runs a Dalek army full of servers. YouSendIt, a company that makes it easy to send and receive large files, has 24 web servers, 3 database servers, 170 storage servers, and a few miscellaneous servers. Vimeo, a video sharing company, has 100 servers for streaming video, 4 web servers, and 2 database servers. Meebo, an AJAX-based instant messaging company, uses 40 servers to handle messaging, over 40 web servers, and 10 servers for forums, Jabber, testing, and so on. FeedBurner, a news feed management company, has 70 web servers, 15 database servers, and 10 miscellaneous servers. Now multiply FeedBurner's server count by two, because they maintain two geographically separate sites in an active-passive configuration for high availability. How many servers will you need, and how can you trick yourself into using fewer?
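Tallying the quoted counts side by side makes the comparison easier, including FeedBurner's doubling for its second site. A minimal Python sketch using only the numbers given above; YouSendIt's "few miscellaneous" servers and Meebo's "over 40" web servers are left out because no exact figure is given.

```python
# Server counts quoted from the Pingdom survey summary. YouSendIt's
# "a few miscellaneous" servers are omitted because no number is given.
sites = {
    "YouSendIt": {"web": 24, "database": 3, "storage": 170},
    "Vimeo": {"streaming": 100, "web": 4, "database": 2},
    "FeedBurner": {"web": 70, "database": 15, "misc": 10},
}

# FeedBurner runs two geographically separate sites (active-passive),
# so every role is duplicated.
site_copies = {"FeedBurner": 2}

def total_servers(name: str) -> int:
    """Sum servers across roles, multiplied by the number of site copies."""
    per_site = sum(sites[name].values())
    return per_site * site_copies.get(name, 1)

for name in sites:
    print(f"{name}: {total_servers(name)} servers")
# YouSendIt: 197 servers
# Vimeo: 106 servers
# FeedBurner: 190 servers
```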

    Find Someone Like You and Base Your Resource Estimates Off Them

    We see quite a disparity in the number of servers popular web sites need, ranging from just a few to many hundreds. Where do you fit? The easiest way to figure out how many servers you'll need is to find a company similar to yours and look at how many they use. You won't need that many right away, but as you grow it's something to think about. Can your data center handle your growth? Does it have enough affordable bandwidth and rack space? How will you install and manage all the machines? Who will do the work? A million other questions like these are easier to answer if you have some idea of where you are going.
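Basing your estimate on a similar company boils down to scaling their server count by your relative load. A minimal sketch, assuming servers grow linearly with daily hits (often not true) and using a hypothetical `headroom` factor for spare capacity; the reference figures in the example are made up for illustration.

```python
import math

def estimate_servers(ref_servers: int, ref_daily_hits: float,
                     my_daily_hits: float, headroom: float = 1.0) -> int:
    """Scale a comparable site's server count by relative traffic.

    Assumes servers grow linearly with hits. `headroom` is a hypothetical
    safety factor (1.5 means 50% spare capacity).
    """
    if ref_daily_hits <= 0:
        raise ValueError("reference traffic must be positive")
    raw = ref_servers * (my_daily_hits / ref_daily_hits) * headroom
    return max(1, math.ceil(raw))  # you always need at least one box

# Hypothetical reference: 20 servers at 10M hits/day; you expect 2.5M/day.
print(estimate_servers(20, 10_000_000, 2_500_000, headroom=1.5))  # -> 8
```

The linear assumption is the weakest link: if your site's load depends on interactions between users, the comparable company's numbers will understate what you need as you grow.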

    Get Someone Else to Do it

    Clearly, content sites end up needing a lot of servers. Videos, music, pictures, blogs, and attachments all eat up space, and since that's your business you have no alternative but to find a way to store all that data. This is unstructured data that can be stored outside the database in a SAN or NAS. Or, rather than building your own storage infrastructure, you can follow the golden rule of laziness: get someone else to do it. That's what SmugMug, an image sharing company, did. They use S3 to store many hundreds of terabytes of data. This lowers the cost of creating a large, highly available storage infrastructure so much that it creates a whole new level of competition for content-rich sites. At one time, expertise in creating massive storage farms would have been enough to keep the competition away, but no more. These sorts of abilities are becoming commoditized, affordable, and open.

    PlentyOfFish and YouTube use CDNs to reduce the amount of infrastructure they need to build for themselves. If you need to stream video, why not let a CDN do it instead of building out your own expensive infrastructure? You can take a "let other people do it" approach for services like email, DNS, backup, forums, and blogs too. These can all be outsourced now. Does it make sense to put these services in your data center if you don't need to? If you have compute-intensive tasks, you can use Amazon services without performing your own build-out.

    An approach I am really excited to investigate in the future is a new breed of grid-based virtual private data centers like 3tera and mediatemple. Their claim to fame is that you can componentize your infrastructure so that it scales automatically and transparently on their grid as demand fluctuates. I don't have any experience with this approach yet, but it's interesting and probably where the world is heading.

    If your web site is a relatively simple blog with mostly static content, you can get away with far fewer servers. Even a popular site like Digg has only 30GB of data to store.

    How do your resources scale with the number of users?

    You also have to ask whether your resources scale linearly, much faster than linearly, or hardly at all with the number of users. A blog site may not scale much with the number of users. Some sites scale linearly as users are added. Other sites that rely on social interaction, like Google Talk, may scale far faster than linearly, because each new user can interact with many existing users. Getting a feel for the type of site you have can help more realistic numbers pop up on your magic server eight-ball.
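To see why the growth pattern matters so much, here is a toy model of the three cases described above, measured in hypothetical "work units". For the social case it assumes work tracks the number of possible user pairs, n(n-1)/2, which grows quadratically; that is an illustrative assumption, not a measured property of any real service.

```python
from math import comb

def relative_load(users: int, model: str) -> int:
    """Hypothetical work units for a site with `users` users.

    flat     - load barely changes with users (e.g. a mostly static blog)
    linear   - each user adds a fixed amount of work
    pairwise - work tracks possible user-to-user pairs, n*(n-1)/2
    """
    if model == "flat":
        return 1
    if model == "linear":
        return users
    if model == "pairwise":
        return comb(users, 2)
    raise ValueError(f"unknown model: {model}")

for n in (10, 100, 1000):
    print(n, [relative_load(n, m) for m in ("flat", "linear", "pairwise")])
# 10 [1, 10, 45]
# 100 [1, 100, 4950]
# 1000 [1, 1000, 499500]
```

Going from 100 to 1,000 users multiplies the linear load by 10 but the pairwise load by roughly 100, which is why socially driven sites can outgrow a server budget so quickly.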

    What's your caching strategy?

    A lot of sites use Memcached and Squid for caching. You can fill up a few racks with caching servers. How many servers will you need for caching? Or can you get away with just beefing up the database server cache?
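One rough way to answer the caching question is to divide the size of your hot data by the RAM each cache box can actually devote to caching. A minimal sketch under stated assumptions: `working_set_gb` is your own estimate of frequently read data, and `usable_fraction` is a hypothetical share of RAM left for the cache after the OS and overhead. It ignores replication and per-item overhead.

```python
import math

def cache_servers(working_set_gb: float, ram_per_server_gb: float,
                  usable_fraction: float = 0.8) -> int:
    """Rough count of cache servers needed to hold the hot data in RAM.

    usable_fraction is a hypothetical share of each box's RAM given to the
    cache; the remainder is assumed to go to the OS and overhead.
    """
    usable = ram_per_server_gb * usable_fraction
    if usable <= 0:
        raise ValueError("no usable memory per server")
    return math.ceil(working_set_gb / usable)

# Hypothetical: 100 GB of hot data on 16 GB boxes.
print(cache_servers(100, 16))  # -> 8
```

If the answer comes out at one, the working set probably fits in a beefed-up database server cache and a separate caching tier may not be worth it yet.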

    Do you need servers for application specific tasks?

    Servers aren't just for storage, databases, and web serving. You may have a bit of computation going on. YouTube offloads tag calculations to a server farm. Google Talk has to have servers for handling presence calculations. PlentyOfFish has servers to handle geographical searches because they are so resource intensive. GigaVox needs servers to transcode podcasts into different formats and include fresh commercial content. If you are a calendar service, you may need servers to calculate more complicated schedule availability schemes and to sync address books. So depending on your site, you may have to budget for many application-related servers like these. The Pingdom folks also created a sweet table of the technologies used by the companies profiled on this site. You can find it at What nine of the world’s largest websites are running on. I'm very jealous of their masterful, colorful graphics-fu. Someday I hope to rise to that level of presentation skill.
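For compute jobs like podcast transcoding, a first budget can come from CPU-seconds: total work per hour divided by what one server can deliver per hour. A minimal sketch, assuming jobs are CPU-bound, arrive at a steady rate, and parallelize across cores; the job counts, per-job cost, and the 70% `target_utilization` ceiling are hypothetical inputs, not GigaVox figures.

```python
import math

def compute_servers(jobs_per_hour: float, cpu_seconds_per_job: float,
                    cores_per_server: int,
                    target_utilization: float = 0.7) -> int:
    """Servers needed to keep up with a steady stream of CPU-bound jobs.

    target_utilization is a hypothetical ceiling so machines aren't run
    flat out and can absorb bursts.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    cpu_seconds_needed = jobs_per_hour * cpu_seconds_per_job  # per hour
    cpu_seconds_per_server = cores_per_server * 3600 * target_utilization
    return math.ceil(cpu_seconds_needed / cpu_seconds_per_server)

# Hypothetical: 500 transcodes/hour at 120 CPU-seconds each on 8-core boxes.
print(compute_servers(500, 120, 8))  # -> 3
```

Bursty workloads need a peak-hour rate rather than the daily average, which is one reason renting compute on demand is attractive for this kind of task.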



    What does the next generation data center look like?

    That's what people at the NGDC Conference get together to talk about. There are a lot of interesting subjects: data center virtualization; HPC & grid; advanced facilities management and planning; advanced network and services; applications; data center optimization and security; and managing and protecting information. The Grid – Distributed Computing at Scale presentation is an interesting one.
