advertise
Monday
Feb132012

Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter

With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do.

Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.

One of the common patterns across successful startups is the perilous chasm crossing from startup to wildly successful startup. Finding people, evolving infrastructures, servicing old infrastructures, while handling huge month over month increases in traffic, all with only four engineers, means you have to make difficult choices about what to work on. This was Tumblr’s situation. Now with twenty engineers there’s enough energy to work on issues and develop some very interesting solutions.

Tumblr started as a fairly typical large LAMP application. The direction they are moving in now is towards a distributed services model built around Scala, HBase, Redis, Kafka, Finagle,  and an intriguing cell based architecture for powering their Dashboard. Effort is now going into fixing short term problems in their PHP application, pulling things out, and doing it right using services.

The theme at Tumblr is transition at massive scale. Transition from a LAMP stack to a somewhat bleeding edge stack. Transition from a small startup team to a fully armed and ready development team churning out new features and infrastructure. To help us understand how Tumblr is living this theme is startup veteran Blake Matheny, Distributed Systems Engineer at Tumblr. Here’s what Blake has to say about the House of Tumblr:

Click to read more ...

Friday
Feb102012

Stuff The Internet Says On Scalability For February 10, 2012

HighScalability Tested, Mother Approved:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Tuesday
Feb072012

Hypertable Routs HBase in Performance Test -- HBase Overwhelmed by Garbage Collection

This is a guest post by Doug Judd, original creator of Hypertable and the CEO of Hypertable, Inc.

Hypertable delivers 2X better throughput in most tests -- HBase fails 41 and 167 billion record insert tests, overwhelmed by garbage collection -- Both systems deliver similar results for random read uniform test

We recently conducted a test comparing the performance of Hypertable (@hypertable) version 0.9.5.5 to that of HBase (@HBase) version 0.90.4 (CDH3u2) running Zookeeper 3.3.4.  In this post, we summarize the results and offer explanations for the discrepancies. For the full test report, see Hypertable vs. HBase II.

Introduction

Hypertable and HBase are both open source, scalable databases modeled after Google's proprietary Bigtable database.  The primary difference between the two systems is that Hypertable is written in C++, while HBase is written in Java.  We modeled this test after the one described in section 7 of the Bigtable paper and tuned both systems for maximum performance.  The test was run on a total of sixteen machines connected together with gigabit Ethernet.  The machines had the following configuration:

Click to read more ...

Monday
Feb062012

The Design of 99designs - A Clean Tens of Millions Pageviews Architecture

99designs is a crowdsourced design contest marketplace based out of Melbourne Australia. The idea is that if you have a design you need created you create a contest and designers compete to give you the best design within your budget.

If you are a medium sized commerce site this is a clean example architecture of a site that reliably supports a lot of users and a complex workflow on the cloud. Lars Yencken wrote a nicely written overview of the architecture behind 99designs in Infrastructure at 99designs. Here's a gloss on their architecture:

Stats

Click to read more ...

Friday
Feb032012

Stuff The Internet Says On Scalability For February 3, 2012

I'm only here for the HighScalability:

  • 762 billion: objects stored on S3; $1B/Quarter: Google spend on servers; 100 Petabytes: Storage for Facebook's photos and videos.
  • Quotable Quotes:
    • @knorth2: #IPO filing says #Facebook is "dependent on our ability to maintain and scale our technical infrastructure"
    • @debuggist: Scalability trumps politics.
    • @cagedether: Hype of #Hadoop is driving pressure on people to keep everything
    • @nanreh: My MongoDB t shirt has never helped me get laid. This is typical with #nosql databases.
    • @lusis: I kenna do it, Capt'n. IO is pegged, disk is saturated…I lost 3 good young men when the cache blew up!
    • Kenton Varda: Jeff Dean puts his pants on one leg at a time, but if he had more than two legs, you'd see that his approach is actually O(log n)
  • One upon a time manufacturing located near rivers for power. Likewise software will be located next to storage, CPU, and analytics resources in a small cartel of clouds. That's the contention of Here Come the Cloud Cartels. This tributary system (pun intended) will be Amazon, Cisco Systems, Google, I.B.M., Microsoft, Oracle and a few competitors. Supposedly the benefit will be cheap computing, but when has a cartel ever lead to cheap anything?
Don't miss all that the Internet has to say on Scalability, click below or stay ignorant...

Click to read more ...

Thursday
Feb022012

The Data-Scope Project - 6PB storage, 500GBytes/sec sequential IO, 20M IOPS, 130TFlops

Data is everywhere, never be at a single location. Not scalable, not maintainable.–Alex Szalay

While Galileo played life and death doctrinal games over the mysteries revealed by the telescope, another revolution went unnoticed, the microscope gave up mystery after mystery and nobody yet understood how subversive would be what it revealed. For the first time these new tools of perceptual augmentation allowed humans to peek behind the veil of appearance. A new new eye driving human invention and discovery for hundreds of years.

Data is another material that hides, revealing itself only when we look at different scales and investigate its underlying patterns. If the universe is truly made of information, then we are looking into truly primal stuff. A new eye is needed for Data and an ambitious project called Data-scope aims to be the lens.

A detailed paper on the Data-Scope tells more about what it is:

The Data-Scope is a new scientific instrument, capable of ‘observing’ immense volumes of data from various scientific domains such as astronomy, fluid mechanics, and bioinformatics. The system will have over 6PB of storage, about 500GBytes per sec aggregate sequential IO, about 20M IOPS, and about 130TFlops. The Data-Scope is not a traditional multi-user computing cluster, but a new kind of instrument, that enables people to do science with datasets ranging between 100TB and 1000TB.  There  is a vacuum today in data-intensive scientific computations, similar to the one that lead to the development of the BeoWulf cluster: an inexpensive yet efficient template for data intensive computing in academic environments based on commodity components. The proposed Data-Scope aims to fill this gap.

A very accessible interview by Nicole Hemsoth with Dr. Alexander Szalay, Data-Scope team lead, is available at The New Era of Computing: An Interview with "Dr. Data". Roberto Zicari also has a good interview with Dr. Szalay in Objects in Space vs. Friends in Facebook.

The paper is filled with lots of very specific recommendations on their hardware choices and architecture, so please read the paper for the deeper details. Many BigData operations have the same IO/scale/storage/processing issues Data-Scope is solving, so it’s well worth a look. Here are some of the highlights:

Click to read more ...

Tuesday
Jan312012

Performance in the Cloud: Business Jitter is Bad

 

biz jitter

One of the benefits of web applications is that they are generally transported via TCP, which is a connection-oriented protocol designed to assure delivery. TCP has a variety of native mechanisms through which delivery issues can be addressed – from window sizes to selective acks to idle time specification to ramp up parameters. All these technical knobs and buttons serve as a way for operators and administrators to tweak the protocol, often at run time, to ensure the exchange of requests and responses upon which web applications rely. This is unlike UDP, which is more of a “fire and forget” protocol in which the server doesn’t really care if you receive the data or not.

Click to read more ...

Tuesday
Jan312012

Sponsored Post: aiCache, Next Big Sound, ElasticHosts, Red 5 Studios, Attribution Modeling, Logic Monitor, New Relic, AppDynamics, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • Anybody interested in helping manage a 100+ Linux server deployment? Next Big Sound is a an analytics company for the music industry and is looking someone to help them scale.
  • Red 5 Studios. Wanted: DBAs and Programmers interested in MySQL scalability and replication. If interested, please see us here

Fun and Informative Events

  • Sign up for this free 30-minute webinar exploring how new technology can determine which ads have been seen by users and will discuss the C3 Metrics Labs analysis of over 2 billion impressions. 

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed scale and stability of your web-site.
  • ElasticHosts award winning cloud server hosting launches across North America. Adding data centers in Los Angeles and Toronto. Free trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • New Relic - real user monitoring optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility style high performance cloud servers in the US and Europe delivered on all 10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
Jan302012

37signals Still Happily Scaling on Moore RAM and SSDs

There are so many architectural ideas swirling in the bit wind these days. Two of the biggest battles are cloud vs. bare metal and RAM vs. disk vs. SSD. 37signals has published two solid articles that are counter hype cycle in their message:

Technologists who grew up when RAM cost $1,000 per megabyte can have a hard time dealing with the luxury of RAM being virtually free.

The progress of technology is throwing an ever greater number of optimizations into the “premature evil” bucket never to be seen again.

37signals made quite a stir with their money shot of the 864GB of RAM they bought for a mere $12K as part of their caching layer for Basecamp. That's a lot of memory for not a lot of money. There's nothing like actually seeing it in the flesh to bring the point home. Does that make Memory Based Architectures a little more appealing?

37signals then followed up with another provocative article: Three years later, Mr. Moore is still letting us punt on database sharding. The gist is scaling up is working for them. RAM is getting cheaper and FusionIO is getting faster, so they've been able to avoid architecture complexifications like sharding. Does that make SSD based architectures a little more appealing?

StackExchange is in much the same position, with a different stack, but with sympatico core ideas and comparable results. The learning: In your transaction oriented features, if you aren't Googleish in your requirements, then scale-up using bare metal, RAM, and SSD may be the way to go. The tug you feel towards the cloud and horizontal scaling may just be a strong consensus wind a blowin'.

Some of the key takeways are: 

Click to read more ...

Friday
Jan272012

Stuff The Internet Says On Scalability For January 27, 2012

If you’ve got the time, we’ve got the HighScalability:

  • 9nm : IBM's carbon nanotube transistor that outperforms silicon; YouTube: 4 Billion Views/Day; 864GB RAM: 37signals Memcache, $12K
  • Quotable Quotes:
    • Chad Dickerson: You can only get growth by feeding opportunities.
    • @launchany: It amazes me how many NoSQL database vendors spend more time detailing their scalability and no time detailing the data model and design
    • Google: Let's make TCP faster.
    • WhatsApp: we are now able to easily push our systems to over 2 million tcp connections!
    • Sidney Dekker: In a complex system…doing the same thing twice will not predictably or necessarily lead to the same results.
    • @Rasmusfjord: Just heard about an Umbraco site running on Azure that handles 20.000 requests /*second*
  • Herb Sutter with an epic post, Welcome to the Jungle, touching on a lot of themes we've explored on HighScalability, only in a dramatically more competent way. What's after the current era of multi-core CPUs has played out? Mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done. Different parts of even the same application naturally want to run on different kinds of cores. Applications will need to be at least massively parallel, and ideally able to use non-local cores and heterogeneous cores. Programming languages and systems will increasingly be forced to deal with heterogeneous distributed parallelism. Perhaps our most difficult mental adjustment, however, will be to learn to think of the cloud as part of the mainstream machine – to view all these local and non-local cores as being equally part of the target machine that executes our application, where the network is just another bus that connects us to more cores. If you haven’t done so already, now is the time to take a hard look at the design of your applications, determine what existing features – or, better still, what potential and currently-unimaginable demanding new features – are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from local and distributed parallelism. Now is also the time for you and your team to grok the requirements, pitfalls, styles, and idioms of hetero-parallel (e.g., GPGPU) and cloud programming.
There's so much more the Internet has to say on Scalability. Click below to be in on all the secrets...

Click to read more ...