Update 2: a commenter in Twitter Fails Macworld Keynote Test said this entry needs to be updated. LOL. My uneducated guess is it's not a language or architecture problem, but more a problem of not being able to add hardware fast enough into their data center. The predictability of this problem is debatable, but once you have it, it's hard to fix.
Update: Twitter releases Starling, a light-weight persistent queue server that speaks the memcache protocol. It was built to drive Twitter's backend, and is in production across Twitter's cluster.
Twitter started as a side project and blew up fast, going from 0 to millions of page views within a few terrifying months. Early design decisions that worked well in the small melted under the crush of new users chirping tweets to all their friends. Web darling Ruby on Rails was fingered early for the scaling problems, but Blaine Cook, Twitter's lead architect, held Ruby blameless:
For us, it’s really about scaling horizontally - to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.
If Ruby on Rails wasn't to blame, how did Twitter learn to scale ever higher and higher?
Update: added slides Small Talk on Getting Big. Scaling a Rails App & all that Jazz
Site: http://twitter.com
Information Sources
Scaling Twitter Video by Blaine Cook.
Scaling Twitter Slides
Good News blog post by Rick Denatale
Scaling Twitter blog post by Patrick Joyce.
Twitter API Traffic is 10x Twitter’s Site.
A Small Talk on Getting Big. Scaling a Rails App & all that Jazz - really cute dog pics
The Platform
Ruby on Rails
Erlang
MySQL
Mongrel - hybrid Ruby/C HTTP server designed to be small, fast, and secure
Munin
Nagios
Google Analytics
AWStats - real-time logfile analyzer to get advanced statistics
Memcached
The Stats
Over 350,000 users. The actual numbers are as always, very super super top secret.
600 requests per second.
Average 200-300 connections per second. Spiking to 800 connections per second.
MySQL handled 2,400 requests per second.
180 Rails instances. Uses Mongrel as the "web" server.
1 MySQL Server (one big 8 core box) and 1 slave. Slave is read only for statistics and reporting.
30+ processes for handling odd jobs.
8 Sun X4100s.
Process a request in 200 milliseconds in Rails.
Average time spent in the database is 50-100 milliseconds.
Over 16 GB of memcached.
The Architecture
Ran into very public scaling problems. The little bird of failure popped up a lot for a while.
Originally they had no monitoring, no graphs, no statistics, which makes it hard to pinpoint and solve problems. Added Munin and Nagios. There were difficulties using tools on Solaris. Had Google analytics but the pages weren't loading so it wasn't that helpful :-)
Use caching with memcached a lot.
- For example, if getting a count is slow, you can memoize the count into memcache in a millisecond.
- Getting your friends' status is complicated. There are security and other issues. So rather than doing a query, a friend's status is updated in cache instead. It never touches the database. This gives a predictable response time frame (upper bound 20 msecs).
- ActiveRecord objects are huge, which is why they aren't cached. Instead, they want to store critical attributes in a hash and lazy load the other attributes on access.
- 90% of requests are API requests. So don't do any page/fragment caching on the front-end. The pages are so time sensitive it doesn't do any good. But they cache API requests.
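The memoized count and the "critical attributes in a hash, lazy load the rest" ideas above can be sketched as follows. This is a minimal illustration, not Twitter's code: TinyCache is a hypothetical in-process stand-in for a memcached client, and slow_follower_count, follower_count, and CachedUser are invented names.

```ruby
# Hypothetical stand-in for a memcached client: a Hash with per-key expiry.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, or run the block, cache the result, and return it.
  def fetch(key, ttl: 60, now: Time.now)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > now

    value = yield
    @store[key] = { value: value, expires_at: now + ttl }
    value
  end
end

CACHE = TinyCache.new
DB_CALLS = Hash.new(0)

# Hypothetical slow query; the counter shows how often it really runs.
def slow_follower_count(user_id)
  DB_CALLS[user_id] += 1
  1234
end

# Memoize the expensive count in the cache.
def follower_count(user_id)
  CACHE.fetch("follower_count:#{user_id}") { slow_follower_count(user_id) }
end

# Keep a few critical attributes in a small hash and load the full record
# only when a non-critical attribute is requested.
class CachedUser
  def initialize(critical, &loader)
    @critical = critical
    @loader = loader
    @full = nil
  end

  def [](attr)
    return @critical[attr] if @critical.key?(attr)

    @full ||= @loader.call
    @full[attr]
  end
end

follower_count(42)
follower_count(42) # second call is served from the cache

full_loads = 0
user = CachedUser.new({ id: 42, screen_name: "example" }) do
  full_loads += 1
  { id: 42, screen_name: "example", bio: "hypothetical full record" }
end
user[:screen_name] # answered from the small hash, no load
user[:bio]         # triggers the one-time lazy load
```

The trade-off is staleness: a cached count can lag the database by up to the TTL unless a write explicitly invalidates the key.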
Messaging
- Use messaging a lot. Producers produce messages, which are queued, and then are distributed to consumers. Twitter's main functionality is to act as a messaging bridge between different formats (SMS, web, IM, etc).
- Send message to invalidate friend's cache in the background instead of doing all individually, synchronously.
- Started with DRb, which stands for distributed Ruby. A library that allows you to send and receive messages from remote Ruby objects via TCP/IP. But it was a little flaky and a single point of failure.
- Moved to Rinda, which is a shared queue that uses a tuplespace model, along the lines of Linda. But the queues are not persistent, so messages are lost on failure.
- Tried Erlang. Problem: How do you get a broken server running on a Sunday or Monday with 20,000 users waiting? The developer didn't know. Not a lot of documentation. So it violates the "use what you know" rule.
- Moved to Starling, a distributed queue written in Ruby.
- Distributed queues were made to survive system crashes by writing them to disk. Other big websites take this simple approach as well.
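The "survive crashes by writing the queue to disk" idea can be sketched as a small journaled FIFO. This is an illustration under assumptions, not Starling's implementation: JournaledQueue, its JSON journal format, and the message strings are all hypothetical.

```ruby
require "json"
require "tmpdir"

# Hypothetical disk-journaled FIFO queue. Every push and pop is appended to a
# journal file, so a restarted process can rebuild the queue by replaying it.
class JournaledQueue
  def initialize(path)
    @path = path
    @items = []
    replay if File.exist?(@path)
    @journal = File.open(@path, "a")
  end

  def push(message)
    log("op" => "push", "msg" => message)
    @items.push(message)
  end

  def pop
    return nil if @items.empty?

    log("op" => "pop")
    @items.shift
  end

  def size
    @items.size
  end

  def close
    @journal.close
  end

  private

  def log(record)
    @journal.puts(JSON.generate(record))
    @journal.flush # hands the line to the OS; a stricter version could also call fsync
  end

  # Rebuild in-memory state from the journal, one operation per line.
  def replay
    File.foreach(@path) do |line|
      record = JSON.parse(line)
      record["op"] == "push" ? @items.push(record["msg"]) : @items.shift
    end
  end
end

path = File.join(Dir.mktmpdir, "queue.journal")
queue = JournaledQueue.new(path)
queue.push("invalidate friends cache for user 1")
queue.push("deliver SMS for status 99")
queue.pop
queue.close # simulate the process going away

recovered = JournaledQueue.new(path) # a new process replays the journal
```

A production queue would also need journal compaction and a policy for messages that were popped but not fully processed before a crash; both are left out here.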
SMS is handled using an API supplied by third-party gateways. It's very expensive.
Deployment
- They do a review and push out new mongrel servers. No graceful way yet.
- An internal server error is given to the user if their mongrel server is replaced.
- All servers are killed at once. A rolling blackout isn't used because the message queue state is in the mongrels and a rolling approach would cause all the queues in the remaining mongrels to fill up.
Abuse
- A lot of downtime because people crawl the site and add everyone as friends. 9000 friends in 24 hours. It would take down the site.
- Build tools to detect these problems so you can pinpoint when and where they are happening.
- Be ruthless. Delete them as users.
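One way to detect and cap this kind of abuse is a per-user sliding-window limit on actions such as adding friends. The sketch below is an assumption-laden illustration: ActionLimiter is a hypothetical class, and the limit of 3 per 24 hours is chosen only to keep the example short, not a figure from Twitter.

```ruby
# Hypothetical limiter: at most `limit` actions per user within `window` seconds.
class ActionLimiter
  def initialize(limit:, window:)
    @limit = limit
    @window = window
    @events = Hash.new { |hash, user| hash[user] = [] }
  end

  # Returns true and records the action if the user is under the limit;
  # returns false (recording nothing) once the limit is reached.
  def allow?(user_id, now = Time.now)
    recent = @events[user_id]
    recent.reject! { |t| t <= now - @window } # drop events outside the window
    return false if recent.size >= @limit

    recent << now
    true
  end
end

limiter = ActionLimiter.new(limit: 3, window: 24 * 60 * 60)
start = Time.at(0)

# Five friend-adds one second apart: the first three pass, the rest are refused.
results = (0...5).map { |i| limiter.allow?("crawler", start + i) }

# A day later the old events have aged out of the window.
allowed_next_day = limiter.allow?("crawler", start + 24 * 60 * 60 + 10)
```

Refused attempts are a useful signal in their own right: logging them gives the "when and where" data the detection tools need.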
Partitioning
- Plan to partition in the future. Currently they don't. These changes have been enough so far.
- The partition scheme will be based on time, not users, because most requests are very temporally local.
- Partitioning will be difficult because of automatic memoization. They can't guarantee read-only operations will really be read-only. May write to a read-only slave, which is really bad.
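Since the partitioning scheme was still only planned, the following is purely a sketch of what time-based partitioning can look like. The monthly granularity and the statuses_YYYY_MM naming are assumptions made for illustration, as are the function names.

```ruby
# Hypothetical naming scheme: one partition (table or shard) per calendar month.
def partition_for(created_at)
  created_at.getutc.strftime("statuses_%Y_%m")
end

# List the partitions a query must touch for a time range. Because most
# requests are temporally local, this is usually just the newest one or two.
def partitions_between(from, to)
  names = []
  first = from.getutc
  cursor = Time.utc(first.year, first.month, 1)
  while cursor <= to
    names << partition_for(cursor)
    cursor = if cursor.month == 12
               Time.utc(cursor.year + 1, 1, 1)
             else
               Time.utc(cursor.year, cursor.month + 1, 1)
             end
  end
  names
end

single = partition_for(Time.utc(2007, 9, 14, 12, 0, 0))
range = partitions_between(Time.utc(2007, 11, 20), Time.utc(2008, 1, 5))
```

Partitioning by time keeps recent, hot data together, at the cost of fanning out across partitions for any query that spans a long history.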
Twitter's API Traffic is 10x Twitter’s Site
- Their API is the most important thing Twitter has done.
- Keeping the service simple allowed developers to build on top of their infrastructure and come up with ideas that are way better than Twitter could come up with. For example, Twitterrific, which is a beautiful way to use Twitter that a small team with different priorities could create.
Monit is used to kill processes if they get too big.
Lessons Learned
Talk to the community. Don't hide and try to solve all problems yourself. Many brilliant people are willing to help if you ask.
Treat your scaling plan like a business plan. Assemble a board of advisers to help you.
Build it yourself. Twitter spent a lot of time trying other people's solutions that almost seemed to work, but not quite. It's better to build some things yourself so you at least have some control and you can build in the features you need.
Build in user limits. People will try to bust your system. Put in reasonable limits and detection mechanisms to protect your system from being killed.
Don't make the database the central bottleneck of doom. Not everything needs to require a gigantic join. Cache data. Think of other creative ways to get the same result. A good example is talked about in Twitter, Rails, Hammers, and 11,000 Nails per Second.
Make your application easily partitionable from the start. Then you always have a way to scale your system.
Realize your site is slow. Immediately add reporting to track problems.
Optimize the database.
- Index everything. Rails won't do this for you.
- Use EXPLAIN to see how your queries are running. Indexes may not be used as you expect.
- Denormalize a lot. Denormalization single-handedly saved them. For example, they store all of a user ID's friend IDs together, which prevented a lot of costly joins.
- Avoid complex joins.
- Avoid scanning large sets of data.
Cache the hell out of everything. Individual active records are not cached, yet. The queries are fast enough for now.
Test everything.
- You want to know when you deploy an application that it will render correctly.
- They have a full test suite now. So when the caching broke they were able to find the problem before going live.
Long running processes should be abstracted to daemons.
Use exception notifier and exception logger to get immediate notification of problems so you can address them right away.
Don't do stupid things.
- Scale changes what can be stupid.
- Trying to load 3000 friends at once into memory can bring a server down, but when there were only 4 friends it worked great.
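A common way to avoid loading everything at once is to walk the IDs in fixed-size batches, so memory use is bounded by the batch size rather than by the number of friends. This is a minimal sketch: load_friends is a hypothetical loader that fabricates records so the example runs without a database, and the batch size of 100 is arbitrary.

```ruby
BATCH_SIZES = []

# Hypothetical loader: fetches full records for one batch of IDs.
# It records the batch size so the bounded memory use is observable.
def load_friends(ids)
  BATCH_SIZES << ids.size
  ids.map { |id| { id: id, name: "user#{id}" } }
end

# Yield each friend while holding at most batch_size full records at a time.
def each_friend_in_batches(friend_ids, batch_size: 100)
  friend_ids.each_slice(batch_size) do |batch|
    load_friends(batch).each { |friend| yield friend }
  end
end

seen = 0
each_friend_in_batches((1..3000).to_a, batch_size: 100) { |_friend| seen += 1 }
```

With 4 friends this behaves the same as loading everything at once; the difference only shows up at the scale where the naive version falls over.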
Most performance comes not from the language, but from application design.
Turn your website into an open service by creating an API. Their API is a huge reason for Twitter's success. It allows users to create an ever-expanding ecosystem around Twitter that is difficult to compete with. You can never do all the work your users can do and you probably won't be as creative. So open your application up and make it easy for others to integrate your application with theirs.
Related Articles
For a discussion of partitioning take a look at Amazon Architecture, An Unorthodox Approach to Database Design: The Coming of the Shard, and Flickr Architecture.
The Mailinator Architecture has good strategies for abuse protection.
GoogleTalk Architecture addresses some interesting issues when scaling social networking sites.
Comments
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
Todd, thanks for the excellent research you did on Twitter. It's amazing that the entire Twitter infrastructure is running with just one rw database. Would be interesting to find out the usage stats on that single box...
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
Loved your article, it echoes a lot of themes I've been talking about for awhile on my blog, so I wrote about the Twitter case based on your article here:
http://smoothspan.wordpress.com/2007/09/14/twitter-scaling-story-mirrors...
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
I wonder what the RoR haters will make up now to say that ruby doesn't scale.
They loved jumping on the ruby hate bandwagon when twitter was going through its difficulties. Little Bo Peep has been quite silent since.
Caching was the answer? Shock. Gasp. Awe. Just like PHP?!? Crazy!
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
I think you're referring to Starfish, not Starling.
Great article!
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
No, its not Starfish. In the video of his presentation, he mentions "so I wrote Starling..."
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
great article (and site) Todd. thanks for pulling all this information together. It's a great resource
ps. @Dave: Blaine referred to his 'starling' messaging framework at the SJ Ruby Conference earlier in the year.
They could have been 20% better?
So, let's be clear, the biased source in defense mode says themselves they could have been 20% faster just by selecting a different language (note that it doesn't exactly say what the performance hit of the Rails framework itself is, so let's just go with 20% improvement by changing languages and ignore potential problems in (1) their coding decisions and (2) their chosen framework).... Wow, sign me up for an easy 20% improvement!
Yeah, yeah, I know, I'll hear the usual tripe about how amazing fast Ruby is to develop with. Visual Basic is pretty easy too, as is PHP, but I don't use those either.
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
Sounds like Ruby on Rails _was_ to blame as the 10000 percent improvement was reached by essentially removing the "on rails" part of the equation by extensive caching. This seems to be the real weakness of RoR; Ruby in itself seems OK performance-wise, slower than PHP for example but not catastrophically so. PHP is slower than Java but scales nicely anyway. The database abstraction in "on rails" is a real performance killer though and all the high traffic sites that use RoR successfully (twitter, penny arcade, ...) seems to have taken steps to avoid using the database abstraction on live page views by extensive caching.
Of course, caching is a necessary tool for scaling regardless of the platform but with a less inefficient abstraction layer than the one in RoR it is possible to grow more before you have to recode stuff for caching.
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
Excellent article.
I agree with one of the other commenters that it's surprising they have this running from a single MySQL server. Wow. The fact that twitter tends to be very write-heavy, and MySQL isn't exactly perfect for multimaster replication architectures probably has a lot to do with that. I wonder what they are planning to do for future growth? Obviously this will not continue to work as-is..
--
Dustin Puryear
Author, "Best Practices for Managing Linux and UNIX Servers"
http://www.puryear-it.com/pubs/linux-unix-best-practices
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
I like the comment where the speed of the language isn't anywhere near as important as the scalability of the language.
Moore's Law of computer speed will eventually come to an end. Parallelism will take over and any language that can thrive in that regard will work.
Twitter is proof. 0-millions in months??
And exactly how long was Twitter down when they were having their scaling problems? Weeks? I don't think so.
It scaled...and is scaling.
cbmeeks
http://cbmeeks.blogspot.com/
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
This was a very interesting read. I wonder if/when the Twitter people will upgrade to the new 2.0 of Rails and if so, I wonder how that will affect performance??
http://codershangout.com
A place for coders to hangout!
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
Thanks! a lot of helpful links are useful and useful to me in the future!
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
"Of course, caching is a necessary tool for scaling regardless of the platform but with a less inefficient abstraction layer than the one in RoR it is possible to grow more before you have to recode stuff for caching."
Most of this post went to great pains to show that the 20% or so language inefficiency consequence of Twitter's choice of Ruby was easily made up for by the architecture that it enabled easily. But the commenter's point is valid that the Rails part of their Ruby architecture made it harder for them to scale easily without a code rewrite. But who cares, Ruby on Rails still seems to encourage smart, DIY programming, and as the analysis in the blog post pointed out Twitter proved this by writing their own queueing system called Starling in under 200 lines of Ruby that handles all their pub/sub needs.
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
The difficulty is that the carriers that allow their customers to recharge prepaid cards take our money to do so; in effect, Twitter (and any other service that offers free delivery of SMS messages) becomes a source of free money. It's inherently unsustainable.
More generally, the point of this slide was that it's not a good scaling practice to allow "abusive" users to undermine continued access to "legitimate" users (and that the definition of both of those terms is subject to your own particular situation).
There's always room for creativity - until we're able to deal directly with Italian carriers to ensure that we don't act as a prepaid card refill service, Italian users are able to send messages via SMS, and are able to receive messages via AIM or the Mobile Web (and soon Email as well).
Re: Scaling Twitter: Making Twitter 10000 Percent Faster
The only thing that hasn't become clear to me is how in fact they are handling their external API calls.
Yes, it's said that it generates lots of traffic, but the exact process of performing all that action isn't so obvious...