MySQL

Todd Hoff's picture

37signals Architecture

Update 6: Things We’ve Learned at 37Signals. Themes: less is more; don't worry be happy.
Update 5: Nuts & Bolts: HAproxy . Nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers.
Update 4: O'Rielly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says BaseCamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, cut CPU usage by a third and database time by about half. Results achieved by: Analysis, Caching, MySQL optimizations, Hardware upgrades.
Update 2: customer support is handled in real-time using Campfire.
Update: highly useful information on creating a customer billing system.

In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world and they've use RoR to make their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal sounding resources. This is just an initial data dump, they promise to add more details later. As they add more I'll update it here.

Todd Hoff's picture

Digg Architecture

Update 2:: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information.
Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades.

Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month?

Shard servers -- go big or small?

Hello everyone,

I'm designing a website/widget that my business partner and I expect to serve millions of hits daily. As such we must shard our database (and we're designing with shards in mind right from the beginning). However, the one thing I haven't been able to figure out from Googling is the best hardware to go with for shards. I'm using exclusively InnoDB tables.

We'll (eventually) be running 3 groups of database servers:
a) Session servers for php sessions. These will have a very high write volume.
b) ID servers. These will match a couple primary indices (such as user ID) to a given shard. These will have an intense read load, plus a moderate amount of writes.
c) Shard servers. These will hold the bulk of the data. These will have a high read load and a lowish write load.

Group A is done as a database instead of using memcached so users aren't logged out if a memcached server goes down. As the write load is high, a pair of high performance master-master servers seems obvious. What's the ideal hardware setup for machines with this role? Maxed RAM and fast disks seem reasonable. Should I bother with RAID > 0 if I have a live backup on the other master? I hear 4 cores is optimal for InnoDB -- recommendations?

Group B. Again, it looks like maxed RAM is recommended here. What about disks? Should I go for 10K or will regular SATA2 drives be okay? RAID 0, 5, 10? Cores? Should I think about slaves to a master-master setup?

Group C. It seems to me these machines can be of any capacity because the data they hold is easily spread between shards. What is the query-per-second per dollar sweet spot when it comes to cores and number of disks? Should I beef these machines up, or stick with low end hardware? Should I still max the RAM?

I have some other thoughts on system setup, too. As the data stored in the PHP sessions won't change frequently (it'll likely remain static for a user's entire visit -- all variable data can be stored in Group C shard servers), I'm thinking of using a memcached setup in front of the database and only pushing writes through to the database when necessary. Your thoughts?

We're also starting this on a minimal budget (of course), so where in the above is it best spent? Keep in mind that I can recycle machines used in Group A & B in Group C as times goes on.

Anyway, I'd love to hear from the expertise of the forum. I've been reading for a long time, and I'll be writing as our project evolves :)

--Mark

Todd Hoff's picture

Product: Tungsten Replicator

With Tungsten Replicator Continuent is trying to deliver a better master/slave replication system. Their goal: scalability, reliability with seamless failover, no performance loss.

From their website:
The Tungsten Replicator implements open source database-neutral master/slave replication. Master/slave replication is a highly flexible technology that can solve a wide variety of problems including the following:

* Availability - Failing over to a slave database if your master database dies
* Performance Scaling - Spreading reads across many copies of data
* Cross-Site Clustering - Maintaining active database replicas across WANs
* Change Data Capture - Extracting changes to load data warehouses or update other systems
* Zero Downtime Upgrade - Performing upgrades on a slave server which then becomes the master

The Tungsten Replicator architecture is flexible and designed to support addition of new databases easily. It includes pluggable extractor and applier modules to help transfer data from master to slave.

Todd Hoff's picture

Strategy: Drop Memcached, Add More MySQL Servers

Update 2: Michael Galpin in Cache Money and Cache Discussions likes memcached for it's expiry policy, complex graph data, process data, but says MySQL has many advantages: SQL, Uniform Data Access, Write-through, Read-through, Replication, Management, Cold starts, LRU eviction.
Update: Dormando asks Should you use memcached? Should you just shard mysql more?. The idea of caching is the most important part of caching as it transports you beyond a simple CRUD worldview. Plan for caching and sharding by properly abstracting data access methods. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs over planning and wasting valuable time.

Feedster's François Schiettecatte wonders if Fotolog's 21 memcached servers wouldn't be better used to further shard data by adding more MySQL servers? He mentions Feedster was able to drop memcached once they partitioned their data across more servers. The algorithm: partition until all data resides in memory and then you may not need an additional memcached layer.

Parvesh Garg goes a step further and asks why people think they should be using MySQL at all?

Related Articles


  • The Death of Read Replication by Brian Aker. Caching layers have replaced read replication. Cache can't fix a broken database layer. Partition the data that feeds the cache tier: "Keep your front end working through the cache. Keep all of your data generation behind it."
  • Read replication with MySQL by François Schiettecatte. Read replication is dead and it should be used only for backup purposes. Take the memory used for caching and give it to your database servers.
  • Replication++, Replication 2.0, Replication.Next by Ronald Bradford. What should read replication be used for?
  • Replication, caching, and partitioning by Greg Linden. Caching overdone because it adds complexity, latency on a cache miss, and inefficiently uses cluster resources. Hitting disk is the problem. Shard more and get your data in memory.

  • Todd Hoff's picture

    A Bunch of Great Strategies for Using Memcached and MySQL Better Together

    The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success, which was one of my personal favorites.

    Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a sophisticated approach to using memcached that others can learn from and emulate. We'll cover some of the highlightable strategies from the webinar down below the fold.

    Todd Hoff's picture

    Scaling Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App

    Several months ago I attended a Joyent presentation where the spokesman hinted that Joyent had the chops to support a one billion page per month Facebook Ruby on Rails application. Even under a few seconds of merciless grilling he would not give up the name of the application. Now we have the big reveal: it was LinkedIn's Bumper Sticker app. For those not currently sticking things on bumps, Bumper Sticker is quite surprisingly a viral media sharing application that allows users to express their individuality by sticking small virtual stickers on Facebook profiles. At the time I was quite curious how Joyent's cloud approach could be leveraged for this kind of app. Now that they've released a few details, we get to find out.

    Pyshards aspires to build sharding toolkit for Python

    I've been interested in sharding concepts since first hearing the term "shard" a few years back. My interest had been piqued earlier, the first time I read about Google's original approach to distributed search. It was described as a hashtable-like system in which independent physical machines play the role of the buckets. More recently, I needed the capacity and performance of a Sharded system, but did not find helpful libraries or toolkits which would assist with the configuration for my language of preference these days, which is Python. And, since I had a few weeks on my hands, I decided I would begin the work of creating these tools.

    The result of my initial work the Pyshards project, a still-incomplete python and MySQL based horizontal partitioning and sharding toolkit. HighScalability.com readers will already know that horizontal partitioning is a data segmenting pattern in which distinct groups of physical row-based datasets are distributed across multiple partitions. When the partitions exist as independent databases and when they exist within a shared-nothing architecture they are known as shards. (Google apparently coined the term shard for such database partitions, and pyshards has adopted it.) The goal is to provide big opportunities for database scalability while maintaining good performance. Sharded datasets can be queried individually (one shard) or collectively (aggregate of all shards). In the spirit of The Zen of Python, Pyshards focuses on one obvious way to accomplish horizontal partitioning, and that is by using a hash/modulo based algorithm.

    Pyshards provides the ability to reasonably add polynomial capacity (number of original shards squared) without re-balancing (re-sharding). Pyshards is designed with re-sharding in mind (because the time will come when you must re-balance) and provides re-sharding algorithms and tools. Finally, Pyshards aspires to provide a web-based shard monitoring tool so that you can keep an eye on resource capacity.

    So why publish an incomplete open source project? I'd really prefer to work with others who are interested in this topic instead of working in a vacuum. If you are curious, or think you might want to get involved, come visit the project page, join a mailing list, or add a comment on the WIKI.

    http://code.google.com/p/pyshards/wiki/Pyshards

    Devin

    Webinar: Designing and Implementing Scalable Applications with Memcached and MySQL

    The following technical Webinar could be of interest to the community.

    WHO:

    • Farhan "Frank" Mashraqi, Director of Business Operations and Technical Strategy, Fotolog Inc
    • Monty Taylor, Senior Consultant, Sun Microsystems
    • Jimmy Guerrero, Sr Product Marketing Manager, Sun Microsystems - Database Group

    WHAT:

    • Designing and Implementing Scalable Applications with Memcached and MySQL web presentation.

    WHEN:

    • Thursday, May 29, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT
    • The presentation will be approximately 45 minutes long followed by Q&A.

    Check out the details here!

    Todd Hoff's picture

    HSCALE - Handling 200 Million Transactions Per Month Using Transparent Partitioning With MySQL Proxy

    Update 2: A HSCALE benchmark finds HSCALE "adds a maximum overhead of about 0.24 ms per query (against a partitioned table)." Future releases promise much improved results.
    Update: A new presentation at An Introduction to HSCALE.

    After writing Skype Plans for PostgreSQL to Scale to 1 Billion Users, which shows how Skype smartly uses a proxy architecture for scaling, I'm now seeing MySQL Proxy articles all over the place. It's like those "get rich quick" books that say all you have to do is visualize a giraffe with a big yellow dot superimposed over it and by sympathetic magic giraffes will suddenly stampede into your life. Without realizing it I must have visualized transparent proxies smothered in yellow dots.

    One of the brightest images is a wonderful series of articles by Peter Romianowski describing the evolution of their proxy architecture. Their application is an OLTP system executing 200 million transaction per month, tables with more than 1.5 billion rows, and a 600 GB total database size. They ran into a wall buying bigger boxes and wanted to move to a sharded architecture. The question for them was: how do you implement sharding?

    Todd Hoff's picture

    Friends for Sale Architecture - A 300 Million Page View/Month Facebook RoR App

    Update: Jake in Does Django really scale better than Rails? thinks apps like FFS shouldn't need so much hardware to scale.

    In a short three months Friends for Sale (think Hot-or-Not with a market economy) grew to become a top 10 Facebook application handling 200 gorgeous requests per second and a stunning 300 million page views a month. They did all this using Ruby on Rails, two part time developers, a cluster of a dozen machines, and a fairly standard architecture. How did Friends for Sale scale to sell all those beautiful people? And how much do you think your friends are worth on the open market?

    Marcelb's picture

    Rather small site arhitecture.

    Website stats:

    Webserver: Apache 2.2
    Database: MySQL 5.0
    APC cache for php
    CMS: Drupal 6.2 (bleeding-edge version)*
    *Aggressive caching is ON, Page Compression ON, Block Cache ON (can't use CCS),Optimize CSS/JS ON.
    2 Servers: Apache/Mysql (low-tech servers - Celeron processors, 512 MB RAM, 7200 RPM HDD)
    Bandwidth 10 Mb/s

    The benchmark:

    Used ab : ab -n 1000 -c 20 howwhatwho.com

    Server Software: Apache/2.2.3
    Server Hostname: howwhatwho.com
    Server Port: 80
    Document Path: /
    Document Length: 41639 bytes
    Concurrency Level: 20
    Time taken for tests: 13.556796 seconds
    Complete requests: 1000
    Failed requests: 0
    Write errors: 0
    Total transferred: 42118000 bytes
    HTML transferred: 41639000 bytes
    Requests per second: 73.76 [#/sec] (mean)
    Time per request: 271.136 [ms] (mean)
    Time per request: 13.557 [ms] (mean, across all concurrent requests)
    Transfer rate: 3033.90 [Kbytes/sec] received

    The Apache server is also running the postifx and bind although they aren't resource intensive applications.
    The Cron job for drupal runs every 50 minutes, and the agreggator module is enabled and fetches more than 30 rss feeds every time.

    The site used to be hosted on a single Celeron machine but on peak times the CPU went up to 80 %.

    Question : Does anybody know a website hosted on an IBM Mainframe? :) Todd?

    Todd Hoff's picture

    Scaling Mania at MySQL Conference 2008

    The 2008 MySQL Conference & Expo has now closed, but what is still open for viewing is all the MySQL scaling knowledge that was shared. Planet MySQL is a great source of the goings on:

  • Scaling out MySQL: Hardware today and tomorrow by Jeremy Cole and Eric Bergen of Proven Scaling. In it are answered all the big questions of life: What about 64-bit? How many cores? How much memory? Shared storage? Finally we learn the secrets of true happiness.
  • Panel Video: Scaling MySQL? Up or Out?. Don't have time? Take a look at the Diamond Note excellent game day summary. Companies like MySQL, Sun, Flickr, Fotolog, Wikipedia, Facebook and YouTube share intel on how many web servers they have, how they handle failure, and how they scale.
  • Kevin Burton in Scaling MySQL and Java in High Write Throughput Environments - How we built Spinn3r shows how they crawl and index 500k posts per hour using MySQL and 40 servers.
  • Venu Anuganti channels Dathan Pattishall's talk on scaling heavy concurrent writes in real time.
  • This time Venu channels Helping InnoDB scale on servers with many cores by Mark Callaghan from Google.
  • Exploring Amazon EC2 for Scale-out Applications by Morgan Tocker, MySQL Canada, Carl Mercier, Defensio. RoR based spam filtering services that runs completely on EC2. Show evolution from a simple configuration to a sharded architecture.
  • Applied Partitioning and Scaling Your (OLTP) Database System by Phil Hilderbrand.
  • Real World Web: Performance & Scalability by Ask Bjorn Hansen. (189 slides!). He promises you haven't seen this talk before. The secret: Think Horizontal.
  • Too many to list here. All the presentations are available on scribd.

  • Scalr - Open Source Auto-scaling Hosting on Amazon EC2

    Scalr is a fully redundant, self-curing and self-scaling hosting environment utilizing Amazon's EC2. It has been recently open sourced on Google Code.

    Scalr allows you to create server farms through a web-based interface using prebuilt AMI's for load balancers (pound or nginx), app servers (apache, others), databases (mysql master-slave, others), and a generic AMI to build on top of.
    Scalr promises automatic high-availability and scaling for developers by health and load monitoring.

    The health of the farm is continuously monitored and maintained. When the Load Average on a type of node goes above a configurable threshold a new node is inserted into the farm to spread the load and the cluster is reconfigured. When a node crashes a new machine of that type is inserted into the farm to replace it.

    Scaling Out MySQL

    This post covers two main options for scaling-out MySql and compare between them. The first is based on data-base clustering and the second is based on In Memory clustering a.k.a Data Grid. A special emphasis is given to a pattern which shows how to scale our existing data base without changing it through a combination of Data Grid and data base as a background service. This pattern is referred to as Persistency as a Service (PaaS). It also address many of the fequently asked question related to how performance, reliability and scalability is achieved with this pattern.

    Todd Hoff's picture

    YouTube Architecture

    Update: YouTube: The Platform. YouTube adds a new rich set of APIs in order to become your video platform leader--all for free. Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. Compose your site internally from APIs because you'll need to expose them later anyway.

    YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google?

    Sun to Acquire MySQL

    So what are we announcing today? That in addition to acquiring MySQL, Sun will be unveiling new global support offerings into the MySQL marketplace. We'll be investing in both the community, and the marketplace - to accelerate the industry's phase change away from proprietary technology to the new world of open web platforms.

    Read more on Jonathan Schwartz's Blog

    What do you think about this?

    Todd Hoff's picture

    How Ruby on Rails Survived a 550k Pageview Digging

    Shanti Braford details how his Ruby on Rails based website survived a 24 hour 550,000+ pageview digg attack. His post cleanly lays out all the juicy setup details, so there's not much I can add.

    Hosting costs $370 a month for 1 web server, 1 database server, and sufficient bandwidth. The site is built on RoR, nginx, MySQL, and 7 mongrel servers. He thinks Rails 2.0 has improved performance and credits database avoidance and fragment caching for much of the performance boost.

    Keep in mind his system is relatively static, but it's a very interesting and useful experience report.