Scalability forces us to think differently. What worked on a small scale doesn't always work on a large scale -- and costs are no different. If 90% of our application is free of contention, and only 10% is spent on a shared resources, we will need to grow our compute resources by a factor of 100 to scale by a factor of 10! Another important thing to note is that 10x, in this case, is the limit of our ability to scale, even if more resources are added.
1. The cost of non-linearly scalable applications grows exponentially with the demand for more scale.
2. Non-linearly scalable applications have an absolute limit of scalability. According to Amdhal's Law, with 10% contention, the maximum scaling limit is 10. With 40% contention, our maximum scaling limit is 2.5 - no matter how many hardware resources we will throw at the problem.
This post discuss in further details how to measure the true cost of non linearly scalable systems and suggest a model for reducing that cost significantly.
This post covers two main options for scaling-out MySql and compare between them. The first is based on data-base clustering and the second is based on In Memory clustering a.k.a Data Grid. A special emphasis is given to a pattern which shows how to scale our existing data base without changing it through a combination of Data Grid and data base as a background service. This pattern is referred to as Persistency as a Service (PaaS). It also address many of the fequently asked question related to how performance, reliability and scalability is achieved with this pattern.
[Tim O'Reilly] Continuing my series of queries about how "Web 2.0" companies used databases, I asked Cal Henderson of Flickr to tell me "how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?"
One of the most interesting new features of Oracle 11 is the new function result caching mechanism. Until now, making sure that a PL/SQL function gets executed only as many times as necessary was a black art. The new caching system makes that quite easy -- here is how it works.
Effective content caching is one of the key features of scalable web sites. Although there are several out-of-the-box options for caching with modern web technologies, a custom built cache still provides the best performance.
A very entertaining and somewhat educational article on IBM Poopheads say LAMP Users Need to "grow up". The physical three tier architecture turns out to be the root of all evil and shared nothing architectures brings simplicity and light.
In the comments Simon Willison makes an insightful comment on why fine grained caching works for personalized pages and proxy's don't:
Great post, but I have to disagree with you on the finely grained caching part. If you look at big LAMP deployments such as Flickr, LiveJournal and Facebook the common technology component that enables them to scale is memcached - a tool for finely grained caching. That's not to say that they aren't doing shared-nothing, it's just that memcached is critical for helping the database layer scale. LiveJournal serves around 50% of its page views "permission controlled" (friends only) so an HTTP proxy on the front end isn't the right solution - but memcached reduces their database hits by 90%.
Tugela Cache is a cache system like memecached, but instead of storing data just in RAM, it stores data in the file system using a b-tree. You trade latency in order to have a very large cache. It's useful for sites that have caching requirements that exceed their available memory. It uses the same wire protocol as memcached so it can be dropped in without a hassle. From the website:
As large MediaWiki deployments may gain performance using Memcached, at some level cost of RAM to store all objects becomes too high. In order to balance resource usage and make more use of our Apache server disks, Tugela, the distributed cached on-disk hash database, has arrived.
Tugela Cache is derived from Memcached. Much of the code remains the same, but notably, these changes:
* Internal slab allocator replaced by BerkeleyDB B-Tree database.
* Expiry policy management moved to external program tugela-expire
* Much statistics code made obsolete.
An interesting point brought up in the comments is using memcached with a larger cache size than physical RAM and then let the OS swap versus using a b-tree to access data on disk. Nginx seems to use the "let the OS swap" approach to good effect. It would be interesting to see which approach works better.
For an idea of how an in-process cache and a disk based cache hierarchy can work together take a look at Kevin Burton's IDEA: Hierarchy of caches for high performance AND high capacity memcached.
There's also an interesting variation called Memcachedb which is said to be "a better and simplified Tugela." It's more of a persistence mechanism than a cache. It enables transactions, replication, and there's no expiration.
This scalability strategy is brought to you by Erik Osterman:
My recommendations for anyone dealing with explosive growth on a limited budget with lots of cachable content (e.g. content capable of returning valid expiration headers) is employ a reverse proxy as mentioned in this article.
In the last week, we had a site get AP'd, triggering 100K unique visitors to a single IIS server in under 5 hours. It took out the IIS server. Placing a single squid infront of the server handled the entire onslaught with a max server load of 0.10 on a modest Intel IV 3Ghz.
It's trivial to implement for anyone interested...
memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.
eAccelerator is a free open-source PHP accelerator, optimizer, and dynamic content cache. It increases the performance of PHP scripts by caching them in their compiled state, so that the overhead of compiling is almost completely eliminated. It also optimizes scripts to speed up their execution. eAccelerator typically reduces server load and increases the speed of your PHP code by 1-10 times.
Recent comments
16 min 39 sec ago
18 min 27 sec ago
20 min 12 sec ago
21 min 30 sec ago
22 min 31 sec ago
24 min 27 sec ago
25 min 43 sec ago
26 min 39 sec ago
27 min 40 sec ago
29 min 18 sec ago