Michael Nygard talks about Two Ways To Boost Your Flagging Web Site. The idea behind cache farms is to move the memory devoted to the various caching layers into one large farm of caches, as with memcached. The idea behind read pools is to send your database read requests to a pool of dedicated read servers, taking that load off the write server. By combining the two strategies, you aren't forced to scale up the database tier to scale your website.
Slashdot effect: overwhelming unprepared sites with an avalanche of readers' clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with class, dignity, and restraint. Yet with millions and millions of users, Slashdot is still box office gold and more than keeps up with the young'uns. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale, and what can you learn by going old school? Site: http://slashdot.org
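As a rough illustration of the two ideas, here is a minimal Python sketch. It is not taken from Nygard's article: the class names, the server names, the "starts with SELECT" test for reads, and the simple modulo hashing for picking a cache node are all assumptions made for the example.

```python
import hashlib
import itertools


class ReadPoolRouter:
    """Send writes to one write server and spread reads over a read pool.

    Hypothetical helper for illustration; server names are plain strings.
    """

    def __init__(self, write_server, read_servers):
        self.write_server = write_server
        # Round-robin over the read pool (an assumed, very simple policy).
        self._reads = itertools.cycle(list(read_servers))

    def server_for(self, sql):
        # Assumption: anything that starts with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._reads) if is_read else self.write_server


class CacheFarm:
    """Map a cache key to one node of a shared farm of cache servers."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def node_for(self, key):
        # Hash the key so every application server picks the same node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[index]


router = ReadPoolRouter("db-write", ["db-read-1", "db-read-2"])
farm = CacheFarm(["cache-1", "cache-2", "cache-3"])
print(router.server_for("SELECT * FROM users"))       # a read server
print(router.server_for("UPDATE users SET name='x'"))  # the write server
print(farm.node_for("user:42"))                        # one cache node
```

With this split, adding read capacity means adding servers to the read pool or nodes to the cache farm, while the write server stays as it is.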
The Hardware Architecture
The Software Architecture
Paper: Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors
One stumbling block of the great march towards virtualization is the relatively poor performance of resource-hungry applications like databases. We are told to develop and test using VMs, but deploy without them. Which kind of sucks IMHO. Maybe better virtualization technology can remove this split. This paper talks about a different approach to virtualization called "container-based" virtualization that can reportedly double the performance of traditional hypervisor systems like Xen. It does this by trading isolation for efficiency. Rather than maintaining complete isolation between VMs, the container approach shares resources between VMs and thus gives higher performance while still guaranteeing strong fault, resource, and security isolation. It's yet another battle in computing's endless war of creating and destroying abstraction layers. I learned a lot from this paper because of how it compared and contrasted traditional hypervisor and container-based virtualization strategies. Good job.
Hi, I would like feedback on an ID generator I just made. What positive and negative effects do you see with this? It's programmed in Java, but could just as easily be programmed in any other typical language. It's thread safe and does not use any synchronization. When testing it on my laptop, I was able to generate 10 million IDs within about 15 seconds, so it should be more than fast enough. Take a look at the attachment. (I had to rename it from IdGen.java to IdGen.txt to attach it.) IdGen.java
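The attachment is not reproduced here, so the poster's actual approach is unknown. For comparison only, the following is one possible way to get thread-safe IDs without locks, sketched in Python rather than Java: each thread keeps its own random prefix and its own counter, so threads never share mutable state. The function name and the ID format are invented for this example, and uniqueness across threads relies on random UUID prefixes not colliding.

```python
import threading
import uuid

# Per-thread storage: each thread sees its own prefix and counter.
_state = threading.local()


def next_id():
    """Return an ID built from a per-thread random prefix and a counter."""
    if not hasattr(_state, "prefix"):
        # First call on this thread: pick a random prefix (assumed unique).
        _state.prefix = uuid.uuid4().hex
        _state.counter = 0
    _state.counter += 1
    return f"{_state.prefix}-{_state.counter}"


def generate(n):
    """Generate n IDs on the calling thread."""
    return [next_id() for _ in range(n)]


if __name__ == "__main__":
    results = [[] for _ in range(4)]
    threads = [
        threading.Thread(target=lambda out=out: out.extend(generate(1000)))
        for out in results
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    all_ids = [i for out in results for i in out]
    print(len(all_ids), len(set(all_ids)))
```

The trade-off of this design is that the IDs are longer than a plain number and are not ordered across threads.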
the authors of drupal have paid considerable attention to performance and scalability. consequently even a default install running on modest hardware can easily handle the demands of a small website. if you are lucky, eventually the time comes when you need to service more users than your system can handle. at some point, you'll start looking at your hardware and network deployment.
Hi all, I run a site that, after a complete redesign, has gotten a lot more traffic. The site provides free Flash games, so the biggest share of traffic goes to serving Flash files (from about 100K up to several megabytes each). I currently host the entire site with a hosting provider that has no traffic limits. But since they are very cheap (yet have served me very well all along, with at least 99.9% uptime), I don't trust them to let me keep consuming more and more bandwidth. My guess is that I will hit some internal limit of theirs one day, so I'm looking into moving all the Flash content over to a content delivery network of some sort. Some recent traffic stats:
August: 12 GB
September: 22 GB
October: 55 GB
November: currently 2.3 GB per day on average, but it's rising.
I've been looking into Amazon S3, but have not decided on anything yet. So I'm asking if there are any other providers I should consider that operate within the same price range as Amazon (or lower)? Best regards, Christian Felde
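To put the traffic figures above on a common scale, here is a small Python sketch. It only uses the numbers quoted in the post; the 30-day month used for the November projection is an assumption, and no provider pricing is included because the post gives none.

```python
# Monthly transfer figures quoted in the post, in GB.
monthly_gb = {"August": 12, "September": 22, "October": 55}

# November so far: average GB per day, as quoted in the post.
november_daily_gb = 2.3

# Assumption: project November over a 30-day month at the current daily rate.
DAYS_IN_MONTH = 30
november_projection_gb = november_daily_gb * DAYS_IN_MONTH


def growth_factors(series):
    """Return month-over-month growth factors for an ordered list of values."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


values = list(monthly_gb.values()) + [november_projection_gb]
factors = growth_factors(values)

print(f"Projected November transfer: about {november_projection_gb:.0f} GB")
for factor in factors:
    print(f"month-over-month growth: x{factor:.2f}")
```

At the current daily rate November would come to roughly 69 GB, and since the post says traffic is still rising, that is better read as a lower bound than as a forecast.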
If you are trying to create highly available file systems, especially across data centers, then ChironFS is one potential solution. It's relatively new, so there aren't many experience reports, but it looks worth considering. What is ChironFS and how does it work? Adapted from the ChironFS website: The Chiron Filesystem is a FUSE-based filesystem that frees you from single points of failure. Its main purpose is to guarantee filesystem availability using replication. But it isn't a RAID implementation. RAID replicates DEVICES, not FILESYSTEMS. Why not just use RAID over some network block device? Because it is a block device, and if one server mounts that device in RW mode, no other server will be able to mount it in RW mode. Any real network may have many servers and offer a variety of services. Keeping everything running can become a real nightmare!
I'm sure most are familiar with Facebook's "news feed". If not, the news feed basically lists the recent activity of all of your friends. I don't see how you can get this information efficiently from a DB:
* I'm assuming all user activity is inserted in an "actions" table.
* First get a list of all your friends.
* Then query the actions table to return recent activity where the activity belongs to someone on your friends list.
This can't be efficient, especially considering some people have 200+ friends. So what am I missing? How do you think Facebook is implementing their news feed? I'm not asking for any specific details, just a general point in the right direction, as I can't see how they are implementing the news feed efficiently. Thanks.
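The approach the question describes can be written down as a single join, sketched here in Python with SQLite. This illustrates the poster's assumed design only, not how Facebook actually builds its feed. Apart from the "actions" table named in the post, every table name, column name, and the index are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up data set."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
        CREATE TABLE actions (
            id INTEGER PRIMARY KEY,
            user_id INTEGER,
            description TEXT,
            created_at INTEGER
        );
        -- Assumed index so recent activity per user can be found quickly.
        CREATE INDEX actions_by_user_time ON actions (user_id, created_at);
        """
    )
    db.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3)])
    db.executemany(
        "INSERT INTO actions (user_id, description, created_at) VALUES (?, ?, ?)",
        [
            (2, "posted a photo", 100),
            (3, "joined a group", 200),
            (4, "wrote a note", 300),  # user 4 is not a friend of user 1
        ],
    )
    return db


def news_feed(db, user_id, limit=10):
    """Return recent actions by the user's friends, newest first."""
    rows = db.execute(
        """
        SELECT a.user_id, a.description
        FROM actions AS a
        JOIN friends AS f ON f.friend_id = a.user_id
        WHERE f.user_id = ?
        ORDER BY a.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    )
    return rows.fetchall()


db = build_demo_db()
print(news_feed(db, 1))
```

The cost concern in the question is visible here: every feed request has to touch the recent actions of each friend, so the work grows with the size of the friends list.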
All the cool kids advocate scaling out as the secret sauce of scaling. And it is, but don't forget to serve some tasty "scaling up" as a side dish. Scaling up doesn't have to mean buying a jet-propelled, liquid-cooled, 128-core monster supercomputer. Scaling up can just mean buying at the high end of the commodity buffet: more cores, more memory, and a shared-nothing architecture to take advantage of all that power without adding complexity. Scale out when you need to, but big beefy boxes can absorb a lot of load before it's necessary to hit up your data center for more rack space. Here are a few examples of scaling out and up:
We’re seeing machines with eight cores and 32G of memory. If we were to buy eight disks for these boxes it’s really like buying 8 machines with 4G each and one disk. This partially goes into the horizontal vs vertical scale discussion. Is it better to buy one $10k box or 10 $1k boxes? I think it’s neither. Buy 4 $2.5k boxes. The new multicore stuff is super cheap.
Scaling out doesn’t mean using crappy hardware. I think people take the “scale out” model (that they’ve often only read about from outdated conference presentations) to quite an extreme. They think scaling out means using desktop-class, bad hardware, and just buying a ton of them. That model doesn’t work, and it’s hell to maintain in the long term. Use commodity hardware. You often hear the term “commodity hardware” in reference to scale out. While crappy hardware is also commodity, what this means is that instead of getting stuck on the low-end $40k machine, with thoughts of upgrading to the $250k machine, and maybe later the $1M machine, you use data partitioning and any number of let’s say $5k machines. That doesn’t mean a $1k single-disk crappy machine as said above. What does it mean for the machine to be “commodity”? It means that the components are standardized, common, and the price is set by the market, not by a single corporation. Use commodity machines configured with a good balance of price vs. performance.