Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
G'day, I noticed the default sort order for the forum is to show the posts with the most replies first. That seems a bit odd for a forum. Would it not make sense to show the posts with the most recent replies first? It is possible to re-sort the forum threads that way by clicking on the "Last post" header (twice). It would seem like a more sensible default. I've checked and I see the same behaviour as both a registered (logged in) and anonymous user. Cheers - Callum.
G'day, I'm building an application to manage WordPress PHP code on many servers. Our application will push down code updates to each server, as well as performing backups and testing. I'm considering different methods of pushing updated code onto the individual servers. I'm considering something like Capistrano (I've no experience in Ruby though). I've also considered using subversion and then remotely calling svn commands via SSH. Are there any other tools specifically for this purpose? The servers will have persistent data (the WordPress databases) so I don't want to re-image them every update. Plus, they will each have a different set of plugins / themes, so building many images would be too complex. If there are any papers on code deployment, or other recommended reading, please point the links my way. Likewise, if anyone has any suggestions, or would like more details, just let me know. Cheers - Callum.
How do you design a reliable distributed file system when the expected availability of the individual nodes is only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of the Swiss startup Caleido, will show you how! They have launched Wuala, the social online storage service which scales as new nodes join the P2P network. The goal of Wua.la is to provide distributed online storage that is:
Hello! My first post here, so be patient please. I am developing a site where I have lots of static content. But on many pages I have a query to update the count of views. I would say this may cause lots of problems, and I am interested in another solution, like storing these counts somewhere else. As my knowledge is a bit limited in this area, I am asking you. I can say I understand PHP (OOP ofc) and MySQL. Nowadays I am getting into servers. The other question I have is: I read about making lots of things static (in Flickr Architecture) and am interested in how they do static sites. Let's say they make a photo page static? And rebuild it when a tag or comment is added? I am a bit interested in it as I want to learn Smarty better (newbie) and serving content. Moreover, how about PHP? I have read many books about PHP theoretically but would love to see some RL example of using objects and exceptions (mainly this, as I don't completely understand it) to learn some good programming habits. So if you can help me with some example or resource, please do :) I know I've covered a huge area of things, but these are what make me mad every day. So please be patient :) Greetings.
Update 2: Michael Galpin in Cache Money and Cache Discussions likes memcached for its expiry policy, complex graph data, process data, but says MySQL has many advantages: SQL, Uniform Data Access, Write-through, Read-through, Replication, Management, Cold starts, LRU eviction. Update: Dormando asks Should you use memcached? Should you just shard mysql more? The idea of caching is the most important part of caching as it transports you beyond a simple CRUD worldview. Plan for caching and sharding by properly abstracting data access methods. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs over planning and wasting valuable time. Feedster's François Schiettecatte wonders if Fotolog's 21 memcached servers wouldn't be better used to further shard data by adding more MySQL servers? He mentions Feedster was able to drop memcached once they partitioned their data across more servers. The algorithm: partition until all data resides in memory and then you may not need an additional memcached layer. Parvesh Garg goes a step further and asks why people think they should be using MySQL at all?
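To make "properly abstracting data access methods" a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `UserStore` class, its method names, and the plain dictionaries standing in for MySQL shards and for a memcached-style cache are assumptions made for illustration, not anything taken from the linked posts. The point is only that callers use `get` and `put`, so adding shards or dropping the cache layer does not touch application code.

```python
import zlib
from typing import Dict, List, Optional


class UserStore:
    """Hypothetical data-access layer: callers never see shards or caches."""

    def __init__(self, shard_count: int, use_cache: bool = True) -> None:
        # Each dict stands in for one database shard (an assumption of this sketch).
        self.shards: List[Dict[str, str]] = [dict() for _ in range(shard_count)]
        # Optional cache layer; None means "partitioned enough, no cache needed".
        self.cache: Optional[Dict[str, str]] = {} if use_cache else None
        self.shard_reads = 0  # counts reads that reached a shard

    def _shard_for(self, key: str) -> Dict[str, str]:
        # A stable hash, so a key always maps to the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value
        if self.cache is not None:
            self.cache[key] = value  # write-through

    def get(self, key: str) -> Optional[str]:
        if self.cache is not None and key in self.cache:
            return self.cache[key]
        self.shard_reads += 1
        value = self._shard_for(key).get(key)
        if self.cache is not None and value is not None:
            self.cache[key] = value  # read-through
        return value
```

Constructing the store with `use_cache=False` mimics the Feedster approach described above: the same calls work, and every read simply goes to the shard that owns the key.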
Pre-generating static files is an oldie but a goodie, and as Thomas Brox Røst says, it's probably an underused strategy today. At one time this was the dominant technique for structuring a web site. Then the age of dynamic web sites arrived and we spent all our time worrying how to make the database faster and add more caching to recover the speed we had lost in the transition from static to dynamic. Static files have the advantage of being very fast to serve. Read from disk and display. Simple and fast. Especially when caching proxies are used. The issue is how do you bulk generate the initial files, how do you serve the files, and how do you keep the changed files up to date? This is the process Thomas covers in his excellent article "Serving static files with Django and AWS - going fast on a budget", where he explains how he converted 600K previously dynamic pages to static pages for his site Eventseer.net, a service for tracking academic events. Eventseer.net was experiencing performance problems as search engines crawled their 600K dynamic pages. As a solution you could imagine scaling up, adding more servers, adding sharding, etc., all somewhat complicated approaches. Their solution was to convert the dynamic pages to static pages in order to keep search engines from killing the site. As an added bonus non logged-in users experienced a much faster site and were more likely to sign up for the service. The article does a good job explaining what they did, so I won't regurgitate it all here, but I will cover the highlights and comment on some additional potential features and alternate implementations... They estimated it would take 7 days on a single server to generate the initial 600K pages. Ouch. So what they did was use EC2 for what it's good for: spinning up a lot of boxes to process data. Their data is backed up on S3 so the EC2 instances could read the data from S3, generate the static pages, and write them to their deployment area.
It took 5 hours, 25 EC2 instances, and a meager $12.50 to perform the initial bulk conversion. Pretty slick. The next trick is figuring out how to regenerate static pages when changes occur. When a new event is added to their system hundreds of pages could be impacted, which would require the affected static pages to be regenerated. Since it's not important to update pages immediately they queued updates for processing later. An excellent technique. A local queue of changes was maintained and replicated to an AWS SQS queue. The local queue is used in case SQS is down. Twice a day EC2 instances are started to regenerate pages. Instances read work requests from SQS, access data from S3, regenerate the pages, and shut down when the SQS queue is empty. In addition they use AWS for all their background processing jobs.
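The queueing flow above can be sketched roughly as follows. This is not Eventseer's code: the `ChangeLog` class, the `send_remote` callable (a stand-in for an SQS send), and the in-memory `deque` playing the part of the SQS queue are all assumptions made for illustration, and it shows just one plausible reading of "the local queue is used in case SQS is down".

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable


class ChangeLog:
    """Local queue of page changes, replicated to a remote queue when possible."""

    def __init__(self, send_remote: Callable[[str], None]) -> None:
        self.local: Deque[str] = deque()  # changes not yet replicated
        self.send_remote = send_remote    # hypothetical stand-in for an SQS send

    def record(self, affected_pages: Iterable[str]) -> None:
        """Queue every page touched by a change; nothing is regenerated yet."""
        self.local.extend(affected_pages)
        self.flush()

    def flush(self) -> None:
        """Replicate in order; on failure keep the entry locally for a later retry."""
        while self.local:
            try:
                self.send_remote(self.local[0])
            except ConnectionError:
                return
            self.local.popleft()


def drain(remote: Deque[str], render: Callable[[str], str]) -> Dict[str, str]:
    """Batch worker: regenerate pages until the remote queue is empty."""
    pages: Dict[str, str] = {}
    while remote:
        page = remote.popleft()
        pages[page] = render(page)  # a duplicate entry just regenerates the page again
    return pages
```

In this sketch a worker started by the twice-daily job would call `drain` and exit when it returns, which corresponds to shutting the instance down once the queue is empty.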
Comments: I like their approach a lot. It's a very pragmatic solution and rock solid in operation. For very little money they offloaded the database by moving work to AWS. If they grow to millions of users (knock on wood) nothing much will have to change in their architecture. The same process will still work and it still will not cost very much. Far better than trying to add machines locally to handle the load or moving to a more complicated architecture. Using the backups on S3 as a source for the pages rather than hitting the database is inspired. Your data is backed up and the database is protected. Nice. Using batched asynchronous work queues rather than synchronously loading the web servers and the database for each change is a good strategy too. As I was reading I originally thought you could optimize the system so that a page only needed to be generated once. Maybe by analyzing the events or some other magic. Then it hit me that this was old style thinking. Don't be fancy. Just keep regenerating each page as needed. If a page is regenerated 1000 times versus only once, who cares? There's plenty of cheap CPU available. The local queue of changes still bothers me a little because it adds a complication into the system. The local queue and the AWS SQS queue must be kept in sync. I understand that missing a change would be a disaster because the dependent pages would never be regenerated and nobody would ever know. The page would only be regenerated the next time an event happened to impact the page. If pages are regenerated frequently this isn't a serious problem, but for seldom touched pages they may never be regenerated again. Personally I would drop the local queue. SQS goes down infrequently. When it does go down I would record that fact and regenerate all the pages when SQS comes back up. This is a simpler and more robust architecture, assuming SQS is mostly reliable.
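A minimal sketch of that "no local queue" alternative might look like this. The `ChangePublisher` class, the `send` callable (a stand-in for an SQS send), and the `all_pages` callable are hypothetical names invented for the example; the only state kept locally is a single flag saying that at least one change was missed.

```python
from typing import Callable, Iterable


class ChangePublisher:
    """Publish page changes; after an outage, fall back to a full rebuild."""

    def __init__(self, send: Callable[[str], None],
                 all_pages: Callable[[], Iterable[str]]) -> None:
        self.send = send            # hypothetical remote enqueue (e.g. an SQS send)
        self.all_pages = all_pages  # returns every page id on the site
        self.missed_changes = False

    def publish(self, page: str) -> None:
        try:
            if self.missed_changes:
                # The queue was down earlier, so we no longer know which pages
                # are stale: schedule every page (assumed to include `page`).
                for page_id in self.all_pages():
                    self.send(page_id)
                self.missed_changes = False
                return
            self.send(page)
        except ConnectionError:
            # Record only the fact that something was missed.
            self.missed_changes = True
```

The trade-off is the one described above: a full rebuild is wasteful, but if outages are rare it costs little, and there is no second queue to keep in sync.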
Another feature I have implemented in similar situations is to set up a rolling page regeneration schedule where a subset of pages is periodically regenerated, even if no event was detected that would cause a page to be regenerated. This protects against any event drops that may cause data to be undetectably stale. Over a few days, weeks, or whatever, every page is regenerated. It's a relatively cheap way to make a robust system resilient to failures.
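One simple way to build such a rolling schedule is to hash each page id into a slot of the cycle, as in the sketch below. The function name `pages_due`, the page ids, and the choice of CRC32 as the hash are assumptions for the example; any stable hash would do.

```python
import zlib
from typing import Iterable, List


def pages_due(page_ids: Iterable[str], day: int, cycle_days: int) -> List[str]:
    """Return the pages whose slot in the rolling cycle falls on `day`."""
    slot = day % cycle_days
    return [
        page_id
        for page_id in page_ids
        # A stable hash keeps each page in the same slot on every cycle.
        if zlib.crc32(page_id.encode("utf-8")) % cycle_days == slot
    ]
```

Calling `pages_due(all_ids, day_number, 7)` once a day and queueing the result would regenerate every page exactly once per week, with the load spread across the days according to how the hash distributes the ids.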
Update: Evaluating Terracotta by Piotr Woloszyn. Nice writeup that covers resilience, failover, DB persistence, Distributed caching implementation, OS/Platform restrictions, Ease of implementation, Hardware requirements, Performance, Support package, Code stability, partitioning, Transactional, Replication and consistency. Terracotta is Network Attached Memory (NAM) for Java VMs. It provides up to a terabyte of virtual heap for Java applications that spans hundreds of connected JVMs. NAM is best suited for storing what they call scratch data. Scratch data is defined as object oriented data that is critical to the execution of a series of Java operations inside the JVM, but may not be critical once a business transaction is complete. The Terracotta Architecture has three components:
- Client Nodes - Each client node is a node in the cluster and runs on a standard JVM.
- Server Cluster - A Java process that provides the clustering intelligence. The current Terracotta implementation operates in an Active/Passive mode.
- Storage used as:
  - Virtual Heap storage - as objects are paged out of the client nodes into the server, if the server heap fills up, objects are paged onto disk.
  - Lock Arbiter - to ensure that there is no possibility of the classic "split-brain" problem, Terracotta relies on the disk infrastructure to provide a lock.
  - Shared Storage - to transmit the object state from the active server to the passive one, objects are persisted to disk, which then shares the state with the passive server(s).
One of the most popular and effective scalability strategies is to impose limits (GAE Quotas, Fotolog, Facebook) as a means of protecting a website against service destroying traffic spikes. Twitter will reportedly limit the number of followers to 2,000 in order to thwart follow spam. This may also allow Twitter to make some bank by going freemium and charging for adding more followers. Agree or disagree with Twitter's strategies, the more interesting aspect for me is how do you introduce new policies into an already established ecosystem? One approach is the big bang. Introduce all changes at once and let everyone adjust. If users don't like it they can move on. The hope is, however, most users won't be impacted by the changes and that those who are will understand it's all for the greater good of their beloved service. Casualties are assumed, but the damage will probably be minor. Now in Twitter's case the people with the most followers tend to be opinion leaders who shape much of the blognet echo chamber. Pissing these people off may not be your best bet. What to do? Shegeeks.net makes a great proposal: Limit The New, Not The Old. The idea is to only impose the limits on new accounts, not the old. Old people are happy and new people understand what they are getting into. The reason I like this suggestion so much is that it has deep historical roots, all the way back to the fall of the Roman republic and the rise of the empire due to the agrarian reform laws passed in 133 BC. In ancient Rome property and power, as they tend to do, became concentrated in the hands of a few wealthy land owners. Let's call them the nobility. The greatness that was Rome was founded on an agrarian society. People made modest livings on small farms. As power concentrated small farmers were kicked off the land and forced to move to the city. Slaves worked the land while citizens remained unemployed. And cities were no place to make a life. Civil strife broke out.
Pliny said that "it was the large estates which destroyed Italy." To redress the imbalance and reestablish traditional virtuous Roman character, "In 133 BC, Tiberius Sempronius Gracchus, the plebeian tribune, passed a series of laws attempting to reform the agrarian land laws; the laws limited the amount of public land one person could control, reclaimed public lands held in excess of this, and attempted to redistribute the land, for a small rent, to farmers now living in the cities." At this time Rome was a republic. It had a mixed constitution with two consuls at the top, a senate, and tribunes below. Tribunes represented the people. The agrarian laws passed by Gracchus set off a war between the nobility and the people. The nobles obviously didn't want to lose their land and set in motion a struggle eventually leading to the defeat of the people, the fall of the Republic, and the rise of the Empire. Machiavelli, despite his unpopular press, was a big fan of the republican form of government, and pinpointed the retroactive and punitive nature of the agrarian laws as why the Republic fell. He didn't say reform wasn't needed, but he thought the mistake was in how the reform was carried out. By making the laws retroactive it challenged the existing order. And by making the laws so punitive it made the nobility willing to fight to keep what they already had. This divided the people and caused a class war. In fact, this is the origin of the ex post facto clause in the US Constitution, which says you can't pass a law with retroactive effect. Machiavelli suggested the reforms should have been carried out so they only impacted the future. For example, in newly conquered lands small farmers would be given more land. In this way the nobility wouldn't be directly challenged, reform could happen slowly over time, and the republic would have been preserved. Cool parallel, isn't it? Hey, who says there's nothing to learn from history!
Let's just hope history doesn't rerepeat itself again.
A couple of videos about distributed computing with direct reference to Google infrastructure. You will get acquainted with:
- MapReduce, the software framework implemented by Google to support parallel computations over large (greater than 100 terabyte) data sets on commodity hardware
- GFS and the way it stores its data in 64 MB chunks
- Bigtable, which is the simple implementation of a non-relational database at Google
Cluster Computing and MapReduce Lectures 1-5.
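For readers who have not met the MapReduce model before watching, here is a toy, single-process word count in Python that follows the same map, shuffle, and reduce phases. It is only an illustration of the programming model: the function names are invented for this sketch and it has nothing to do with Google's actual implementation, which distributes these phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> int:
    """Reduce phase: combine all values emitted for one key."""
    return sum(counts)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_words(doc))
    return {word: reduce_counts(word, counts)
            for word, counts in shuffle(pairs).items()}
```

Because each call to `map_words` looks at one document and each call to `reduce_counts` looks at one key, both phases can in principle run in parallel on separate machines, which is what makes the model a fit for commodity hardware.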