Update: MapReduce and PageRank Notes from Remzi Arpaci-Dusseau's Fall 2008 class. Collects interesting facts about MapReduce and PageRank. For example, the history of the solution to searching for the term "flu" is traced through multiple generations of technology.
With Google entering the cloud space with Google AppEngine and a maturing Hadoop product, the MapReduce scaling approach might finally become a standard programmer practice. This is the best paper on the subject and is an excellent primer on a content-addressable memory future.
Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.
One common criticism ex-Googlers have is that it takes months to get up and be productive in the Google environment. Hopefully a way will be found to lower the learning curve and make programmers more productive faster.
From the abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google’s clusters every day, processing a total of more than twenty petabytes of data per day.
Thanks to Kevin Burton for linking to the complete article.
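To make the "map and a reduce function" idea from the abstract concrete, here is a minimal single-process sketch of a word count. It is not Google's implementation or the Hadoop API: the function names (`map_words`, `reduce_counts`, `run_mapreduce`) and the sample documents are made up for illustration, and the real runtime would spread the same three phases across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> int:
    """Reduce step: sum all counts emitted for one word."""
    return sum(counts)


def run_mapreduce(inputs: dict[str, str], mapper, reducer) -> dict[str, int]:
    """Single-process stand-in for the runtime: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # grouping by key is the "shuffle" phase
    return {k: reducer(k, vs) for k, vs in groups.items()}


# Hypothetical input: two tiny documents keyed by an id.
docs = {"d1": "flu season flu shots", "d2": "flu symptoms"}
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

The user only writes the two small functions; everything in `run_mapreduce` is what the framework is supposed to take off the programmer's hands.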
In the article Building Super-Scalable Web Systems with REST, Udi Dahan tells an interesting story of how they made a weather reporting system scale to over 10 million users. Having that many users hit their weather database didn't scale. Caching in a straightforward way wouldn't work because weather is obviously local. Caching all local reports would bring the entire database into memory, which would work for some companies, but wasn't cost efficient for them. So in typical REST fashion they turned locations into URIs. For example: http://weather.myclient.com/UK/London. This allows the weather information to be cached by intermediaries instead of hitting their servers. Hopefully for each location their servers will be hit a few times and then the caches will be hit until expiry. In order to send users directly to the correct location, an IP location check is performed on login and stored in a cookie. The lookup is done once, and from then on a GET is performed directly on the resource. There's no need to hit their servers and do a lookup on the user to get the location. That's all bypassed. I like Udi's summary of the approach, which is why I think this is a good strategy: This isn’t a “cheap trick”. While being straight forward for something like weather, understanding the nature of your data and intelligently mapping that to a URI space is critical to building a scalable system, and reaping the benefits of REST.
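The location-to-URI mapping can be sketched in a few lines. This is an illustration under assumptions, not Udi's code: the host comes from the example above, while the function names, the cookie format, and the choice of a `Cache-Control: public, max-age=...` header as the expiry mechanism are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://weather.myclient.com"  # host taken from the article's example


def weather_uri(country: str, city: str) -> str:
    """Map a location to a stable resource URI that intermediaries can cache."""
    return f"{BASE_URL}/{quote(country)}/{quote(city)}"


def cache_headers(max_age_seconds: int) -> dict[str, str]:
    """Response headers that allow shared caches to serve the report until expiry."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


def location_cookie(country: str, city: str) -> str:
    """Cookie value stored after the one-time IP lookup at login (assumed format)."""
    return f"location={quote(country)}/{quote(city)}"
```

Because every user in London requests the same URI, a single cached response can serve all of them, which is the point of keeping user identity out of the request.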
Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind.
Werner Vogels
Dr. Werner Vogels is Vice President & Chief Technology Officer at Amazon.com where he is responsible for driving the company’s technology vision, which is to continuously enhance the innovation on behalf of Amazon’s customers at a global scale. Prior to joining Amazon, he worked as a researcher at Cornell University where he was a principal investigator in several research projects that target the scalability and robustness of mission-critical enterprise computing systems. He is regarded as one of the world's top experts on ultra-scalable systems and he uses his weblog to educate the community about issues such as eventual consistency. Information Week recently recognized Vogels for this educational and promotional role in Cloud Computing with the 2008 CIO/CTO of the Year award.
Service-Oriented Architecture, Utility Computing and Internet Level 3 Platform in practice
Amazon has built a loosely coupled service-oriented architecture on an inter-planetary scale. They are the pioneers of Utility Computing and Internet Platforms discussed earlier in Scalability Perspectives. Amazon's CTO, Werner Vogels, is undoubtedly a thought leader for the coming age of cloud computing.
Cloud Computing CTO or Chief Cloud Officer?
Vogels' name and face are often associated with Amazon's cloud, but Amazon Web Services isn't a one-man show; it is teamwork. Amazon's CTO has emerged as the right person at the right time and place to guide cloud computing, until now an emerging technology for early adopters, into the mainstream. He not only understands how to architect a global computing cloud consisting of tens of thousands of servers, but also how to engage CTOs, CIOs, and other professionals at customer companies in a discussion of how that architecture could potentially change the way they approach IT. If all goes as planned, Amazon's cloud will serve as an extension of corporate data centers for new applications and overflow capacity, so-called cloud bursting. Over time, Amazon will then take on more and more of the IT workload from businesses that see value in the model. Customer-centric? What Amazon's doing goes beyond that. Amazon's cloud becomes their cloud; its CTO, their CTO. As an expert on distributed systems, Vogels shares interesting insights on scalability-related issues on his blog, such as:
The Amazon Technology Platform
In this QCon presentation, Werner Vogels explains how Amazon has become a platform provider and how an increasing number of diverse businesses are built on the Amazon.com platform. The most important thing to understand is that Amazon is a Technology Platform with the emphasis on Technology. The scalable and reliable platform is the main enabler of Amazon's business model. Dr. Vogels describes Amazon’s platform business model and its ‘flywheel’ for growth on the latest episode of the Telco 2.0 ‘executive brainstorm’ series on Telecom TV. Amazon has many platforms that fuel growth, such as:
- Amazon Merchants
- Amazon Associates
- Amazon E-Commerce Platform
- Web Scale Computing Platform
- Amazon Kindle
- Telecommunications platform using Amazon's platform?
- All Things Distributed - Werner Vogels' weblog on building scalable and robust distributed systems
- Werner Vogels on Wikipedia.org
- Information Week - Chief Of The Year: Amazon CTO Werner Vogels and related Q&A
- The Amazon.com Technology Platform: Building Blocks for Innovation
- Amazon's Platform Business Model
- Episode 40: Interview Werner Vogels
- ACM Queue - A conversation with Werner Vogels
- ITConversations: Supernova – Werner Vogels on Scalability
- Sell on Amazon: A Guide to Amazon's Marketplace, Seller Central, and Fulfillment by Amazon Programs
- Werner Vogels on Twitter
Simone Brunozzi, technology evangelist for Amazon Web Services in Europe, describes how Soocial.com was fully ported to Amazon web services.
---------------- This period of the year I decided to dedicate some time to better understand how our customers use AWS, therefore I spent some online time with Stefan Fountain and the nice guys at Soocial.com, a "one address book solution to contact management", and I would like to share with you some details of their IT infrastructure, which now runs 100% on Amazon Web Services!
In the last few months, they've been working hard to cope with tens of thousands of users and to get ready to easily scale to millions. To make this possible, they decided to move ALL their architecture to Amazon Web Services. Despite the fact that they were quite happy with their previous hosting provider, Amazon proved to be the way to go. -----------------
Read the rest of the article here.
This article presents the companies that offer the means (mainly software and hardware) that power most cloud computing hosting providers, namely virtualization solutions.
Read the entire article about Platform virtualization - top 25 providers (software, hardware, combined) at MyTestBox.com - web software reviews, news, tips & tricks.
Under the philosophy that the best method to analyse spam is to become a spammer, this absolutely fascinating paper recounts how a team of UC Berkeley researchers went undercover to infiltrate a spam network. Part CSI, part Mission Impossible, and part MacGyver, the team hijacked the botnet so that their code was actually part of the dark network itself. Once inside, they figured out the architecture and protocols of the botnet and how many sales the spammers were able to tally. Truly elegant work.
Two different spam campaigns were run on a Storm botnet network of 75,800 zombie computers. Storm is a peer-to-peer botnet that uses spam to creep its tentacles through the worldwide computer network. One of the campaigns distributed viruses in order to recruit new bots into the network. This is normally accomplished by enticing people to download email attachments. An astonishing one in ten people downloaded the executable and ran it, which means we won't run out of zombies soon. The downloaded components include: backdoor/downloader, SMTP relay, e-mail address stealer, e-mail virus spreader, distributed denial of service (DDoS) attack tool, and an updated copy of the Storm Worm dropper. The second campaign sent pharmaceutical spam ("libido boosting herbal remedy") over the network.
Haven't you always wondered who clicks on spam and how much spammers could possibly make? In the study, only 28 sales resulted from 350 million spam e-mail messages sent over 26 days. That is a conversion rate of well under 0.00001% (a typical advertising campaign might have a conversion rate of 2-3%). The average purchase price was about $100, for $2,731.88 in total revenue. The researchers estimate that total daily revenue attributable to Storm’s pharmacy campaign is about $7000 and that the spammers pick up between 3500 and 8500 new bots per day through their Trojan distribution system. And this is with only 1.5% of the entire network in use.
So the spammers would take in about $3.5 million a year in total revenue from one product on one network. Imagine the take with multiple products and multiple networks. That's why we still have spam. And since the conversion rate is already so low, it seems spam will always be with us.
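The spamonomics can be checked with a few lines of arithmetic. The inputs below are the figures quoted in this post. Dividing the observed daily revenue by the 1.5% share is one plausible reading of how the roughly $7000 per day estimate comes about, and is an assumption here rather than something the post spells out.

```python
SALES = 28
MESSAGES_SENT = 350_000_000
CAMPAIGN_DAYS = 26
REVENUE_USD = 2731.88
NETWORK_SHARE = 0.015  # about 1.5% of the botnet was in use

conversion_rate = SALES / MESSAGES_SENT          # fraction of messages that led to a sale
conversion_percent = conversion_rate * 100       # same value expressed as a percentage
average_purchase = REVENUE_USD / SALES           # dollars per sale
observed_daily_revenue = REVENUE_USD / CAMPAIGN_DAYS
# Assumption: scale linearly from the observed share to the whole network.
estimated_network_daily = observed_daily_revenue / NETWORK_SHARE
annualized = estimated_network_daily * 365
```

Under these assumptions the conversion rate is about 0.000008%, the average purchase is a little under $98, and the scaled daily revenue lands very close to $7000. Annualizing that daily figure directly gives roughly $2.6 million, so the $3.5 million a year quoted above presumably rests on additional assumptions not stated in this post.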
As fascinating as all the spamonomics are, the explanation of the botnet architecture is just as fascinating. Storm uses a three-level self-organizing hierarchy pictured here:
A host selects its worker or proxy role automatically. If a firewall doesn't prevent inbound communication the infected host becomes a proxy, otherwise the host becomes a worker. As workers pull work from proxies there's no need to contact one directly. Proxies on the other hand are directly contacted by master servers so communication must be bidirectional.
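The role selection rule and the pull-based flow of work can be sketched as follows. This is an illustrative model only, not Storm's code: the class names, the string jobs, and the in-memory queue are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    WORKER = "worker"
    PROXY = "proxy"


def select_role(inbound_reachable: bool) -> Role:
    """Hosts that accept inbound connections become proxies; firewalled hosts become workers."""
    return Role.PROXY if inbound_reachable else Role.WORKER


class Proxy:
    """Holds work pushed down by a master server until workers pull it."""

    def __init__(self) -> None:
        self.queue: list[str] = []

    def receive_from_master(self, job: str) -> None:
        self.queue.append(job)  # master -> proxy requires an inbound connection

    def hand_out(self) -> Optional[str]:
        return self.queue.pop(0) if self.queue else None


class Worker:
    def pull(self, proxy: Proxy) -> Optional[str]:
        return proxy.hand_out()  # worker -> proxy is outbound only, so firewalls don't matter
```

The design choice worth noticing is that a worker never has to be reachable: it always initiates the connection, so even hosts behind a firewall can still be put to work.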
Storm communicates using two separate protocols:
According to Brandon Enright: When a peer wants to find content in the network, it computes (or is given) the hash of that content and then searches adjacent peers. Those peers respond with their adjacent peers that are closer. This is repeated until the searching peer gets close enough to the content that a node there will be able to provide a search result. This is a complicated and interesting process that the Spamalytics paper covers in a lot more detail, as do some references at the end of this post.
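The iterative search Brandon describes can be sketched as a loop that keeps hopping to a closer peer. The sketch makes several assumptions that are not in the post: peers are small integers, closeness is measured with an XOR metric as in Kademlia-style hash tables, SHA-256 stands in for whatever hash the real network uses, and the `neighbors` table is a toy topology.

```python
import hashlib


def content_key(content: bytes, bits: int = 16) -> int:
    """Hash content into the same ID space as the peers (truncated to keep IDs small)."""
    digest = hashlib.sha256(content).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def distance(a: int, b: int) -> int:
    """XOR metric; an assumption borrowed from Kademlia-style DHTs."""
    return a ^ b


def lookup(start: int, target: int, neighbors: dict[int, list[int]]) -> list[int]:
    """Hop to ever-closer peers until no known peer is closer to the target."""
    path = [start]
    current = start
    while True:
        candidates = neighbors.get(current, [])
        closer = [p for p in candidates if distance(p, target) < distance(current, target)]
        if not closer:
            return path  # nobody closer is known: the search ends here
        current = min(closer, key=lambda p: distance(p, target))
        path.append(current)


# Toy topology: each peer only knows a couple of other peers.
toy_neighbors = {
    0b0001: [0b1000, 0b0100],
    0b1000: [0b1100, 0b1010],
    0b1100: [0b1110],
    0b1110: [0b1111],
}
route = lookup(0b0001, 0b1111, toy_neighbors)
```

Each hop strictly reduces the distance to the target, so the loop always terminates, and no peer needs a complete view of the network.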
Storm harnesses a large, unreliable, constantly changing distributed system to do work. It's an architecture worth learning from and we'll explore some of those lessons in a later post.
The key (no pun intended) to understanding how to organize your dataset’s data is to think of all the shards not as individual databases, but as one large singular database. Just as in a normal single-server database setup where you have a unique key for each row within a table, each row key within each individual shard must be unique across the whole dataset partitioned over all shards. There are a few different ways we can accomplish uniqueness of row keys across a shard cluster. Each has its pros and cons, and the one chosen should be specific to the problems you’re trying to solve.
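One possible way to get dataset-wide uniqueness, shown here purely as an illustration, is a composite key that packs the shard id into the low bits of a per-shard sequence number. The post does not say which schemes it has in mind, so the bit width, class name, and helper below are all assumptions.

```python
import itertools

SHARD_BITS = 10  # assumed: room for up to 1024 shards


class ShardKeyGenerator:
    """Issues keys that are unique across the whole dataset, not just within one shard."""

    def __init__(self, shard_id: int) -> None:
        if not 0 <= shard_id < (1 << SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._sequence = itertools.count(1)  # stand-in for the shard's auto-increment column

    def next_key(self) -> int:
        local_id = next(self._sequence)
        return (local_id << SHARD_BITS) | self.shard_id


def shard_of(key: int) -> int:
    """Recover the owning shard from the low bits of a key."""
    return key & ((1 << SHARD_BITS) - 1)
```

The trade-off of this particular scheme is that the shard is baked into the key, which makes routing trivial but makes moving a row to another shard harder.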
This may be a bit higher level than the general discussion here, but I think this is an important issue in how it relates to reliability and uptime. What kind of SLAs should we be expecting from SaaS services and platforms (e.g. AWS, Google App Engine, Google Premium Apps, salesforce.com, etc.)? Up to today, most SaaS services either have no SLAs or offer very weak penalties. What will it take to get these services up to the point where they can offer the SLAs that users (and more importantly, businesses) require? I presume most of the members here want to see more movement into the cloud and to SaaS services, and I'm thinking that until we see more substantial SLA guarantees, most businesses will continue to shy away as long as they can. Would love to hear what others think. Or am I totally off base?
Successful software design is all about trade-offs. In the typical (if there is such a thing) distributed system, recognizing the importance of trade-offs within the design of your architecture is integral to the success of your system. Despite this reality, I see time and time again, developers choosing a particular solution based on an ill-placed belief in their solution as a “silver bullet”, or a solution that conquers all, despite the inevitable occurrence of changing requirements. Regardless of the reasons behind this phenomenon, I’d like to outline a few of the methods I use to ensure that I’m making good scalable decisions without losing sight of the trade-offs that accompany them. I’d also like to compile (pun intended) the issues at hand, by formulating a simple theorem that we can use to describe this oft occurring situation.
Update: Presentation: Second Life’s Architecture. Ian Wilkes, VP of Systems Engineering, describes the architecture used by the popular game Second Life. Ian presents what the architecture looked like at its debut and how it evolved over the years as users and features were added. Second Life is a 3-D virtual world created by its Residents. Virtual worlds are expected to become more and more popular on the internet, so their architecture might be of interest. Especially important is the appearance of open virtual worlds or metaverses. What happens when video games meet Web 2.0? What happens is the metaverse.
- Second Life runs MySQL
- Interview with Ian Wilkes
- TechTrends: Inside Linden Lab
- Town Hall with Cory Linden
- InformationWeek articles (1, 2) and blog
- Second Life Wiki: Server Architecture
- Wikipedia: Second Life Server
- Second Life Blog
- Second Life: A Guide to Your Virtual World
- ~1M active users
- ~95M user hours per quarter
- ~70K peak concurrent users (40% annual growth)
- ~12Gbit/sec aggregate bandwidth (in 2007)
Staff (in 2006)
- 70 FTE + 20 part time
- Open Source client
- Render the Virtual World
- Handles user interaction
- Handles locations of objects
- Gets velocities and does simple physics to keep track of what is moving where
- No collision detection
- Runs Havok 4 physics engine
- Runs at 45 frames/sec. If it can't keep up, it will attempt time dilation without reducing frame rate.
- Handles storing object state, land parcel state, and terrain height-map state
- Keeps track of where everything is and does collision detection
- Sends locations of stuff to viewer
- Transmits image data in a prioritized queue
- Sends updates to viewers only when needed (only when collision occurs or other changes in direction, velocity etc.)
- Runs Linden Scripting Language (LSL) scripts
- Scripting has been recently upgraded to the much faster Mono scripting engine
- Handles chat and instant messages
- One big clustered filesystem ~100TB
- Stores asset data such as textures.
- Eventlet is a networking library written in Python. It achieves high scalability by using non-blocking io while at the same time retaining high programmer usability by using coroutines to make the non-blocking io operations appear blocking at the source code level.
- Mulib is a REST web service framework built on top of eventlet
- 2000+ Servers in 2007
- ~6000 Servers in early 2008
- Plans to upgrade to ~10000 (?)
- 4 sims per machine, for both class 4 and class 5
- Used all-AMD for years, but are moving from the Opteron 270 to the Intel Xeon 5148
- The upgrade to "class 5" servers doubled the RAM per machine from 2GB to 4GB and moved to a faster SATA disk
- Class 1 - 4 are on 100Mb with 1Gb uplinks to the core. Class 5 is on pure 1Gb
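The division of labor described in the list above, where the viewer does simple physics from known velocities and the simulator sends updates only when something changes, resembles what is often called dead reckoning. The sketch below is a one-dimensional illustration under that assumption; the names and the tolerance value are hypothetical and do not come from Linden Lab's code.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    position: float   # one axis only, to keep the sketch short
    velocity: float   # units per second
    timestamp: float  # time of the last update received from the simulator


def extrapolate(state: ObjectState, now: float) -> float:
    """Viewer side: estimate the current position from the last known velocity."""
    return state.position + state.velocity * (now - state.timestamp)


def needs_update(last_sent: ObjectState, actual_velocity: float, tolerance: float = 1e-6) -> bool:
    """Simulator side: send a new update only when the velocity has changed."""
    return abs(actual_velocity - last_sent.velocity) > tolerance
```

As long as an object keeps moving in a straight line, the viewer can draw it correctly with no further traffic, which is presumably part of how bandwidth is kept manageable with tens of thousands of concurrent users.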