Here's a short list of some great resources that I've found very inspirational and thought provoking. I've broken these resources up into two lists: Blogs and Presentations.
The upshot of the paper is Oracle rules and MySQL sucks for sharding. Which is technically probable, if you don't throw in minor points like cost and ease of use. The points where they think Oracle wins: online schema changes, more robust replication, higher availability, better corruption handling, better use of large RAM and multiple cores, better and better tested partitioning features, better monitoring, and better gas mileage.
I have two servers connected via Internet (NOT IN THE SAME LAN) serving the same website (http://www.ourexample.com).The problem is files uploaded on serverA and serverB cannot see each other immediately,thus rsync with certain intervals is not a good solution. Can anybody give me some advice on the following options? 1.NFS over Internet for file sharing 2.sshfs 3.inotify(our system's kernel does not support this and we donot want to risk upgrading our kernel as well) 4.drbd in active-active mode 5 or any other solutions Any suggestions will be welcomed. Thank you in advance.
Datacenterknowledge.com: In an effort to boost its refocused cloud computing initiative, Sun Microsystems (JAVA) has acquired Q-layer, a Belgian provider that automates the deployment of both public and private clouds. Sun says Q-layer’s technology will help users instantly provision servers, storage, bandwidth and applications. Do you have experience with Q-layers technology like its Virtual Private DataCenter and NephOS?
It seems that HTTP calls have become a default way to think about distributed systems. HTTP and Web services definitely have a lot to offer, but they are not the only way to do things and there are definitely cases where web is not the right choice. Unfortunately, lots of people just stick with web services and hack on, trying to fit a square peg in a round hole. In cases such as these, a different distribution paradigm can save us quite a lot of time and effort both in development and later in maintenance. One of those different paradigms is messaging.
How do we debug and profile a cloud full of processors and threads? It's a problem more will be seeing as we code big scary programs that run on even bigger scarier clouds. Logging gets you far, but sometimes finding the root cause of problem requires delving deep into a program's execution. I don't know about you, but setting up 200,000+ gdb instances doesn't sound all that appealing. Tools like STAT (Stack Trace Analysis Tool) are being developed to help with this huge task. STAT "gathers and merges stack traces from a parallel application’s processes." So STAT isn't a low level debugger, but it will help you find the needle in a million haystacks. Abstract:
Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large parallel application – already, debugging the full BlueGene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an Infiniband cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
Lessons LearnedAt the end of the paper they identify several insights they had about developing petascale tools:
While working with Memcache the other night, it dawned on me that it’s usage as a distributed caching mechanism was really just one of many ways to use it. That there are in fact many alternative usages that one could find for Memcache if they could just realize what Memcache really is at its core – a simple distributed hash-table – is an important point worthy of further discussion. To be clear, when I say “simple”, by no means am I implying that Memcache’s implementation is simple, just that the ideas behind it are such. Think about that for a minute. What else could we use a simple distributed hash-table for, besides caching? How about using it as an alternative to the traditional shard lookup method we used in our Master Index Lookup scalability strategy, discussed previously here.
Update: MapReduce and PageRank Notes from Remzi Arpaci-Dusseau's Fall 2008 class . Collects interesting facts about MapReduce and PageRank. For example, the history of the solution to searching for the term "flu" is traced through multiple generations of technology. With Google entering the cloud space with Google AppEngine and a maturing Hadoop product, the MapReduce scaling approach might finally become a standard programmer practice. This is the best paper on the subject and is an excellent primer on a content-addressable memory future. Some interesting stats from the paper: Google executes 100k MapReduce jobs each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory. One common criticism ex-Googlers have is that it takes months to get up and be productive in the Google environment. Hopefully a way will be found to lower the learning curve and make programmers more productive faster. From the abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google’s clusters every day, processing a total of more than twenty petabytes of data per day. Thanks to Kevin Burton for linking to the complete article.
In article Building Super-Scalable Web Systems with REST Udi Dahan tells an interesting story of how they made a weather reporting system scale for over 10 million users. So many users hitting their weather database didn't scale. Caching in a straightforward way wouldn't work because weather is obviously local. Caching all local reports would bring the entire database into memory, which would work for some companies, but wasn't cost efficient for them. So in typical REST fashion they turned locations into URIs. For example: http://weather.myclient.com/UK/London. This allows the weather information to be cached by intermediaries instead of hitting their servers. Hopefully for each location their servers will be hit a few times and then the caches will be hit until expiry. In order to send users directly to the correct location an IP location check is performed on login and stored in a cookie. The lookup is done once and from then on out a GET is performed directly on the resource. There's no need to hit their servers and do a lookup on the user to get the location. That's all bypassed. I like Udi's summary of the approach and is why I think this is a good strategy : This isn’t a “cheap trick”. While being straight forward for something like weather, understanding the nature of your data and intelligently mapping that to a URI space is critical to building a scalable system, and reaping the benefits of REST.
Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind.
Werner VogelsDr. Werner Vogels is Vice President & Chief Technology Officer at Amazon.com where he is responsible for driving the company’s technology vision, which is to continuously enhance the innovation on behalf of Amazon’s customers at a global scale. Prior to joining Amazon, he worked as a researcher at Cornell University where he was a principal investigator in several research projects that target the scalability and robustness of mission-critical enterprise computing systems. He is regarded as one of the world's top experts on ultra-scalable systems and he uses his weblog to educate the community about issues such as eventual consistency. Information Week recently recognized Vogels for this educational and promotional role in Cloud Computing with the 2008 CIO/CTO of the Year award.
Service-Oriented Architecture, Utility Computing and Internet Level 3 Platform in practiceAmazon has built a loosely coupled service-oriented architecture on an inter-planetary scale. They are the pioneers of Utility Computing and Internet Platforms discussed earlier in Scalability Perspectives. Amazon's CTO, Werner Vogels is undoubtedly a thought leader for the coming age of cloud computing.
Cloud Computing CTO or Chief Cloud Officer?Vogels' name and face are often associated with Amazon's cloud, but Amazon Web Services isn't a one-man show, it is Teamwork. Amazon's CTO has emerged as the right person at the right time and place to guide cloud computing - until now, an emerging technology for early adopters - into the mainstream. He not only understands how to architect a global computing cloud consisting of tens of thousands of servers, but also how to engage CTOs, CIOs, and other professionals at customer companies in a discussion of how that architecture could potentially change the way they approach IT. If all goes as planned, Amazon's cloud will serve as an extension of corporate data centers for new applications and overflow capacity, so-called cloud bursting. Over time, Amazon will then take on more and more of the IT workload from businesses that see value in the model. Customer-centric? What Amazon's doing goes beyond that. Amazon's cloud becomes their cloud; its CTO, their CTO. As an expert of distributed systems Vogels shares interesting insights on scalability related issues on his blog such as:
The Amazon Technology PlatformWerner Vogels explains how Amazon has become a platform provider, and how an increasing number of diverse businesses are built on the Amazon.com platform in this QCon presentation. The most important thing to understand is that Amazon is a Technology Platform with the emphasis on Technology. The scalable and reliable platform is the main enabler of Amazon's business model. Dr Werner describes Amazon’s platform business model and its ‘flywheel’ for growth on the latest episode of the Telco 2.0 ‘executive brainstorm’ series on Telecom TV. Amazon has many platforms that fuels growth such as:
- Amazon Merchants
- Amazon Associates
- Amazon E-Commerce Platform
- Web Scale Computing Platform
- Amazon Kindle
- Telecommunications platfrom using Amazon's platform?
- All Things Distributed - Werner Vogels' weblog on building scalable and robust distributed systems
- Werner Vogels on Wikipedia.org
- Information Week - Chief Of The Year: Amazon CTO Werner Vogels and related Q&A
- The Amazon.com Technology Platform: Building Blocks for Innovation
- Amazon's Platform Business Model
- Episode 40: Interview Werner Vogels
- ACM Queue - A conversation with Werner Vogels
- ITConversations: Supernova – Werner Vogels on Scalability
- Sell on Amazon: A Guide to Amazon's Marketplace, Seller Central, and Fulfillment by Amazon Programs
- Werner Vogels on Twitter