GPU vs CPU Smackdown : The Rise of Throughput-Oriented Architectures

In some ways the original Amazon cloud, the one most of us still live in, was like that really cool house that when you stepped inside and saw the old green shag carpet in the living room, you knew the house hadn't been updated in a while. The network is a little slow, the processors are a bit dated, and virtualization made the house just feel smaller. It has been difficult to run high bandwidth or low latency workloads in the cloud. Bottlenecks everywhere. Not a big deal for most applications, but for many high performance applications (HPC) it was a killer.

In a typical house you might just do a remodel. Upgrade a few rooms. Swap out builder quality appliances with gleaming stainless steel monsters. But Amazon has a big lot, instead of remodeling they simply keep adding on entire new wings, kind of like the Winchester Mystery House of computing.

The first new wing added was a CPU based HPC system featuring blazingly fast Nehalem chips, virtualization replaced by a close to metal Hardware Virtual Machine (HVM) architecture, and the network is a monster 10 gigabits with the ability to specify placement groups to carve out a low-latency, high bandwidth cluster. Bottlenecks removed. Most people still probably don't even know this part of the house exists.

The newest addition is a beauty, it's a graphics processing unit (GPU) cluster as described by Werner Vogels in Expanding the Cloud - Adding the Incredible Power of the Amazon EC2 Cluster GPU Instances . It's completely modern and contemporary. The shag carpet is out. In are Nvidia M2050 GPU based clusters which make short work of applications in the sciences, finance, oil & gas, movie studios and graphics.

Click to read more ...


8 Commonly Used Scalable System Design Patterns

Ricky Ho in Scalable System Design Patterns has created a great list of scalability patterns along with very well done explanatory graphics. A summary of the patterns are:

  1. Load Balancer - a dispatcher determines which worker instance will handle a request based on different policies.
  2. Scatter and Gather - a dispatcher multicasts requests to all workers in a pool. Each worker will compute a local result and send it back to the dispatcher, who will consolidate them into a single response and then send back to the client.
  3. Result Cache - a dispatcher will first lookup if the request has been made before and try to find the previous result to return, in order to save the actual execution.
  4. Shared Space - all workers monitors information from the shared space and contributes partial knowledge back to the blackboard. The information is continuously enriched until a solution is reached.
  5. Pipe and Filter - all workers connected by pipes across which data flows.
  6. MapReduce -  targets batch jobs where disk I/O is the major bottleneck. It use a distributed file system so that disk I/O can be done in parallel.
  7. Bulk Synchronous Parallel - a  lock-step execution across all workers, coordinated by a master.
  8. Execution Orchestrator - an intelligent scheduler / orchestrator schedules ready-to-run tasks (based on a dependency graph) across a clusters of dumb workers.

Sponsored Post: Cloudkick, Strata, Undertone, Joyent, Appirio, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

Cool Products and Services

Click to read more ...


NoCAP – Part III – GigaSpaces clustering explained..

In many of the recent discussions on the design of large scale systems (a.k.a. Web Scale) it was argued that the right set of tradeoffs for building large scale systems would be to give away Consistency for Availability and Partition tolerance. Those arguments relied on the foundation of the CAP theorem developed in early 2000-2002. One of the core principals behind the CAP theorem is that you must choose two out of the three CAP properties. In many of the transactional systems giving away consistency is either impossible or yields a huge complexity in the design of those systems. In this series of posts, I've tried to suggest a different set of tradeoffs in which we could achieve scalability without compromising on consistency. I also argued that rather than choosing only two out of the three CAP properties we could choose various degrees of all three. The degrees would be determined by the most likely availability and partition tolerance scenarios in our specific application.  The suggested model was based on the experience we had in GigaSpaces over the course of the past years and was successfully deployed in many mission critical systems today in Finance, Telco and ecommerce business. I hope that through the sharing of this experience we could come up with a broader set of patterns on how to build large scale systems that would fit also to mission critical transactional systems. Read more... 



Stuff the Internet Says on Scalability For November 29th, 2010

Eating turkey all weekend and wondering what you might have missed?


Great Introductory Video on Scalability from Harvard Computer Science

Professor David Malan gives a very good lecture on scalability for dynamic websites. It's not highly technical, it's an extension course, but it's a great introduction to a wide variety of topics. I really like his teaching style. He continually asks questions, prompts for input, and gives accessible explanations. Some of the topics covered: vertical scaling; horizontal scaling; PHP acceleration; load balancing: DNS, L7, sticky sessions, load balancers; caching; MySQL: replication, load balancing, partitioning, high availability.

Watch it on Academic Earth

This is one lecture in a series of 13 lectures on building dynamic websites. Students learn how to: build dynamic websites with Ajax and with LinuxApacheMySQL, and PHP (LAMP); set up domain names with DNSstructure pages with XHTML and CSS how to program in JavaScript and PHPconfigure Apacheand MySQL; design and query databases with SQL; use Ajax with both XML andJSON build mashups.

For the lecture notes go to the OpenCourseWare site.

Related Articles



Sponsored Post: Imo, Undertone, Joyent, Appirio, Tuenti, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

Cool Products and Services

Click to read more ...


Strategy: Google Sends Canary Requests into the Data Mine

Google runs queries against thousands of in-memory index nodes in parallel and then merges the results. One of the interesting problems with this approach, explains Google's Jeff Dean in this lecture at Stanford, is the Query of Death.

A query can cause a program to fail because of bugs or various other issues. This means that a single query can take down an entire cluster of machines, which is not good for availability and response times, as it takes quite a while for thousands of machines to recover. Thus the Query of Death. New queries are always coming into the system and when you are always rolling out new software, it's impossible to completely get rid of the problem.

Two solutions:

Click to read more ...


Announcing My Webinar on December 14th: What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications

It's time to do something a little different and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet), it means doing a webinar!

  • On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications.
  • The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor preserving and technically accurate way of doing these things.
  • The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions.
  • The hashtag for the event on Twitter will be SQLNoSQL. I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. 

The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto. He said he came from a Java background and was confused about the future. His crystal ball wasn't working anymore. Should he invest more time on Java? Should he learn some variant of NoSQL? Should he focus on one of the many other alternative databases? Or do what exactly?

He was exasperated at the bewildering number of database options out there today and asked my opinion on what he should do. I get that question a lot. And 30 seconds before the next session starting was not enough time for a real answer. So I hope to give the answer that I wanted to give then, here, in this webinar. 

We've all probably had that helpless feeling of facing a massive list of strangely named databases, each matched against a list of a dozen cryptic sounding features, wondering how the heck we should make a decision? In the past there was a standard set of options. A few popular relational databases ruled and your job as a programmer was to force the square peg of your problem into the round hole of the relational database.

Then a few intrepid souls, like LiveJournal and Google, went off script and paved their own way, building specialized systems that solved their own specific problems. Over time those systems have generalized into the abundance we have today. It's as Mae West, seductive siren of the silver screen, once said "Too much of a good thing can be wonderful!"

We are in a time of great change, creativity, and opportunity. It's a little confusing, sure, but it's also a cool time, an optimistic time. We can now work together to solve problems faster, better, and in larger numbers than ever before. We can now build something new and different and it's faster, easier, and cheaper than ever before. The question is, where to start?

In this webinar what I hope to do is help you figure out how to answer the "What should I do" question for yourself, like what I try to do in my blog, only more conversational. We'll take a use case approach. I promise we won't spend 20 minutes on CAP or other eyes-glazing-over topics. We'll try to look at what you need to do and use your requirements to figure out which product, or more likely, set of products to use.

That's the plan. I really hope you can attend. If you like this blog I think you'll like the webinar too. And if you have a friend or coworker you think could benefit from the webinar please forward them this link.

This is my first webinar, so if you have any advice on how not to suck please comment here, email me directly, or use the SQLNoSQL hashtag and I'll see it. I'd appreciate the advice. If you have suggestions about what you would like me to talk about or what you think the right answer is, please let me know that too. All inputs welcome.



Some Services are More Equal than Others



Remember when the iPhone launched? Remember the complaints about the device not maintaining calls well? Was it really the hardware? Or was it the service provider network, overwhelmed by not just the call volume but millions of hyper-customers experimenting with their new toy? Look – a video! Look a video and a call. Hey, I’m on Facebook, Twitter, YouTube, and streaming audio at the same time I’m making a call! How awesome is that?

Meanwhile, there’s an entire army of operators at a service provider’s NOC who are stalking through the data center with scissors because it’s the only way to stop the madness.

Click to read more ...