Wednesday
Dec012010

Sponsored Post: Cloudkick, Strata, Undertone, Joyent, Appirio, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

Cool Products and Services


Tuesday
Nov302010

NoCAP – Part III – GigaSpaces clustering explained..

In many recent discussions on the design of large-scale systems (a.k.a. Web Scale), it has been argued that the right tradeoff for building such systems is to give up Consistency in exchange for Availability and Partition tolerance. Those arguments rest on the CAP theorem, developed between 2000 and 2002. One of the core principles behind the CAP theorem is that you must choose two of the three CAP properties. In many transactional systems, giving up consistency is either impossible or adds enormous complexity to the design. In this series of posts, I've tried to suggest a different set of tradeoffs that achieves scalability without compromising consistency. I have also argued that rather than choosing only two of the three CAP properties, we can choose various degrees of all three. Those degrees are determined by the most likely availability and partition-tolerance scenarios in a specific application. The suggested model is based on GigaSpaces' experience in recent years and has been successfully deployed in many mission-critical systems in finance, telco, and e-commerce. I hope that by sharing this experience we can arrive at a broader set of patterns for building large-scale systems that also fit mission-critical transactional systems.

 

Monday
Nov292010

Stuff the Internet Says on Scalability For November 29th, 2010

Eating turkey all weekend and wondering what you might have missed?

Wednesday
Nov242010

Great Introductory Video on Scalability from Harvard Computer Science

Professor David Malan gives a very good lecture on scalability for dynamic websites. It's not highly technical (it's an extension course), but it's a great introduction to a wide variety of topics. I really like his teaching style: he continually asks questions, prompts for input, and gives accessible explanations. Topics covered include vertical scaling; horizontal scaling; PHP acceleration; load balancing (DNS, L7, sticky sessions, load balancers); caching; and MySQL replication, load balancing, partitioning, and high availability.

Watch it on Academic Earth

This is one lecture in a series of 13 on building dynamic websites. Students learn how to: build dynamic websites with Ajax and with Linux, Apache, MySQL, and PHP (LAMP); set up domain names with DNS; structure pages with XHTML and CSS; program in JavaScript and PHP; configure Apache and MySQL; design and query databases with SQL; use Ajax with both XML and JSON; and build mashups.

For the lecture notes go to the OpenCourseWare site.
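Two of the load-balancing ideas from the lecture, round-robin distribution and sticky sessions, fit in a few lines. This is a minimal sketch, assuming an in-process router and hypothetical backend names `web1` to `web3`; production load balancers usually implement stickiness with cookies or source-IP hashing rather than an in-memory table like this one.

```python
import itertools


class StickyLoadBalancer:
    """Round-robin for new sessions; known sessions stay pinned to one backend."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._next = itertools.cycle(backends)
        self._sessions = {}  # session id -> backend it was first assigned to

    def route(self, session_id):
        # Known session: return the pinned backend so its server-side state is found.
        if session_id in self._sessions:
            return self._sessions[session_id]
        # New session: take the next backend in round-robin order and pin it.
        backend = next(self._next)
        self._sessions[session_id] = backend
        return backend


# Hypothetical backend names, for illustration only.
lb = StickyLoadBalancer(["web1", "web2", "web3"])
print(lb.route("alice"))  # web1
print(lb.route("bob"))    # web2
print(lb.route("alice"))  # web1 again: the session is sticky
```

The tradeoff the lecture points at is visible here: stickiness keeps session state local to one server, but if `web1` dies, every session pinned to it loses that state, which is why shared session stores come up alongside load balancing.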


Tuesday
Nov232010

Sponsored Post: Imo, Undertone, Joyent, Appirio, Tuenti, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

Cool Products and Services


Monday
Nov222010

Strategy: Google Sends Canary Requests into the Data Mine

Google runs queries against thousands of in-memory index nodes in parallel and then merges the results. One of the interesting problems with this approach, explains Google's Jeff Dean in this lecture at Stanford, is the Query of Death.

A query can cause a program to fail because of bugs or other issues, which means a single query can take down an entire cluster of machines. That is bad for availability and response times, because it takes quite a while for thousands of machines to recover. Hence the name: the Query of Death. Because new queries are always arriving and new software is always being rolled out, it's impossible to eliminate the problem completely.

Two solutions:

Click to read more ...
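The canary idea named in the title can be sketched as follows: send the query to one node first, and only broadcast it to the whole cluster if that node survives. This is an illustrative sketch, not Google's code; `run_on_node` is a hypothetical callback, and the rule of rejecting a query after a fixed number of canary crashes (`max_canary_failures`) is an assumption of this sketch.

```python
import random


class QueryOfDeath(Exception):
    """Raised when a query looks too dangerous to broadcast to every node."""


def canary_broadcast(query, nodes, run_on_node, max_canary_failures=2):
    """Try `query` on one canary node first; fan out to all nodes only if it survives.

    `run_on_node(node, query)` is a hypothetical callback that returns the
    node's result or raises if processing the query crashes the node.
    """
    candidates = list(nodes)
    random.shuffle(candidates)  # don't always sacrifice the same node
    failed = set()
    for canary in candidates:
        try:
            canary_result = run_on_node(canary, query)
        except Exception:
            failed.add(canary)
            if len(failed) >= max_canary_failures:
                # Repeated canary crashes: treat it as a Query of Death and reject it.
                raise QueryOfDeath(f"query crashed {len(failed)} canary nodes")
            continue  # might be a coincidence; try another canary
        # The canary survived: broadcast to the remaining healthy nodes,
        # reusing the canary's result and skipping nodes that already crashed.
        results = {canary: canary_result}
        for node in nodes:
            if node not in results and node not in failed:
                results[node] = run_on_node(node, query)
        return results
    raise QueryOfDeath("no node survived the query")


# Usage with a simulated index node that crashes on a "poison" query.
def fake_node(node, query):
    if "poison" in query:
        raise RuntimeError(f"{node} crashed")
    return f"{node}: results for {query!r}"


nodes = [f"index-{i}" for i in range(5)]
print(len(canary_broadcast("cats", nodes, fake_node)))  # 5
try:
    canary_broadcast("poison pill", nodes, fake_node)
except QueryOfDeath as err:
    print("rejected:", err)
```

The cost is one extra round trip of latency per query; the payoff is that a poisonous query takes down at most a couple of machines instead of thousands.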

Thursday
Nov182010

Announcing My Webinar on December 14th: What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications

It's time to do something a little different and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet), it means doing a webinar!

  • On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications.
  • The webinar is sponsored by VoltDB, but it will be completely vendor independent, as that's the only honor-preserving and technically accurate way of doing these things.
  • The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions.
  • The hashtag for the event on Twitter will be SQLNoSQL. I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. 

The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto. He said he came from a Java background and was confused about the future. His crystal ball wasn't working anymore. Should he invest more time in Java? Should he learn some variant of NoSQL? Should he focus on one of the many other alternative databases? Or do what, exactly?

He was exasperated by the bewildering number of database options available today and asked my opinion on what he should do. I get that question a lot, and the 30 seconds before the next session started was not enough time for a real answer. So in this webinar I hope to give the answer I wanted to give then.

We've all probably had that helpless feeling of facing a massive list of strangely named databases, each matched against a dozen cryptic-sounding features, and wondering how the heck to make a decision. In the past there was a standard set of options: a few popular relational databases ruled, and your job as a programmer was to force the square peg of your problem into the round hole of the relational database.

Then a few intrepid souls, like LiveJournal and Google, went off script and paved their own way, building specialized systems that solved their own specific problems. Over time those systems have generalized into the abundance we have today. It's as Mae West, seductive siren of the silver screen, once said "Too much of a good thing can be wonderful!"

We are in a time of great change, creativity, and opportunity. It's a little confusing, sure, but it's also a cool time, an optimistic time. We can now work together to solve problems faster, better, and in larger numbers than ever before. We can now build something new and different and it's faster, easier, and cheaper than ever before. The question is, where to start?

In this webinar I hope to help you figure out how to answer the "What should I do?" question for yourself, much as I try to do on this blog, only more conversationally. We'll take a use-case approach. I promise we won't spend 20 minutes on CAP or other eyes-glazing-over topics. Instead we'll look at what you need to do and use your requirements to figure out which product, or more likely which set of products, to use.

That's the plan. I really hope you can attend. If you like this blog I think you'll like the webinar too. And if you have a friend or coworker you think could benefit from the webinar please forward them this link.

This is my first webinar, so if you have any advice on how not to suck please comment here, email me directly, or use the SQLNoSQL hashtag and I'll see it. I'd appreciate the advice. If you have suggestions about what you would like me to talk about or what you think the right answer is, please let me know that too. All inputs welcome.

thanks

Wednesday
Nov172010

Some Services are More Equal than Others

 


Remember when the iPhone launched? Remember the complaints about the device not maintaining calls well? Was it really the hardware? Or was it the service provider's network, overwhelmed not just by call volume but by millions of hyper-customers experimenting with their new toy? Look, a video! Look, a video and a call! Hey, I’m on Facebook, Twitter, YouTube, and streaming audio at the same time I’m making a call! How awesome is that?

Meanwhile, there’s an entire army of operators at a service provider’s NOC who are stalking through the data center with scissors because it’s the only way to stop the madness.


Tuesday
Nov162010

Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All told, they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprising answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others.

Why a surprise? Facebook created Cassandra and it was purpose built for an inbox type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data set and indexes grew larger. And they could have built their own, but they chose HBase.

HBase is a scale-out table store that supports very high rates of row-level updates over massive amounts of data, exactly what a messaging system needs. HBase is also a column-oriented key-value store built on the BigTable model. It's good at fetching rows by key and at scanning and filtering ranges of rows, also what a messaging system needs. Complex queries are not supported, however. Those are generally handed off to an analytics tool like Hive, which Facebook created to make sense of its multi-petabyte data warehouse. Hive is built on Hadoop and its file system, HDFS, which HBase also uses.
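The two access paths described above, fetching a row by key and scanning a key range, both depend on rows being stored sorted by key. The toy in-memory table below illustrates that model; it is not the HBase client API. The row-key scheme (`user_id` prefix plus a reversed, zero-padded timestamp so the newest message sorts first) and the `MAX_TS` constant are hypothetical choices for illustration, not Facebook's actual schema.

```python
import bisect


class SortedTable:
    """Toy BigTable/HBase-style table: rows kept sorted by row key,
    supporting point gets and range scans. Not the HBase client API."""

    def __init__(self):
        self._keys = []  # row keys, always kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop, row_filter=None):
        """Yield (key, row) for start <= key < stop, optionally filtered."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            row = self._rows[key]
            if row_filter is None or row_filter(row):
                yield key, row
            i += 1


MAX_TS = 10**12  # hypothetical upper bound on timestamps


def message_key(user_id, ts):
    # Hypothetical key scheme: the user id prefix keeps a user's messages
    # contiguous; the reversed, zero-padded timestamp sorts newest first.
    return f"{user_id}:{MAX_TS - ts:013d}"


t = SortedTable()
t.put(message_key("u42", 100), "body", "hello")
t.put(message_key("u42", 200), "body", "newer")
t.put(message_key("u7", 150), "body", "other user")
# ";" is the character right after ":" in ASCII, so this scans exactly u42's rows.
inbox = [row["body"] for _, row in t.scan("u42:", "u42;")]
print(inbox)  # ['newer', 'hello']
```

With keys laid out this way, reading the recent, volatile part of an inbox is a short scan at the front of a user's key range, while older messages simply sit further along it.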

Facebook chose HBase because they monitored their usage and figured out what they really needed. What they needed was a system that could handle two types of data patterns:

  1. A short set of temporal data that tends to be volatile
  2. An ever-growing set of data that rarely gets accessed

Makes sense. You read what's current in your inbox once and then rarely, if ever, look at it again. These patterns are so different that one might expect two separate systems, but apparently HBase handles both well enough.

Some key aspects of their system:

Click to read more ...

Monday
Nov152010

How Google's Instant Previews Reduces HTTP Requests

In a strange case of synchronicity, Google just published Instant Previews: Under the hood, a very well-written blog post by Matías Pelenur of the Instant Previews team that gives some fascinating inside details on how Google implemented Instant Previews. It's synchronicity because I had just posted Strategy: Biggest Performance Impact Is To Reduce The Number Of HTTP Requests, and one of the major ideas behind the design of Instant Previews is to reduce the number of HTTP requests through a few well-chosen tricks. Cosmic!

Some of what Google does to reduce HTTP requests:

Click to read more ...
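One widely used trick for cutting HTTP requests is to inline small images as data URIs (RFC 2397), so the image bytes travel inside the HTML or JSON response instead of requiring a request of their own. The sketch below shows the general technique; it is not a claim about exactly how Instant Previews is built, and the helper names `to_data_uri` and `inline_img` are hypothetical.

```python
import base64
import html


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as an RFC 2397 data: URI (base64 variant)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_img(data: bytes, mime_type: str, alt: str = "") -> str:
    """Build an <img> tag whose image bytes are embedded in the markup itself,
    so the browser does not issue a separate HTTP request for the image."""
    src = to_data_uri(data, mime_type)
    return f'<img src="{src}" alt="{html.escape(alt)}">'


print(to_data_uri(b"hi", "text/plain"))  # data:text/plain;base64,aGk=
```

The tradeoff: base64 inflates the payload by roughly a third, and an inlined image can't be cached separately from the document that carries it, so the trick pays off mainly for many small images that would otherwise each cost a round trip.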