Entries from November 14, 2010 - November 20, 2010

Thursday
Nov 18, 2010

Announcing My Webinar on December 14th: What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications

It's time to do something a little different, and for me that doesn't mean cutting off my hair and joining a monastery, nor does it mean buying a cherry red convertible (yet). It means doing a webinar!

  • On December 14th, 2:00 PM - 3:00 PM EST, I'll be hosting What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications.
  • The webinar is sponsored by VoltDB, but it will be completely vendor-independent, as that's the only honor-preserving and technically accurate way of doing these things.
  • The webinar will run about 60 minutes, with 40 minutes of speechifying and 20 minutes for questions.
  • The hashtag for the event on Twitter will be SQLNoSQL. I'll be monitoring that hashtag if you have any suggestions for the webinar or if you would like to ask questions during the webinar. 

The motivation for me to do the webinar was a talk I had with another audience member at the NoSQL Evening in Palo Alto. He said he came from a Java background and was confused about the future. His crystal ball wasn't working anymore. Should he invest more time on Java? Should he learn some variant of NoSQL? Should he focus on one of the many other alternative databases? Or do what exactly?

He was exasperated at the bewildering number of database options out there today and asked my opinion on what he should do. I get that question a lot. And the 30 seconds before the next session started was not enough time for a real answer. So I hope to give the answer that I wanted to give then, here, in this webinar.

We've all probably had that helpless feeling of facing a massive list of strangely named databases, each matched against a list of a dozen cryptic-sounding features, wondering how the heck we should make a decision. In the past there was a standard set of options. A few popular relational databases ruled and your job as a programmer was to force the square peg of your problem into the round hole of the relational database.

Then a few intrepid souls, like LiveJournal and Google, went off script and paved their own way, building specialized systems that solved their own specific problems. Over time those systems have generalized into the abundance we have today. It's as Mae West, seductive siren of the silver screen, once said: "Too much of a good thing can be wonderful!"

We are in a time of great change, creativity, and opportunity. It's a little confusing, sure, but it's also a cool time, an optimistic time. We can now work together to solve problems faster, better, and in larger numbers than ever before. We can now build something new and different and it's faster, easier, and cheaper than ever before. The question is, where to start?

In this webinar I hope to help you figure out how to answer the "What should I do?" question for yourself, much as I try to do on my blog, only more conversationally. We'll take a use case approach. I promise we won't spend 20 minutes on CAP or other eyes-glazing-over topics. We'll try to look at what you need to do and use your requirements to figure out which product, or more likely, set of products to use.

That's the plan. I really hope you can attend. If you like this blog I think you'll like the webinar too. And if you have a friend or coworker you think could benefit from the webinar please forward them this link.

This is my first webinar, so if you have any advice on how not to suck please comment here, email me directly, or use the SQLNoSQL hashtag and I'll see it. I'd appreciate the advice. If you have suggestions about what you would like me to talk about or what you think the right answer is, please let me know that too. All inputs welcome.

thanks

Wednesday
Nov 17, 2010

Some Services are More Equal than Others

 

Remember when the iPhone launched? Remember the complaints about the device not maintaining calls well? Was it really the hardware? Or was it the service provider network, overwhelmed by not just the call volume but millions of hyper-customers experimenting with their new toy? Look – a video! Look, a video and a call. Hey, I’m on Facebook, Twitter, YouTube, and streaming audio at the same time I’m making a call! How awesome is that?

Meanwhile, there’s an entire army of operators at a service provider’s NOC who are stalking through the data center with scissors because it’s the only way to stop the madness.

Click to read more ...

Tuesday
Nov 16, 2010

Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month

You may have read somewhere that Facebook has introduced a new Social Inbox integrating email, IM, SMS, text messages, and on-site Facebook messages. All in all they need to store over 135 billion messages a month. Where do they store all that stuff? Facebook's Kannan Muthukkaruppan gives the surprise answer in The Underlying Technology of Messages: HBase. HBase beat out MySQL, Cassandra, and a few others.

Why a surprise? Facebook created Cassandra and it was purpose-built for an inbox-type application, but they found Cassandra's eventual consistency model wasn't a good match for their new real-time Messages product. Facebook also has an extensive MySQL infrastructure, but they found performance suffered as data sets and indexes grew larger. And they could have built their own, but they chose HBase.

HBase is a scale-out table store supporting very high rates of row-level updates over massive amounts of data. Exactly what is needed for a Messaging system. HBase is also a column-based key-value store built on the BigTable model. It's good at fetching rows by key or scanning ranges of rows and filtering. Also what is needed for a Messaging system. Complex queries are not supported, however. Queries are generally given over to an analytics tool like Hive, which Facebook created to make sense of their multi-petabyte data warehouse. Hive is based on Hadoop's file system, HDFS, which is also used by HBase.
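To make the "fetch by key or scan a range and filter" access pattern a little more concrete, here is a minimal sketch of a toy in-memory store with sorted row keys. This is not the HBase API and says nothing about how Facebook lays out its data; the class name `SortedRowStore`, the `user:sequence` key format, and the `read` column are all assumptions made up for illustration.

```python
import bisect


class SortedRowStore:
    """Toy stand-in for a sorted row store. Hypothetical, not the HBase API."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> dict of column name -> value

    def put(self, key, columns):
        """Insert a row or update some of its columns (a row-level update)."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        """Fetch a single row by its key, or None if it does not exist."""
        return self._rows.get(key)

    def scan(self, start, stop, predicate=None):
        """Yield (key, row) for start <= key < stop, optionally filtered."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if predicate is None or predicate(row):
                yield key, row


# Hypothetical key layout: "<user>:<sequence number>".
store = SortedRowStore()
store.put("alice:0001", {"body": "hi", "read": True})
store.put("alice:0002", {"body": "lunch?", "read": False})
store.put("bob:0001", {"body": "yo", "read": False})

# All of alice's unread messages: a range scan plus a filter.
# ";" is the character right after ":" in ASCII, so it bounds the prefix.
unread = list(store.scan("alice:", "alice;", lambda row: not row["read"]))
```

Because the keys are sorted, everything belonging to one user sits next to each other, so a range scan only touches that user's rows. That is the property that makes this style of store a plausible fit for an inbox, and it is also why anything more complex than key lookups and range scans gets pushed off to a separate analytics tool.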

Facebook chose HBase because they monitored their usage and figured out what they really needed. What they needed was a system that could handle two types of data patterns:

  1. A short set of temporal data that tends to be volatile
  2. An ever-growing set of data that rarely gets accessed

Makes sense. You read what's current in your inbox once and then rarely if ever take a look at it again. These two patterns are so different that one might expect two different systems to be used, but apparently HBase works well enough for both.

Some key aspects of their system:

Click to read more ...

Monday
Nov 15, 2010

How Google's Instant Previews Reduces HTTP Requests

In a strange case of synchronicity, Google just published Instant Previews: Under the hood, a very well written blog post by Matías Pelenur of the Instant Previews team, giving some fascinating inside details on how Google implemented Instant Previews. It's synchronicity because I had just posted Strategy: Biggest Performance Impact Is To Reduce The Number Of HTTP Requests, and one of the major ideas behind the design of Instant Previews is to reduce the number of HTTP requests through a few well chosen tricks. Cosmic!

Some of what Google does to reduce HTTP requests:

Click to read more ...

Monday
Nov 15, 2010

Strategy: Biggest Performance Impact is to Reduce the Number of HTTP Requests

Low Cost, High Performance, Strong Security: Pick Any Three is a funny and informative presentation by Chris Palmer whose main message is: reduce the size and frequency of network communications, which will make your pages load faster, which will improve performance enough that you can use HTTPS all the time, which will make you safe and secure online, which is a good thing.

The benefits of HTTPS for security are overwhelming, but people are afraid of the performance hit. The argument is successfully made that the overhead of HTTPS is low enough that you can afford the cost if you do some basic optimization. Reducing the number of HTTP requests is a good source of low-hanging fruit.

From the Yahoo UI Blog:

Reducing the number of HTTP requests has the biggest impact on reducing response time and is often the easiest performance improvement to make.
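Why would the request count matter so much? A rough back-of-the-envelope model helps. The sketch below is my own simplification, not something from the presentation or the Yahoo post: it assumes the browser fetches resources in rounds over a fixed number of parallel connections, that each round costs one round trip, and that transfer time is just total bytes over bandwidth. The function name and every number in it are hypothetical.

```python
import math


def estimated_load_seconds(num_requests, rtt_seconds, total_bytes,
                           bandwidth_bytes_per_sec, parallel_connections=6):
    """Very rough page load estimate. A simplified model, not a measurement.

    Assumes requests go out in rounds of `parallel_connections`, each round
    costing one round trip, plus the time to push all the bytes through.
    """
    rounds = math.ceil(num_requests / parallel_connections)
    latency_cost = rounds * rtt_seconds
    transfer_cost = total_bytes / bandwidth_bytes_per_sec
    return latency_cost + transfer_cost


# Hypothetical page: 500 KB total, 100 ms round trip, 1 MB/s of bandwidth.
# Same bytes in both cases; only the number of requests changes.
before = estimated_load_seconds(60, 0.1, 500_000, 1_000_000)
after = estimated_load_seconds(12, 0.1, 500_000, 1_000_000)
print(f"60 requests: {before:.2f}s, 12 requests: {after:.2f}s")
```

With these made-up numbers the page drops from about 1.5 seconds to about 0.7 seconds without shaving a single byte. In this model the latency term scales with the number of requests while the transfer term does not, which is one way to see why combining files and images tends to be such cheap, effective optimization.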

Click to read more ...