Stuff the Internet Says on Scalability For November 12th, 2010



Paper: Hyder - Scaling Out without Partitioning 

Partitioning is what differentiates scaling out from scaling up, isn't it? I thought so too until I read Pat Helland's blog post on Hyder, a research database at Microsoft in which the database is the log, no partitioning is required, and the database is multi-versioned. Not much is available on Hyder. There's the excellent summary post from Mr. Helland and these documents: Scaling Out without Partitioning and Scaling Out without Partitioning - Hyder Update by Phil Bernstein and Colin Reid of Microsoft.
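To get a feel for what "the database is the log" and "multi-versioned" can mean together, here is a toy sketch. It is not Hyder's design or API: the LogDatabase class, its method names, and the idea of using the log length as a version number are all assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple


class LogDatabase:
    """Toy append-only store where the log is the only state (illustrative, not Hyder)."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, Any]] = []

    def append(self, key: str, value: Any) -> int:
        """Append a write and return the new version (here, simply the log length)."""
        self._log.append((key, value))
        return len(self._log)

    def read(self, key: str, version: Optional[int] = None) -> Any:
        """Return the value of key as of a version; the default is the latest version."""
        end = len(self._log) if version is None else version
        # Scan the visible prefix of the log backwards to find the newest write.
        for logged_key, logged_value in reversed(self._log[:end]):
            if logged_key == key:
                return logged_value
        return None


db = LogDatabase()
v1 = db.append("user:1", "alice")
v2 = db.append("user:1", "alicia")
old_value = db.read("user:1", version=v1)  # the value as of the first write
new_value = db.read("user:1")              # the latest value
```

Because nothing is ever overwritten, every older version stays readable, which is the property the sketch is meant to show. A real system would index the log rather than scan it.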

The idea behind Hyder as summarized by Pat Helland (see his blog for the full post):

Click to read more ...


The Tera-Scale Effect 

In the past year, Intel issued a series of powerful chips under the new Nehalem microarchitecture, with large numbers of cores and extensive memory capacity. This new class of chips is part of a bigger Intel initiative referred to as Tera-Scale Computing. Cisco has released its Unified Computing System (UCS), equipped with a unique extended memory and a high-speed network within the box, which is specifically geared to take advantage of this type of CPU architecture.

This new class of hardware has the potential to revolutionize the IT landscape as we know it.

In this post, I want to focus primarily on the potential implications for application architecture, and more specifically for the application platform landscape.


Facebook Uses Non-Stored Procedures to Update Social Graphs

Facebook's Ryan Mack gave a MySQL Tech Talk where he talked about using what he called Non-stored Procedures for adding edges to Facebook's social graph. The question is: how can edges quickly be added to the social graph? The answer is ultimately one of deciding where logic should be executed, especially when locks are kept open during network hops.
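Why locks held across network hops matter can be shown with some back-of-the-envelope arithmetic. The sketch below is an assumption-laden illustration, not a figure from the talk: the function name, the round-trip time, and the per-update database work are all hypothetical.

```python
def max_updates_per_second(round_trips: int, rtt_ms: float, db_work_ms: float) -> float:
    """Upper bound on serialized updates to one locked row.

    The lock is assumed to be held for every network round trip made
    inside the transaction plus the database work itself.
    """
    lock_hold_ms = round_trips * rtt_ms + db_work_ms
    return 1000.0 / lock_hold_ms


# Hypothetical numbers: 1 ms round trip, 0.5 ms of database work per update.
# Logic in the client: three round trips happen while the lock is held.
client_side = max_updates_per_second(round_trips=3, rtt_ms=1.0, db_work_ms=0.5)

# Logic next to the data: no round trips happen while the lock is held.
server_side = max_updates_per_second(round_trips=0, rtt_ms=1.0, db_work_ms=0.5)
```

Under these made-up numbers a hot row can take roughly seven times more updates per second when no network hop occurs while the lock is open.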

Ryan explained that a key element of the Facebook data model is the connections between people, things they've liked, and places they've checked in. A lot of their writes add edges to the social graph.

Currently this is a two-step process, run inside a transaction:

Click to read more ...


Sponsored Post: Imo, Membase, Playfish, Electronic Arts, Tagged, Undertone, Joyent, Appirio, Tuenti, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

  • Membase Meetups Coming to Major US Cities. The first of these technical meetups is on November 10th at Hewlett Packard in Cupertino.

Cool Products and Services



Hot Scalability Links For November 5th, 2010

So much good stuff this week...

  • Adrian Cockcroft Compares NoSQL Availability Models. Let's risk feeding the CAP trolls, and try to get some insight into the differences between the many NoSQL contenders. Adrian asks how each NoSQL product will add a movie to its favorites list, read it back, and how this works across availability zones. Much trickier than it sounds with multiple writers. Cassandra and MongoDB answer back.
  • Stuff the Internet Says:
    • @jerng: Reading up on scalability. WHY THE HELL FOR? Because I want to know the future.
    • @freerangedata: The #nosql options are the micro brews/craft beers of data stores. So many good ones, so little time to try them all.
    • @edward_ribeiro: Soon, Darwinism will start to play its role on #NoSQL systems. You know, only the fittest will survive.
    • @connectionreq: I'm always wowed when I hear how Facebook abuses their MySQL databases in crazy ways
    • @louismrose: This is the kind of scalability we should be working on...
  • Redis at Superfeedr. Each of our redis servers process on average 3500 queries per second.



Facebook at 13 Million Queries Per Second Recommends: Minimize Request Variance

Facebook gave a MySQL Tech Talk where they talked about many things MySQL, but one of the more subtle and interesting points was their focus on controlling the variance of request response times and not just worrying about maximizing queries per second.

But first the scalability porn. Facebook's OLTP performance numbers were, as usual, quite dramatic:

  • Query response times: 4ms reads, 5ms writes. 
  • Rows read per second: 450M peak
  • Network bytes per second: 38GB peak
  • Queries per second: 13M peak
  • Rows changed per second: 3.5M peak
  • InnoDB disk ops per second: 5.2M peak
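Returning to the point about variance: an average hides how uneven response times can be. The sketch below compares two hypothetical sets of response times in milliseconds. The sample values and the summarize helper are invented for illustration and are not Facebook's data.

```python
import statistics

# Hypothetical response times in milliseconds (not measured data).
steady = [4, 5, 4, 5, 4, 5, 4, 5, 4, 5]
spiky = [1, 1, 1, 1, 1, 1, 1, 1, 1, 36]


def summarize(samples):
    """Return the mean, population standard deviation, and worst case of the samples."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),
        "max": max(samples),
    }


steady_stats = summarize(steady)
spiky_stats = summarize(spiky)
```

Both sets have the same 4.5 ms mean, but the spiky one has a worst case of 36 ms instead of 5 ms. A page that issues many queries is likely to hit at least one slow one, which is why the spread matters as well as the average.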

Some thoughts on creating quality, not quantity:

Click to read more ...


Hot Trend: Move Behavior to Data for a New Interactive Application Architecture

Two forces account for the trend of moving behavior to data: larger values used in key-value stores and spotty cloud networks. For some time we've seen functions pushed close to data with MapReduce, which is a batch process, but we are now seeing this model extend to interactive applications, which match the current emphasis on highly scalable, real-time, event-driven applications.

To see the trend, look at the increasing support for collocated behavior at the datastore level:
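The effect of large values is easy to see in a toy model. The sketch below is not any particular datastore's API: KeyValueStore, its execute method, and the items_transferred counter are hypothetical stand-ins for a remote store and its network traffic.

```python
class KeyValueStore:
    """In-memory stand-in for a remote store that counts items crossing the 'network'."""

    def __init__(self, initial):
        self._data = {key: list(value) for key, value in initial.items()}
        self.items_transferred = 0

    def get(self, key):
        value = list(self._data[key])
        self.items_transferred += len(value)  # the whole value travels to the client
        return value

    def put(self, key, value):
        self.items_transferred += len(value)  # the whole value travels back
        self._data[key] = list(value)

    def execute(self, key, func, *args):
        """Run func next to the data; only the arguments cross the 'network'."""
        self.items_transferred += len(args)
        return func(self._data[key], *args)


def append_item(value, item):
    """Behavior shipped to the store: append one item and return the new length."""
    value.append(item)
    return len(value)


# Moving data to behavior: fetch a large value, change it, write it all back.
remote = KeyValueStore({"timeline": range(10_000)})
timeline = remote.get("timeline")
timeline.append(42)
remote.put("timeline", timeline)
fetch_cost = remote.items_transferred

# Moving behavior to data: send only the function's argument.
local = KeyValueStore({"timeline": range(10_000)})
new_length = local.execute("timeline", append_item, 42)
ship_cost = local.items_transferred
```

In this model the read-modify-write path moves 20,001 items to append a single one, while shipping the behavior moves one. The larger the value and the less reliable the network, the bigger that gap becomes.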

Click to read more ...


NoSQL Took Away the Relational Model and Gave Nothing Back

Update: Benjamin Black said he was the source of the quote and also said I was wrong about what he meant. His real point: The meaning of the statement was that NoSQL systems (really the various map-reduce systems) are lacking a standard model for describing and querying and that developing one should be a high priority task for them.

At the A NoSQL Evening in Palo Alto, an audience member, sorry, I couldn't tell who, said something I found really interesting: NoSQL took away the relational model and gave nothing back.

The idea being that NoSQL has focussed on ease of use, scalability, performance, etc., but it has lost the idea of how data relates to other data. True to its name, the relational model is very good at capturing and managing relationships. With NoSQL all relationships have been pushed back onto the poor programmer to implement in code rather than the database managing them. We've sacrificed usability. NoSQL is about concurrency, latency, and scalability, but it's not about data.

My ears perked up because I said something similar a while back while commenting on VoltDB's criticism of the NoSQL transaction model: I agree completely that moving the repair logic to the programmer is a recipe for disaster. Having programmers worry about read repair, vector clocks, the commutativity of transactions, how to design compensatory transactions to make up for previous failed transactions, and the other very careful bits of design, is asking for a very fragile system. ACID transactions are clean and understandable and that's why people like them.

Relationships are very much in the same spirit. Managing relationships without explicit support or multi-object transaction support puts a huge burden on the programmer. At one level, key-value systems are far simpler to use. That's great. But for more complex data, all that work comes back and falls on the overburdened shoulders of the programmer. What I liked about this comment during the event is that it put an emphasis on making the programmer's life easier across a wider variety of use cases, which is always a good thing, and it was worth surfacing.
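As a small illustration of that burden, here is a sketch of a follower relationship maintained by hand on top of two key-value maps. The FollowGraph class and its method names are hypothetical, and plain in-memory dictionaries stand in for the store.

```python
from collections import defaultdict


class FollowGraph:
    """Follower relationships kept by hand in two key-value maps (illustrative only)."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> users they follow
        self.followers = defaultdict(set)  # user -> users who follow them

    def follow(self, user, target):
        # Two separate writes. With no multi-key transaction, a crash between
        # them leaves the two maps disagreeing, and the application must repair it.
        self.following[user].add(target)
        self.followers[target].add(user)

    def delete_user(self, user):
        # The application has to find and remove every reference itself.
        for target in self.following.pop(user, set()):
            self.followers[target].discard(user)
        for follower in self.followers.pop(user, set()):
            self.following[follower].discard(user)


graph = FollowGraph()
graph.follow("ann", "bob")
graph.follow("cat", "bob")
graph.follow("bob", "ann")
graph.delete_user("bob")
```

In a relational database the same cleanup can be declared once with a foreign key and ON DELETE CASCADE, and the writes in follow can share a transaction. Here both concerns live in application code.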

Related Articles

  • Interesting Hacker News Thread, especially the commentary on state of the art IO subsystems.


Notes from A NOSQL Evening in Palo Alto 

I, along with 180 other people and a veritable who's who of NoSQL vendors, attended the A NoSQL Evening in Palo Alto NoSQL Meetup on Tuesday. The format was a panel of 10 vendors--10gen, Basho, CouchOne, Cloudant, Cloudera, GoGrid, InfiniteGraph, Membase, Riptano, Scality--sitting in two rows of chairs in front of what seemed like a pretty diverse audience. Tim Anglade (founder, A NOSQL Summer) moderated. Tim kept things moving by asking a few leading questions and the panel chimed in with answers. Quite a few questions came from the audience, which was refreshing.

Overall a genial evening with some good discussion. I was pleased that the panel members didn't just automatically slip into marketing speak. Most of the discussions were on point rather than just another excuse to hit the talking points. There were some complaints about the talk not being technical enough, but I don't think that was really the purpose of this kind of talk. The panel format is excellent at giving a wide range of views on general topics, and that's exactly how the evening went.

Some key takeaways:

  • Good energy. A lot of people are trying to do good things and are excited to be in a space where technology still matters more than politics. Real problems are being solved for customers and that's motivating.
  • NoSQL took away the relational model and gave nothing back. Using NoSQL for complex data puts way too much pressure on the programmer.
  • NoSQL will not converge. There's no consensus on what the next thing will be, so we are unlikely to see any standardization in the NoSQL world any time soon. There is a convergence on some features, but it seems the products will evolve to serve specific markets. This is not a bad thing. NoSQL doesn't need to converge on one stack. Products can remain differentiated by being able to solve specific problems.
  • NoSQL has a parallel to the "back-to-the-land" movement. As the relational world and the framework world got ever more complex and expensive, a counter movement developed that sought out simplicity and transparency.
