Kngine Snippet Search New Indexing Technology

Kngine has just announced some improvements and new features, so I would like to take you on a short trip through the Snippet Search research project at Kngine.


Click to read more ...


What I'm Thankful For on Thanksgiving

I try to keep this blog targeted and on topic. So even though I may be thankful for the song of the tiniest sparrow at sunrise, I'll save you from all that. It's hard to tie scalability and the giving of thanks together, especially as it sometimes occurs to me that this blog may be a self-indulgent waste of time. But I think I found a sentiment in A New THEORY of AWESOMENESS and MIRACLES by James Bridle that manages to marry the topic of this blog and giving thanks meaningfully together:

I distrust commercial definitions of innovation, and particularly of awesomeness. It’s an overused term. When I think of awesomeness, I want something awe-inspiring, vast and mind-expanding.

So I started thinking about things that I think are awesome, or miraculous, and for me, it kept coming back to scale and complexity.

We’re not actually very good about thinking about scale and complexity in real terms, so we have to use metaphors and examples. Douglas Adams writes somewhere about how big the Hitchhiker’s Guide to the Galaxy actually is—imagine a sheet of paper, then a filing cabinet full of sheets of paper, then a room full of filing cabinets, then a skyscraper full of rooms, then a city full of skyscrapers, a country, a planet, a solar system and so on. I couldn’t find the exact quote, so his thoughts on space will have to do:

Just wonderful. I especially love the quote "So I started thinking about things that I think are awesome, or miraculous, and for me, it kept coming back to scale and complexity." This perfectly sums up why the topic of scalability is so endlessly diverting. It can take you anywhere you want to go and everything eventually ends up back again.

Thanks for reading and...

Happy Thanksgiving!


Brian Aker's Hilarious NoSQL Stand Up Routine

Brian Aker gave this 10-minute lightning talk on NoSQL at the Nov 2009 OpenSQLCamp in Portland, Oregon. It's incredibly funny, probably because there's a lot of truth to what he's saying.

Here are the slides and here are the notes. Found through #nosql.


Hot Scalability Links for Nov 24 2009


Big Data on Grids or on Clouds? 

 Contributed by Wolfgang Gentzsch:

Now that we have a new computing paradigm, Cloud Computing, how can Clouds help our data? Replace our internal data vaults as we hoped Grids would? Are Grids dead now that we have Clouds? Despite all the promising developments in the Grid and Cloud computing space, and the avalanche of publications and talks on this subject, many people still seem to be confused about internal data and compute resources, versus Grids versus Clouds, and they are hesitant to take the next step. I think there are a number of issues driving this uncertainty.

read more at:


10 eBay Secrets for Planet Wide Scaling

You don't even have to make a bid: Randy Shoup, an eBay Distinguished Architect, gives this presentation on how eBay scales for free. In this presentation, and in the other talks listed at the end of this post, Randy has done a fabulous job of getting at the heart of the principles behind scalability. It's more about ideas of how things work and fit together than a focus on a particular technology stack.

Impressive Stats

In case you weren't sure, eBay is big, with lots of: users, data, features, and change...

  • Over 89 million active users worldwide
  • 190 million items for sale in 50,000 categories
  • Over 8 billion URL requests per day
  • Hundreds of new features per quarter
  • Roughly 10% of items are listed or ended every day
  • In 39 countries and 10 languages
  • 24x7x365
  • 70 billion read / write operations / day
  • Processes 50TB of new, incremental data per day
  • Analyzes 50PB of data per day

10 Lessons

Click to read more ...


Building Scalable Systems Using Data as a Composite Material

Think of building websites as engineering composite materials. A composite material is formed when two or more materials are combined to create a third material that does something useful that the components couldn't do on their own. Composites like reinforced concrete have revolutionized design and construction. When building websites we usually bring different component materials together, like creating a composite, to get the features we need rather than building a completely new thing from scratch that does everything we want.

This approach has been seen as a hack because it leads to inelegancies like data duplication, great gobs of component glue, consistency issues, and messy operations. But what if the composite approach is really a strength rather than a hack, a messy part of the world that needs to be embraced rather than belittled?

The key is to see data as a material. Right now we are arguing which is the best single material to build with. Is it NoSQL, relational, massively parallel, graph, in-memory, or something else entirely? It all seems a bit crazy. Each material has both limits and capabilities. What we need to think of building is a composite material that combines the best characteristics of what is available into something better.
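To make the composite idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the CompositeStore class, its three plain dictionaries standing in for a record store, an in-memory cache, and a search index, and its method names are hypothetical and do not come from the article. It shows both the benefit (each "material" answers the question it is good at) and the cost (duplicated data and glue code to keep the copies consistent).

```python
class CompositeStore:
    """Hypothetical composite of three simple 'materials': a record store,
    an in-memory cache, and an inverted index for keyword lookup."""

    def __init__(self):
        self.records = {}  # stand-in for a source-of-truth (e.g. relational) store
        self.cache = {}    # stand-in for an in-memory store
        self.index = {}    # word -> set of record keys (stand-in for a search engine)

    def put(self, key, text):
        # Glue code: remove the old text from the index before overwriting it.
        old = self.records.get(key)
        if old is not None:
            for word in set(old.lower().split()):
                self.index[word].discard(key)
        self.records[key] = text
        # Invalidate the cached copy so readers never see stale data.
        self.cache.pop(key, None)
        # Duplicate the data into the index so it can be searched by word.
        for word in set(text.lower().split()):
            self.index.setdefault(word, set()).add(key)

    def get(self, key):
        # Read-through cache: fill from the record store on a miss.
        if key not in self.cache:
            self.cache[key] = self.records[key]
        return self.cache[key]

    def search(self, word):
        # Return the keys of all records whose text contains the word.
        return sorted(self.index.get(word.lower(), set()))
```

Note how put has to touch all three structures. That is exactly the duplication and consistency work described above, accepted in exchange for fast reads and keyword search that no single structure here would provide alone.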

Click to read more ...


Hot Scalability Links for Nov 11 2009  


10 NoSQL Systems Reviewed

In NoSQL Ecosystem, Jonathan Ellis reviews the origin of the NoSQL movement and compares 10 different NoSQL products on: 1) support for multiple datacenters, 2) the ability to add new machines to a live cluster transparently to your applications, 3) data model, 4) query API, and 5) persistence design. The 10 systems reviewed are: Cassandra, CouchDB, HBase, MongoDB, Neo4J, Redis, Riak, Scalaris, Tokyo Cabinet, Voldemort.

A very thorough and thoughtful article on the entire NoSQL space. It's clear from the article that NoSQL is not monolithic; there is a very wide variety of approaches to not being a relational database.

Click to read more ...


Product: Resque - GitHub's Distributed Job Queue

Queuing work for processing in the background is a time-tested scalability strategy. Queuing also happens to be one of those much-needed tools where it is easy enough to forge your own that we see a lot of different versions made. Resque is GitHub's take on a job queue and they've used it to process millions and millions of jobs so far.

What is Resque?

Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work. Or, you can do both.
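The pattern described here (a job is any class that responds to perform, jobs are placed on named queues, and a worker processes them later) can be sketched roughly as follows. This is not Resque's code or API: it is a Python illustration rather than Ruby, an in-memory deque stands in for Redis, and the names QUEUES, JOBS, job, enqueue, work_one, and Archive are all hypothetical.

```python
import json
from collections import defaultdict, deque

# In-memory stand-in for Redis lists (assumption: a real setup would use Redis).
QUEUES = defaultdict(deque)
# Registry mapping job class names to classes, so a worker can look them up.
JOBS = {}


def job(cls):
    """Register a class as a background job; it must respond to perform."""
    if not callable(getattr(cls, "perform", None)):
        raise TypeError(f"{cls.__name__} must define perform")
    JOBS[cls.__name__] = cls
    return cls


def enqueue(queue, cls, *args):
    """Serialize the job as JSON and push it onto the named queue."""
    QUEUES[queue].append(json.dumps({"class": cls.__name__, "args": list(args)}))


def work_one(queue):
    """Pop one job from the queue and run it; return False if the queue is empty."""
    if not QUEUES[queue]:
        return False
    payload = json.loads(QUEUES[queue].popleft())
    JOBS[payload["class"]].perform(*payload["args"])
    return True


# Records what the example job has done, so the effect is observable.
PROCESSED = []


@job
class Archive:
    """Example job: any class with a perform method can be queued."""

    @staticmethod
    def perform(repo_id, branch):
        PROCESSED.append((repo_id, branch))


if __name__ == "__main__":
    enqueue("file_serve", Archive, 42, "master")
    while work_one("file_serve"):
        pass
    print(PROCESSED)
```

Storing only the class name and JSON-serializable arguments keeps the queued payload small and lets the producer and the worker run in separate processes, which is the main reason for backing the queue with an external store instead of in-process memory.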

GitHub tried and considered many other systems: SQS, Starling, ActiveMessaging, BackgroundJob, DelayedJob, beanstalkd, AMQP, and Kestrel, but found them all wanting in one way or another. The latency for SQS was too high. Others didn't make full use of Ruby. Others still had a lot of overhead. Some didn't have enough features. And still others weren't reliable enough.

Click to read more ...