NoSQL on the Microsoft Platform

NoSQL is a trend that is gaining steam primarily in the world of Open Source. There are numerous NoSQL solutions available for all levels of complexity: from queryable distributed solutions like MongoDB to simpler distributed key-value storage solutions like Cassandra. Then there’s Riak, Tokyo Cabinet, Voldemort, CouchDB, and Redis. However, very few of these packaged NoSQL products are available for the other end of the platform market: Microsoft Windows. I’m going to outline what’s available now and briefly touch on some opportunities that are still available to the daring Microsoft engineer.

You can read the full story here.


ArchCamp: Scalable Databases (NoSQL)

The ArchCamp unconference was held this past Friday at HackerDojo in Mountain View, CA.  There was plenty of pizza, beer, and great conversation.  This session started out free-form, but shaped up pretty quickly into a discussion of the popular open source scalable NoSQL databases and the architectural categories in which they belong.


Hot Scalability Links for Aug 6, 2010

  • Twitter Sees Its 20 Billionth Tweet, writes Marshall Kirkpatrick of ReadWriteWeb.
  • Startups die for not having customers, so STOP thinking about how to scale. Alessandro Orsi says focusing on the architecture and scaling possibilities of your app for millions of users is just plain dumb...concentrate on marketing...concentrate on user experience. Alessandro is perfectly correct, but this isn't the year 2000, when the easy default architecture was also not scalable and sites were built from scratch one painful user at a time. Today neither is true. In the era of social networks, where Facebook has 500 million users, successful applications can and often do spike to millions of users seemingly overnight. And you have to have some architecture. With today's tool-chains you don't have to choose easy and non-scalable. There are other options. Of course, it's all pointless without customers and that is what you need to worry about, but it's a false choice in this era to think that's all you have to worry about.

Click to read more ...


Pairing NoSQL and Relational Data Storage: MySQL with MongoDB

I’ve largely steered clear of publicly commenting on the “NoSQL vs. Relational” conflict. Keeping in mind that this argument is more about currently available solutions and the features their developers have chosen to build in, I’d like to dig into this and provide a decidedly neutral viewpoint. In fact, by erring on the side of caution, I’ve inadvertently given myself plenty of time to consider the pros and cons of both data storage approaches, and although my mind was initially swaying toward the NoSQL camp, I can say with a fair amount of certainty that I’ve found a good compromise.

You can read the full story here.


Dremel: Interactive Analysis of Web-Scale Datasets - Data as a Programming Paradigm

If Google was a boxer then MapReduce would be a probing right hand that sets up the massive left hook that is Dremel, Google's—scalable (thousands of CPUs, petabytes of data, trillions of rows), SQL based, columnar, interactive (results returned in seconds), ad-hoc—analytics system. If Google was a magician then MapReduce would be the shiny thing that distracts the mind while the trick goes unnoticed. I say that because even though Dremel has been around internally at Google since 2006, we have not heard a whisper about it. All we've heard about is MapReduce, clones of which have inspired entire new industries. Tricky.

Dremel, according to Brian Bershad, Director of Engineering at Google, is targeted at solving BigData class problems:

While we all know that systems are huge and will get even huger, the implications of this size on programmability, manageability, power, etc. is hard to comprehend. Alfred noted that the Internet is predicted to be carrying a zetta-byte (10^21 bytes) per year in just a few years. And growth in the number of processing elements per chip may give rise to warehouse computers of having 10^10 or more processing elements. To use systems at this scale, we need new solutions for storage and computation.

Click to read more ...


7 Scaling Strategies Facebook Used to Grow to 500 Million Users

Robert Johnson, a director of engineering at Facebook, celebrated Facebook's monumental achievement of reaching 500 million users by sharing the scaling principles that helped reach that milestone. In case you weren't suitably impressed by the 500 million user number, Robert ratchets up the numbers game with these impressive figures:
  • 1 million users per engineer
  • 500 million active users
  • 100 billion hits per day
  • 50 billion photos
  • 2 trillion objects cached, with hundreds of millions of requests per second
  • 130TB of logs every day

How did Facebook get to this point?

Click to read more ...


Basho Lives up to their Name With Consistent Smashing

For some Friday Fun nerd style, I thought this demonstration from Basho on the difference between single master, sharding, and consistent smashing was really clever. I love the use of safety glasses! And it's harder to crash a server with a hammer than you might think...
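For anyone who wants the idea behind the pun, here is a minimal sketch of consistent hashing in Python. It is not Basho's or Riak's implementation; the `HashRing` class, the `vnodes` parameter, and the choice of MD5 as the ring hash are all assumptions made for illustration. Each node is placed at many points on a ring, and a key belongs to the first node point found clockwise from the key's own hash.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (a 128-bit integer derived from MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Illustrative consistent-hash ring (hypothetical class, not a real library)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per node, smooths the distribution
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point not in self._owner:   # ignore the (unlikely) collision
                self._owner[point] = node
                bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property worth noticing is that adding a node only moves keys onto the new node, and removing it puts them back where they were. With plain modulo sharding (`hash(key) % node_count`), changing the node count would typically reshuffle most keys.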


Hot Scalability Links for July 30, 2010

  • Jeremy Zawodny, while performing data alchemy in the dungeons of Craigslist, stored 1,250,000,000 Key/Value Pairs in Redis on a 32GB Machine.
  • Data sorting world record: 1 terabyte, 1 minute. The system has 52 computer nodes; each node is a commodity server with two quad-core processors, 24 gigabytes (GB) of memory, and sixteen 500 GB disks. It's not just hardware though: they also built software that used all of their CPU and RAM.
  • Tweets of Gold:
    • wm: I am really getting the sense that none of you yokels waxing profound about scalability actually has anything factual to say
    • joestump: I think you can do things to *mitigate* pain points up front. You don't need to over-engineer, but it's not hard to look forward.
    • danielcrenna: I love it when I check in debug code accidentally and it turns into a three day hunt for a major scalability problem
    • joestump: Your post also makes me think of another phrase I say often: Scaling == Specialization. Bigger scale = More specialization.

Click to read more ...


YeSQL: An Overview of the Various Query Semantics in the Post Only-SQL World

The NoSQL movement faults the SQL query language as the source of many of the scalability issues that we face today with the traditional database approach.

I think that the main reason so many people have come to see SQL as the source of all evil is the fact that, traditionally, the query language was burned into the database implementation. So by saying NoSQL you basically say "No" to the traditional non-scalable RDBMS implementations.

This view has brought on a flood of alternative query languages, each aiming to solve a different aspect that is missing in the traditional SQL query approach, such as a document model, or that provides a simpler approach, such as Key/Value query.
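To make the difference between a Key/Value query and a document-model query concrete, here is a small in-memory sketch in Python. It is not taken from the linked posts or from any product mentioned here; `KeyValueStore`, `DocumentStore`, and `find` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """Key/value semantics: the only access path is the exact key."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)


class DocumentStore(KeyValueStore):
    """Document semantics: values are dicts whose fields can also be queried."""

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Linear scan for clarity; a real store would rely on secondary indexes.
        return [
            doc for doc in self._data.values()
            if isinstance(doc, dict)
            and all(doc.get(field) == wanted for field, wanted in criteria.items())
        ]
```

With this sketch, `store.get("u1")` is a Key/Value query, while `store.find(city="Paris")` is a document-style query on field contents. Both run against the same underlying storage, which is one way to see that query semantics and storage architecture are separate concerns.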

Most of the people I speak with seem fairly confused on this subject, and tend to use query semantics and architecture interchangeably. In Part I of this post I tried to provide a quick overview of what each query term stands for in the context of the NoSQL world. Part II illustrates those ideas using code examples from GigaSpaces and Datanucleus/Hbase.

See Part I and Part II for more information.

Click to read more ...


A Metric A$$-Ton of Joe Stump: The Cloud is Cheaper than Bare Metal

Should you pay more in the cloud or pay less for bare metal in the datacenter? This is a crucial decision point facing startups today. Which way should you go? In this interview, Joe Stump, always a go-to guy when you need a metric ass-ton (a favorite expression of Joe’s) of good advice on cutting edge practices for the modern startup, laughs at conventional wisdom by saying the cloud is really not more expensive than bare metal.

The argument for a cheaper cloud has three main points:

Click to read more ...