Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning

In Scaling writes in MySQL (slides), Philip Tellis, while working for Yahoo, describes how time-based partitions let them increase their write capability from 2100 inserts per second (7 million a day) to a sustained 8500 inserts per second (734 million a day). This was enough capacity to handle the load during Michael Jackson's memorial service. In summary, the secrets to scalable writes are:

Click to read more ...
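
As a rough illustration of the numbers and the partitioning idea above, here is a minimal Python sketch. It checks the rate arithmetic (8500 inserts per second sustained over 86,400 seconds is about 734 million a day, while 2100 per second sustained would be about 181 million, so the 7 million figure is presumably not a sustained-rate conversion) and routes a record to a daily partition by timestamp. The function names and the `events_YYYYMMDD` naming scheme are assumptions made for this example, not details from the talk.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def inserts_per_day(rate_per_second: int) -> int:
    """Daily volume if a per-second insert rate is sustained for 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


def partition_for(ts: datetime, prefix: str = "events") -> str:
    """Name of the daily partition that a record timestamped `ts` belongs to.

    The `<prefix>_YYYYMMDD` convention is hypothetical; the point is only that
    the partition is a pure function of the record's time.
    """
    return f"{prefix}_{ts.astimezone(timezone.utc):%Y%m%d}"


if __name__ == "__main__":
    print(inserts_per_day(8500))  # 734400000
    print(inserts_per_day(2100))  # 181440000
    print(partition_for(datetime(2009, 7, 7, 18, 30, tzinfo=timezone.utc)))
```

Because new records always land in the most recent partition, writes touch one small, hot set of data while older partitions sit untouched and can be dropped whole.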


7 Lessons Learned While Building Reddit to 270 Million Page Views a Month

Steve Huffman, co-founder of social news site Reddit, gave an excellent presentation (slides, transcript) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers.

Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing differently. And if Steve didn't know about these lessons, I'm betting others don't either.

There are seven lessons, each with its own summary section: Lesson 1: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline.

By far the most surprising feature of their architecture is in Lesson 6, whose essential idea is:

Click to read more ...


Hot Scalability Links for May 14, 2010

Lots of good ones this week...

Click to read more ...


The Rise of the Virtual Cellular Machines

My apologies if you were looking for a post about cell phones. This post is about high density nanodevices. It's a follow-up to How will memristors change everything? for those wishing to pursue these revolutionary ideas in more depth. This is one of those areas where if you are in the space then there's a lot of available information and if you are on the outside then it doesn't even seem to exist. Fortunately, Ben Chandler from The SyNAPSE Project was kind enough to point me to a great set of presentations given at the 12th IEEE CNNA - International Workshop on Cellular Nanoscale Networks and their Applications - Towards Megaprocessor Computing. WARNING: these papers contain extreme technical content. If you are like me and you aren't an electrical engineer, much of it may make a sort of surface sense, but the deep and twisty details will fly overhead. For the more software minded there are a couple of more accessible presentations:

Here are a few excerpts from the presentations, just things I found particularly interesting. I'm still trying to make sense of it all and I thought you might be interested too. It's clear there's something new here and it will require different algorithms and programming models to work. What will those be and who will invent them?

Click to read more ...

May 10, 2010

Sify Architecture - A Portal at 3900 Requests Per Second

Sify is one of the leading portals in India. A second site, owned by the same company, is one of the top content aggregation sites in India, primarily targeting Non-resident Indians from around the world. Ramki Subramanian, an Architect at Sify, has been generous enough to describe the common back-end for both these sites. One of the most notable aspects of their architecture is that Sify does not use a traditional database. They query Solr and then retrieve records from a distributed file system. Over the years many people have argued for file systems over databases. Filesystems can work for key-value lookups, but they don't work for queries; using Solr is a good way around that problem. Another interesting aspect of their system is the use of Drools for intelligent cache invalidation. As more and more data is duplicated in multiple specialized services, keeping the copies synchronized becomes a difficult problem. A rules engine is a clever approach.
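
The "query an index, then fetch records by key from a file system" pattern described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `TinyIndex` is a toy in-memory inverted index standing in for Solr, `FileStore` uses a local temporary directory standing in for the distributed file system, and every class and function name here is hypothetical rather than part of Sify's system.

```python
import json
import tempfile
from pathlib import Path


class FileStore:
    """One JSON file per key: a local stand-in for a distributed file system."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, record: dict) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(record))

    def get(self, key: str) -> dict:
        return json.loads((self.root / f"{key}.json").read_text())


class TinyIndex:
    """Toy inverted index from lowercase words to record keys (stand-in for a search index)."""

    def __init__(self) -> None:
        self.postings = {}  # word -> set of record keys

    def add(self, key: str, text: str) -> None:
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(key)

    def search(self, query: str) -> list:
        """Sorted keys of the records that contain every word of the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)


def add_record(index: TinyIndex, store: FileStore, key: str, record: dict) -> None:
    """Write the record to the file store and index its title for querying."""
    store.put(key, record)
    index.add(key, record["title"])


def query_records(index: TinyIndex, store: FileStore, query: str) -> list:
    """The query goes to the index; the records come back from the file store by key."""
    return [store.get(key) for key in index.search(query)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store, index = FileStore(Path(tmp)), TinyIndex()
        add_record(index, store, "a1", {"title": "Cricket scores today"})
        add_record(index, store, "a2", {"title": "Election results today"})
        print(query_records(index, store, "today cricket"))
```

The file store only ever answers "give me the record for this key", which is the one thing file systems do well; everything that looks like a query is answered by the index.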

Click to read more ...


Going global on EC2

Since its inception, Amazon EC2 has enabled companies to run highly scalable infrastructure with minimal overhead. Over the years, Amazon Web Services has expanded with new offerings and additional regions around the world.

All this growth has made establishing a global footprint easier than ever. And yet, most EC2 customers still choose to operate in a single region. While this is fine for many applications, customers with significant web infrastructure are depriving users of drastically improved performance. Deploying infrastructure in EC2's new regions cuts out one of the biggest sources of latency: distance.

In this post, I describe how Bizo significantly reduced load times by implementing Global Server Load Balancing (GSLB) to distribute traffic across all Amazon regions.
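
To make the "distance is latency" point concrete, here is a minimal Python sketch of the decision a global load balancer has to make: send each client to the closest region. It is not how Bizo's GSLB works (real GSLB is typically done at the DNS layer using measured latency or geolocation); it simply picks the region with the smallest great-circle distance. The region list and the approximate coordinates are assumptions for the example.

```python
import math

# Approximate (latitude, longitude) per region, for illustration only.
REGIONS = {
    "us-east-1": (38.9, -77.4),       # Northern Virginia
    "us-west-1": (37.4, -122.1),      # Northern California
    "eu-west-1": (53.3, -6.3),        # Ireland
    "ap-southeast-1": (1.35, 103.8),  # Singapore
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(client, regions=REGIONS):
    """Name of the region geographically closest to the client's (lat, lon)."""
    return min(regions, key=lambda name: haversine_km(client, regions[name]))


if __name__ == "__main__":
    for city, location in {"London": (51.5, -0.13), "Tokyo": (35.7, 139.7)}.items():
        print(city, "->", nearest_region(location))
```

Geographic distance is only a proxy for network latency, which is one reason production setups tend to route on measured round-trip times instead.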


How will memristors change everything? 

A non-random sample of my tech friends shows that not many have heard of memristors (though I do suspect vote tampering). I'd read a little about memristors in 2008 when the initial hubbub about the existence of memristors was raised. I, however, immediately filed them into that comforting conceptual bucket of potentially revolutionary technologies I didn't have to worry about because, as with most wondertech, nothing would ever come of it. Wrong. After watching Finding the Missing Memristor by R. Stanley Williams I've had to change my mind. Memristors have gone from "maybe never" to "holy cow, this could happen soon and it could change everything."

Let's assume for the sake of dreaming that memristors do prove out. How will we design systems when we have access to a new material that is two orders of magnitude more efficient from a power perspective than traditional transistor technologies, contains multiple petabits (1 petabit = 125 TB, or 128 TiB if you count in powers of two) of persistent storage, and can be reconfigured to be either memory or CPU in a package as small as a sugar cube (in a stacked configuration)?

Click to read more ...


Business continuity with real-time data integration

Enterprises want to protect their data. As the appetite for data volumes grows, storage technology becomes a critical business asset on which business continuity relies. My recent survey in the medium-size enterprise segment shows the five dominant investment directions at the level of data management architecture: disaster recovery (DR), high availability (HA), backup, data processing performance and migration to more advanced databases.


This suggests that corporations generally have sufficiently structured data collections but are concerned with business continuity and continuous availability of data. What infrastructures can provide these assurances? In this post I want to focus on yet another option: the Real-Time Data Integration model. As an example I am going to discuss Oracle GoldenGate, which lets you safely manage the data critical to your business, ensuring business continuity without disruption even if the data is distributed among multiple, heterogeneous business applications and architectures.
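
The general idea behind real-time data integration can be sketched as change replication: every change on the source is captured as an ordered record and applied to a standby copy, so the copy is ready to take over. The Python below is a minimal sketch of that idea only. It does not use or imitate GoldenGate's interfaces, and the `Change` and `Replica` names and fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One captured change from the source system (hypothetical record format)."""
    seq: int                    # position in the source's change log
    op: str                     # "insert", "update", or "delete"
    key: str
    row: Optional[dict] = None  # new row contents for insert/update


class Replica:
    """Standby copy kept current by applying changes in sequence order."""

    def __init__(self) -> None:
        self.rows = {}
        self.applied_seq = 0  # highest sequence number applied so far

    def apply(self, change: Change) -> None:
        if change.seq <= self.applied_seq:
            return  # already applied, so replaying the log after a failure is safe
        if change.op == "delete":
            self.rows.pop(change.key, None)
        elif change.op in ("insert", "update"):
            self.rows[change.key] = dict(change.row or {})
        else:
            raise ValueError(f"unknown operation: {change.op}")
        self.applied_seq = change.seq


if __name__ == "__main__":
    log = [
        Change(1, "insert", "u1", {"name": "Ann"}),
        Change(2, "update", "u1", {"name": "Ann B."}),
        Change(3, "insert", "u2", {"name": "Raj"}),
        Change(4, "delete", "u2"),
    ]
    replica = Replica()
    for change in log:
        replica.apply(change)
    print(replica.rows, replica.applied_seq)
```

Tracking the last applied sequence number is what lets the standby resume from a known point after an interruption instead of being rebuilt from a backup.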




MocoSpace Architecture - 3 Billion Mobile Page Views a Month

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system is: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; selecting PostgreSQL over MySQL as a relational DB that can scale.

MocoSpace is a mobile social network with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone Web browser, ranging from high end smartphones to lower end devices, as well as from the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards and blogs. The monetization strategy is focused on advertising, on both the mobile site and the Website, as well as a virtual currency system and a handful of premium feature upgrades.
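
For readers who haven't built one, the multi-tiered, local/distributed caching mentioned above might look roughly like the following Python sketch: check a per-server local cache first, then a shared cache, and only then go to the database. This is an assumption-laden illustration, not MocoSpace's implementation; `TieredCache` is a hypothetical name, a plain dict stands in for the distributed cache, and `loader` stands in for the database query.

```python
class TieredCache:
    """Read-through cache with a per-process tier in front of a shared tier."""

    def __init__(self, shared, loader):
        self.local = {}        # tier 1: memory of this app server only
        self.shared = shared   # tier 2: dict standing in for a distributed cache
        self.loader = loader   # called on a miss in both tiers (the "database")

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.loader(key)
            self.shared[key] = value  # let other servers reuse the result
        self.local[key] = value
        return value

    def invalidate(self, key):
        """Drop the key from this server's tier and from the shared tier."""
        self.local.pop(key, None)
        self.shared.pop(key, None)


if __name__ == "__main__":
    db_calls = []

    def load_profile(key):
        db_calls.append(key)
        return {"id": key}

    shared = {}
    server_a = TieredCache(shared, load_profile)
    server_b = TieredCache(shared, load_profile)
    server_a.get("user:1")   # misses both tiers, hits the database
    server_b.get("user:1")   # served from the shared tier
    print(len(db_calls))     # 1
```

Note that `invalidate` on one server cannot reach the local tier of another, which is the usual reason local tiers are kept small and short-lived.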


Click to read more ...


100 Node Hazelcast cluster on Amazon EC2

Deploying, running and monitoring an application on a big cluster is a challenging task. Recently the Hazelcast team deployed a demo application on the Amazon EC2 platform to show how a Hazelcast p2p cluster scales, and screen recorded the entire process from deployment to monitoring.

Hazelcast is an open source (Apache License), transactional, distributed caching solution for Java. It is a little more than a cache, though, as it provides distributed implementations of map, multimap, queue, topic, lock and executor service.
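
To give a feel for what a "distributed map" means, here is a minimal Python sketch of one common approach: hash each key to a partition and give each partition an owning node, so entries spread across the cluster. It is written in Python to stay language-neutral, it is not Hazelcast's API or its actual partitioning algorithm, and `PartitionedMap`, the partition count, and the round-robin ownership are all assumptions made for the example.

```python
import hashlib


class PartitionedMap:
    """Toy distributed map: each key hashes to a partition owned by one node."""

    def __init__(self, node_names, partition_count=64):
        self.partition_count = partition_count
        self.nodes = {name: {} for name in node_names}  # per-node local storage
        names = sorted(self.nodes)
        # Hypothetical ownership scheme: deal partitions out round-robin.
        self.owners = [names[p % len(names)] for p in range(partition_count)]

    def _partition(self, key: str) -> int:
        # A stable hash, so every member computes the same partition for a key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count

    def owner_of(self, key: str) -> str:
        return self.owners[self._partition(key)]

    def put(self, key: str, value) -> None:
        self.nodes[self.owner_of(key)][key] = value

    def get(self, key: str, default=None):
        return self.nodes[self.owner_of(key)].get(key, default)


if __name__ == "__main__":
    cluster = PartitionedMap([f"node-{i}" for i in range(4)])
    for i in range(1000):
        cluster.put(f"key-{i}", i)
    print({name: len(entries) for name, entries in cluster.nodes.items()})
```

Because the owner is computed from the key alone, any node can find an entry without asking a central directory, which is what lets a peer-to-peer cluster grow by adding nodes.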

Details of running a 100-node Hazelcast cluster on Amazon EC2 can be found here. Make sure to watch the screencast!