- Distributed and Highly Available Search Engine.
- Each index is fully sharded with a configurable number of shards.
- Each shard can have zero or more replicas.
- Read / Search operations can be performed on any of the replica shards.
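As a rough illustration of the list above, here is a minimal Python sketch of how a document might be routed to one of a fixed number of shards and how a read might be spread over that shard's copies. The function names, the hash-modulo routing rule, and the choice to let the primary serve reads too are all assumptions made for the example, not the engine's actual API or algorithm.

```python
import hashlib
import random


def shard_for(doc_id, num_shards):
    """Map a document id to a shard number with a stable hash.

    Hash-modulo routing is an assumption for this sketch.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def copies_of(shard, num_replicas):
    """List the primary plus zero or more replica copies of a shard."""
    replicas = [f"shard{shard}-replica{r}" for r in range(num_replicas)]
    return [f"shard{shard}-primary"] + replicas


def pick_copy_for_read(shard, num_replicas, rng=random):
    """Choose any copy of the shard to serve a read or search."""
    return rng.choice(copies_of(shard, num_replicas))


# Hypothetical index: 5 shards, 2 replicas per shard.
shard = shard_for("doc-42", num_shards=5)
reader = pick_copy_for_read(shard, num_replicas=2)
```

With zero replicas, `copies_of` returns only the primary, so every read for that shard lands on a single copy; adding replicas widens the pool that reads can be spread over.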
Several readers had follow-up questions in response to this article. Luke's responses can be found in How FarmVille Scales - The Follow-up.
If real farming were as comforting as it is in Zynga's mega-hit FarmVille then my family would have probably never left those harsh North Dakota winters. None of the scary bedtime stories my Grandma used to tell about farming are true in FarmVille. Farmers make money, plants grow, and animals never visit the red barn. I guess it's just that keep-your-shoes-clean back-to-the-land charm that has helped make FarmVille the "largest game in the world" in such an astonishingly short time.
How did FarmVille scale a web application to handle 75 million players a month? Fortunately FarmVille's Luke Rajlich has agreed to let us in on a few of their challenges and secrets. Here's what Luke has to say...
CNBC, like many large web sites, relied on a CDN for content delivery. Recently, we started looking to see if we could improve this model. Our criteria were:
- improve response time
- have better control over traffic (real time reporting, change management and alerting)
- better utilize internal datacenters and their infrastructure
- shield users from any troubles at the origin infrastructure
- cost out
One important high availability principle is concurrency control. The idea is to let through only as much traffic as your system can handle successfully. For example, if your system is certified to handle a concurrency of 100, then the 101st request should either time out, be asked to try again later, or wait until one of the previous 100 requests finishes. The 101st request should not be allowed to negatively impact the experience of the other 100 users; only the 101st request should be affected. Read more here...
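The principle can be sketched with a counting semaphore. This is a minimal Python illustration, not the approach from the linked article: the `ConcurrencyLimiter` class, the `TooBusy` exception, and the `wait_timeout` parameter are hypothetical names invented for the example.

```python
import threading


class TooBusy(Exception):
    """Raised when a request is turned away because all slots are taken."""


class ConcurrencyLimiter:
    """Admit at most max_concurrent requests at a time (hypothetical helper)."""

    def __init__(self, max_concurrent, wait_timeout=0.0):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_timeout = wait_timeout

    def run(self, handler, *args, **kwargs):
        # With no timeout, reject immediately; otherwise wait briefly for a slot.
        if self._wait_timeout > 0:
            admitted = self._slots.acquire(timeout=self._wait_timeout)
        else:
            admitted = self._slots.acquire(blocking=False)
        if not admitted:
            # Only the surplus request pays; requests in flight are untouched.
            raise TooBusy("at capacity, try again later")
        try:
            return handler(*args, **kwargs)
        finally:
            # Always free the slot, even if the handler raised.
            self._slots.release()


# A system certified for 100 concurrent requests.
limiter = ConcurrencyLimiter(max_concurrent=100)
result = limiter.run(lambda x: x * 2, 21)
```

Setting `wait_timeout` above zero corresponds to the "wait until a previous request finishes" option, while the default of zero corresponds to "try again later".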
Morgan Tocker has an awesome article and comment thread in the MySQL Performance Blog about When should you store serialized objects in the database? Before the NoSQL age it was very common to simulate schemalessness by storing blobs in MySQL. Sharding was implemented by running multiple MySQL instances and spreading writes across them. While not ideal for the purpose, developers felt comfortable with MySQL. They knew how to install it, back it up, replicate it; in short, they knew how to make it work. Yet they also needed to store objects without the penalty of joins. Searches and aggregate queries were handled by indexes kept in separate tables, which kept that work off the fast path to objects.
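A minimal sketch of the blob-plus-index-table pattern looks something like the following. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the table names, the `city` field, and the helper functions are all hypothetical choices for the example rather than anything taken from Morgan's post.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory store: one blob table plus one index table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, body BLOB NOT NULL)")
    db.execute(
        "CREATE TABLE index_by_city (city TEXT NOT NULL, object_id TEXT NOT NULL)"
    )
    db.execute("CREATE INDEX idx_city ON index_by_city (city)")
    return db


def put(db, obj_id, obj):
    """Serialize the object into a blob and refresh its index row."""
    body = json.dumps(obj).encode("utf-8")
    with db:  # one transaction for the blob and its index entry
        db.execute("REPLACE INTO objects (id, body) VALUES (?, ?)", (obj_id, body))
        db.execute("DELETE FROM index_by_city WHERE object_id = ?", (obj_id,))
        if "city" in obj:
            db.execute(
                "INSERT INTO index_by_city (city, object_id) VALUES (?, ?)",
                (obj["city"], obj_id),
            )


def get(db, obj_id):
    """Fast path: fetch one object by primary key, with no joins."""
    row = db.execute("SELECT body FROM objects WHERE id = ?", (obj_id,)).fetchone()
    return json.loads(row[0]) if row else None


def find_by_city(db, city):
    """Search path: consult the separate index table only."""
    rows = db.execute(
        "SELECT object_id FROM index_by_city WHERE city = ? ORDER BY object_id",
        (city,),
    ).fetchall()
    return [r[0] for r in rows]


db = open_store()
put(db, "u1", {"name": "Ann", "city": "Fargo"})
put(db, "u2", {"name": "Bo", "city": "Fargo", "pets": ["cat"]})
```

Note that the second object carries a `pets` field the first one lacks, and no schema change was needed. The cost is that any field you want to search on needs its own index table, kept in sync by application code.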
This all made perfect sense. Usually we just want stuff to work and going with what you know is often the best path to that goal. And what we have known is MySQL. All the different pros and cons of this approach are covered wonderfully in the post.
But the world has changed.
BigDataMatters is focused on the issues faced when processing and managing large amounts of data. In light of this, it would be a crime not to blog about the security of this data. Over the next few weeks, I will write a series of posts focused on identity management in the enterprise. Before you read any more, how is your identity secured?
This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.
If datacenters are the new castles, then what will be the new gunpowder? As soon as gunpowder came on the scene, castles, which are defensive structures, quickly became the future's cold, drafty hotels. Gunpowder-fueled cannonballs made short work of castle walls.
There's a long history of "gunpowder" type inventions in the tech industry. PCs took out the timeshare model. The cloud is taking out the PC model. There must be something that will take out the cloud.
Right now it's hard to believe the cloud will one day be no more. It seems so much the future, but something will transcend the cloud.
With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a: general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.
From the NoSQL Archive the summary on HyperGraphDB is: API: Java (and Java Langs), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web.
So it has some interesting features, like software transactional memory and P2P for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? Buried in the tutorial was:
A HyperGraphDB database is a generalized graph of entities. The generalization is two-fold:
- Links/edges "point to" an arbitrary number of elements instead of just two as in regular graphs
- Links can be pointed to by other links as well.
OK, but I wish there was some explanation of why this is valuable. What can I do with it that I can't do with normal graphs? Given that there have been concerns over the complexity of the API this would seem a natural topic to cover. I assume it's cool, it sounds cool, but I would like to know why :-)
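To make the two-fold generalization a little more concrete, here is a tiny Python sketch. It is not HyperGraphDB's API; the `Node` and `Link` classes and the `incident_links` helper are hypothetical and exist only to show a link with more than two targets and a link that points at another link.

```python
class Node:
    """A plain entity in the graph."""

    def __init__(self, name):
        self.name = name


class Link:
    """A hyperedge: points to any number of targets, nodes or other links."""

    def __init__(self, label, *targets):
        self.label = label
        self.targets = tuple(targets)

    def arity(self):
        """Number of elements this link points to."""
        return len(self.targets)


def incident_links(links, element):
    """Return every link that points to the given node or link."""
    return [l for l in links if any(t is element for t in l.targets)]


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))

# Generalization 1: one link relates three elements, not just two.
meeting = Link("met_at_conference", alice, bob, carol)

# Generalization 2: a link whose target is itself a link.
report = Link("reported_by", meeting, dave)

all_links = [meeting, report]
```

In a regular graph the three-way meeting would have to be split into pairwise edges or modeled as an extra node, and a statement about the meeting itself would need yet another node. Whether that convenience justifies the added API complexity is exactly the question the documentation leaves open.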
In any case it looks like an interesting product to take a look at. Database options are expanding fast.