An alternate strategy for database sharding which avoids queries across different shards and merging results. A central repository of data is maintained for some tables along with other shards. Can be used in calculating top users, recent users, most read etc.
With the introduction of Redis your options in the key-value space just grew and your choice of which to pick just got a lot harder. But when you think about it, that's not a bad position to be in at all. Redis (REmote DIctionary Server) - a key-value database. It's similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements. The key points are: open source; speed (benchmarked performing 110,000 SET operations, and 81,000 GETs, per second); persistence, but in an asynchronous way taking everything in memory; support for higher level data structures and atomic operations. The home page is well organized so I'll spare the excessive-copying-to-make-this-post-longer. For a good overview of Redis take a look at Antonio Cangiano's article: Introducing Redis: a fast key-value database. If you are looking at a way to understand how Redis is different than something like Tokyo Cabinet/Tyrant, Ezra Zygmuntowicz has done a good job explaining their different niches:
Redis has a different use case then tokyo does. I think tokyo is better for long term persistent data storage. But redis is much better as a state/data structure server. Redis is faster then tokyo by quite a bit but is not immediately durable as in the writes to disk happen in the background at certain trigger points so it is possible to lose a little bit of data if the server crashes. But the power of redis comes in its data types. Having LISTS and SETS as well as string values for keys means you can do O(1) push/pop/shift/unshift as well as indexing/slicing into lists. And with the SET data type you can do set intersection in the server. This allows for very cool thigs like storing a set of tags for each key and then querying for the intersection of the set of tags of multiple keys to find out the common set of tags. I'm using this in nanite(an agent based messaging system) for the persistent storage of agent state and routing info. Using the SET data types makes this faster then keeping the state in memory in my ruby processes since can do the SEt intersection routing inside of redis rather then iterating and comparing in ruby: http://gist.github.com/77314. You can also use redis LISTS as a queue to distribute work between multiple processes. Since pushing and popping are atomic you can use it as a shared tuple space. Also you can use a LIST as a circular log buffer by pushing to the end of a list and then doing an LTRIM to trim the list to the max size: redis.list_push_tail('logs', 'some log line..') redis.list_trim('logs', 0, 99). That will keep your circular log buffer at a max of 100 items. With tokyo you can store lists if you use lua on the server, but to push or pop from a list you have to pull the entire list down, alter it and then push it back up to the server. There is no atomic push/pop of just a single item because tokyo cannot store real lists as values so you have to do marshaling of your list to/from a string.A demo application called Retwis shows how to use Redis to make a scalable Twitter clone, at least of the message posting aspects. There's a lot of good low level detail on how Redis is used in a real application.
Evan Weaver from Twitter presented a talk on Twitter software upgrades, titled Improving running components as part of the Systems that never stop track at QCon London 2009 conference last Friday. The talk focused on several upgrades performed since last May, while Twitter was experiencing serious performance problems.
IBM WebSphere eXtreme Scale is IBMs in memory data grid product (IMDG). It can be used as a key-value store which partitions the keys (using a form of consistent hashing) over a set of servers such that each server is responsible for a subset of the keys. It automatically handles replication which can be either synchronous of asynchronous and handles advanced placement so that replicas can be placed in different physical zones when compared to the placement of the primary. Think buildings, racks, floor, data centers. It is fully elastic in that servers can be added and removed and it automatically redistributes the partition primaries and backups. It can be scaled from one server to hundreds if not thousands of JVMs in a single grid. Each additional server provides more CPU, memory capacity and network and it scales linearly with grid growth. It also has a key-graph mode where a graph of objects can be associated with a single key and it allows fine grained modification of that graph. The object graph and key is stored in tuple form in this mode. This allows clients using different object representations of some subset of the IMDG schema to share data stored in the IMDG. It comes with automatic integration with databases so that values are automatically pulled from a database if not present and are written to the database when they change. Write behind logic allows writes to the database to be much more efficient and allows the grid to run with the database down. It comes with a HTTP Session filter to provide HTTP Session management for servlet containers. It have a flexible deployment model allowing a lot of customization by customers. We do a weekly video podcast on iTunes (search for extreme scale in iTunes) and make it available on YouTube also for customer education. We answer customer questions and forum topics from the week in a casual two person chat forum.
One of the key items Sun will be talking about in today's cloud computing announcement (at 9AM EST/6AM PST) will be Sun's opening of the APIs that we'll use for the Sun Cloud. We're making these available so that those who are interested will be able to review and comment on these APIs. Continuing our commitment to openness, we're making these APIs available via the Creative Commons Version 3.0 license. ...
I am excited about the upcoming release of two books on Web 2.0 and Cloud Application Architectures by O'Reilly. Web 2.0 Architectures (estimated release in May 2009) What entrepreneurs and information architects need to know Using several high-profile Web 2.0 companies as examples, authors Duane Nickull, Dion Hinchcliffe, and James Governor have distilled the core patterns of Web 2.0 coupled with an abstract model and reference architecture. The result is a base of knowledge that developers, business people, futurists, and entrepreneurs can understand and use as a source of ideas and inspiration. Featured architectures include Google, Flickr, BitTorrent, MySpace, Facebook, and Wikipedia. Cloud Application Architectures (estimated release in April 2009) Building Applications and Infrastructure in the Cloud This book by George Reese offers tested techniques for creating web applications on cloud computing infrastructures and for migrating existing systems to these environments. Specifically, you'll learn about the programming and system administration necessary for supporting transactional web applications in the cloud -- mission-critical activities that include orders and payments to support customers. The second book is available online at O'Reilly as a Rough Cuts Version so you might already had a chance to check it out. If so, do you like it?
A recent InfoWorld article claims that "With Cisco expected to enter the blade market and Sun expected to offer networking equipment, things could get interesting awfully fast." How does this effect your infrastructure strategy and decisions? Would you consider to build scalable web applications on the Cisco Unified Computing System? Or would you consider to build a router out of a server with the use of OpenSolaris and Project Crossbow as the article suggests? Will any of these initiatives change the way we build scalable web infrastructure or are these just attempts to sale these systems? What do you think?
We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point will be soon.
Let's take a short trip down web architecture lane:
You may disagree with the timing of various innovations and you would be correct. I couldn't find a history of the evolution of website architectures, so I just made stuff up. If you have any better information please let me know.
Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question, the cloud component is covered a little later.
Behold the power of keeping data in memory:
Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.This text was adapted from notes on Google Fellow Jeff Dean keynote speech at WSDM 2009.
Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their network social network in memory. Facebook has northwards of 800 memcached servers creating a reservoir of 28 terabytes of memory enabling a 99% cache hit rate. Even little guys can handle 100s of millions of events per day by using memory instead of disk.
With their new Unified Computing strategy Cisco is also entering the memory game. Their new machines "will be focusing on networking and memory" with servers crammed with 384 GB of RAM, fast processors, and blazingly fast processor interconnects. Just what you need when creating memory based systems.
Memory is the System of RecordWhat makes Memory Based Architectures different from traditional architectures is that memory is the system of record. Typically disk based databases have been the system of record. Disk has been King, safely storing data away within its castle walls. Disk being slow we've ended up wrapping disks in complicated caching and distributed file systems to make them perform.
Sure, memory is used as all over the place as cache, but we're always supposed to pretend that cache can be invalidated at any time and old Mr. Reliable, the database, will step in and provide the correct values. In Memory Based Architectures memory is where the "official" data values are stored.
Caching also serves a different purpose. The purpose behind cache based architectures is to minimize the data bottleneck through to disk. Memory based architectures can address the entire end-to-end application stack. Data in memory can be of higher reliability and availability than traditional architectures.
Memory Based Architectures initially developed out of the need in some applications spaces for very low latencies. The dramatic drop of RAM prices along with the ability of servers to handle larger and larger amounts of RAM has caused memory architectures to verge on going mainstream. For example, someone recently calculated that 1TB of RAM across 40 servers at 24 GB per server would cost an additional $40,000. Which is really quite affordable given the cost of the servers. Projecting out, 1U and 2U rack-mounted servers will soon support a terabyte or more or memory.
RAM = High Bandwidth and Low LatencyWhy are Memory Based Architectures so attractive? Compared to disk RAM is a high bandwidth and low latency storage medium. Depending on who you ask the bandwidth of RAM is 5 GB/s. The bandwidth of disk is about 100 MB/s. RAM bandwidth is many hundreds of times faster. RAM wins. Modern hard drives have latencies under 13 milliseconds. When many applications are queued for disk reads latencies can easily be in the many second range. Memory latency is in the 5 nanosecond range. Memory latency is 2,000 times faster. RAM wins again.
RAM is the New DiskThe superiority of RAM is at the heart of the RAM is the New Disk paradigm. As an architecture it combines the holy quadrinity of computing:
Access disk on the critical path of any transaction limits both throughput and latency. Committing a transaction over the network in-memory is faster than writing through to disk. Reading data from memory is also faster than reading data from disk. So the idea is to skip disk, except perhaps as an asynchronous write-behind option, archival storage, and for large files.
Or is Disk is the the new RAMTo be fair there is also a Disk is the the new RAM, RAM is the New Cache paradigm too. This somewhat counter intuitive notion is that a cluster of about 50 disks has the same bandwidth of RAM, so the bandwidth problem is taken care of by adding more disks.
The latency problem is handled by reorganizing data structures and low level algorithms. It's as simple as avoiding piecemeal reads and organizing algorithms around moving data to and from memory in very large batches and writing highly parallelized programs. While I have no doubt this approach can be made to work by very clever people in many domains, a large chunk of applications are more time in the random access domain space for which RAM based architectures are a better fit.
Grids and a Few Other DefinitionsThere's a constellation of different concepts centered around Memory Based Architectures that we'll need to understand before we can understand the different products in this space. They include:
Who are the Major Players in this Space?With that bit of background behind us, there are several major players in this space (in alphabetical order):
This class of products has generally been called In-Memory Data Grids (IDMG), though not all the products fit snugly in this category. There's quite a range of different features amongst the different products.
I tossed IDMG the acronym in favor of Memory Based Architectures because the "in-memory" part seems redundant, the grid part has given way to the cloud, the "data" part really can include both data and code. And there are other architectures that will exploit memory yet won't be classic IDMG. So I just used Memory Based Architecture as that's the part that counts.
Given the wide differences between the products there's no canonical architecture. As an example here's a diagram of how GigaSpaces In-Memory-Data-Grid on the Cloud works.
Some key points to note are:
Not conceptually difficult and familiar to anyone who has used caching systems like memcached. Only is this case memory is not just a cache, it's the system of record.
Obviously there are a million more juicy details at play, but that's the gist of it. Admittedly GigaSpaces is on the full featured side of the product equation, but from a memory based architecture perspective the ideas should generalize. When you shard a database, for example, you generally lose the ability to execute queries, you have to do all the assembly yourself. By using GigaSpaces framework you get a lot of very high-end features like parallel query processing for free.
The power of this approach certainly comes in part from familiar concepts like partitioning. But the speed of memory versus disk also allows entire new levels of performance and reliability in a relatively simple and easy to understand and deploy package.
NimbusDB - the Database in the CloudJim Starkey, President of NimbusDB, is not following the IDMG gang's lead. He's taking a completely fresh approach based on thinking of the cloud as a new platform unto itself. Starting from scratch, what would a database for the cloud look like?
Jim is in position to answer this question as he has created a transactional database engine for MySQL named Falcon and added multi-versioning support to InterBase, the first relational database to feature MVCC (Multiversion Concurrency Control).
What defines the cloud as a platform? Here's are some thoughts from Jim I copied out of the Cloud Computing group. You'll notice I've quoted Jim way way too much. I did that because Jim is an insightful guy, he has a lot of interesting things to say, and I think he has a different spin on the future of databases in the cloud than anyone else I've read. He also has the advantage of course of not having a shipping product, but we shall see.
is a different way of thinking, a different topology, a different way of putting the parts together.
cache. Yes, disks are still there, but they're out of the performance loop except for data so stale that nobody has it memory.
single computer. Bah, I say, bah!
rather than a feature. If a database system had sufficient capacity, reliability, and availability, nobody would ever partition or shard data. (If one database instance is a headache, a million tiny ones is a horrible, horrible migraine.)
boring business model. When clouds evolve to point that applications and databases can utilize whatever resources then need to meet demand without the constraint of single machine limitations, we'll have something really neat.
will have different views. Certain critical operations must be serialized, but that still doesn't require that all nodes have identical views of database state.
Jim certainly isn't shy with his opinions :-)
My summary of what he wants to do with NimbusDB is:
I'm not sure if NimbusDB will support a compute grid and map-reduce type functionality. The low latency argument for data and code collocation is a good one, so I hope it integrates some sort of extension mechanism.
Why might NimbusDB be a good idea?
The smart money has been that cloud level scaling requires abandoning relational databases and distributed transactions. That's why we've seen an epidemic of key-value databases and eventually consistent semantics. It will be fascinating to see if Jim's combination of Cloud + Memory + MVCC can prove the insiders wrong.
Are Cloud Based Memory Architectures the Next Big Thing?We've gone through a couple of different approaches to deploying Memory Based Architectures. So are they the next big thing?
Adoption has been slow because it's new and different and that inertia takes a while to overcome. Historically tools haven't made it easy for early adopters to make the big switch, but that is changing with easier to deploy cloud based systems. And current architectures, with a lot of elbow grease, have generally been good enough.
But we are seeing a wide convergence on caching as way to make slow disks perform. Truly enormous amounts of effort are going into adding cache and then trying to keep the database and applications all in-sync with cache as bottom up and top down driven changes flow through the system.
After all that work it's a simple step to wonder why that extra layer is needed when the data could have just as well be kept in memory from the start. Now add the ease of cloud deployments and the ease of creating scalable, low latency applications that are still easy to program, manage, and deploy. Building multiple complicated layers of application code just to make the disk happy will make less and less sense over time.
We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.
We have added quite a few features specifically tailored to high scalability and high performance environments to our tool over the years. This includes the ability to log to memory and dump log files on demand (when a crash occurs for example), special backlog queue features, a log service application for central log storage and a lot more. Additionally, our SmartInspect Console (the viewer application) makes viewing, filtering and inspecting large amounts of logging data a lot easier/practical.
Optimus Cloud™ by Prima Grid (www.primagrid.com) provides development and distributed runtime environment designed to deliver internet-scale Software as a Service innovation. Optimus Cloud™ Services Collaboration, deliver innovations with shorter-time-to-market and helps Cloud-based service developers to boost reuse and productivity. Optimus Cloud™ Technology is a unique implementation of server-less, peer-to-peer, grid-based, cloud-ready service runtime that coordinates service components according to user-defined SLAs. Optimus Cloud™ is built for highly distributed heterogeneous environments and delivers none-trivial Qualities-of-Service. With its self-organizing and fault-tolerance capabilities Optimus Cloud™ improves price for performance, flexibility, time-to-market, robustness, application level elasticity, on demand scalability, capacity utilization and asset utilization. Distributed service creation platform: Optimus Cloud™ enables Service Developers to collaborate capabilities and services from distributed heterogeneous sources, assemble and easily mashup capabilities into new innovations. Capabilities can be legacy in house investments, third party cloud services, SOA based Services, APIs, Packaged applications etc. Optimus Cloud™ Delivers services innovation faster, reduces barriers to entry and risks dramatically lowering initial investments. Distributed service delivery platform: Optimus Cloud™ provides a distributed grid computing runtime environment for Internet-scale cloud-ready-applications over clouds with the following benefits and capabilities: Maximize asset utilization: Optimus Cloud™ built from the ground up based on Grid Technology, maximizes asset utilization of Clouds and multi-site data-center. Optimus Cloud™ allocates computing power on-demand and automatically adjusts application scale according to the changing demands. Thus, reduces Total Cost of Ownership (TCO) and maximizes profits. Failure ready: Optimus Cloud™ provides a failure ready environment for hosting highly distributed cloud-ready application components reducing the need for immediate administrative action thus the service scales and adjusts cost-effectively and reliably. Improves productivity and shortens time-to-market: Optimus Cloud™ improves productivity building Cloud-Based applications, enables developers to focus on delivering innovations while collaborating and assembling Cloud-Based Service capabilities. Multi-tenancy: Optimus Cloud™ assures secured access and control of services and information as profile and assets are partitioned and separated from other tenants. Optimus Cloud™ technology implements Multi-tenancy and can host multiple and separated Virtual Organizations. On-Demand Scalability: Optimus Cloud technology distributes workloads for improved utilization of computing power and scale application components on-demand based on virtual organizations policy and SLA. Optimus Cloud™ is a server-less Grid, thus has no single point or performance bottlenecks. Making Optimus Cloud™ a premium choice for massive scale application. Delivers none trivial qualities: Optimus Cloud™ features none-trivial-qualities-of-service in terms of performance, availability, throughput and sustainability as Optimus Cloud™infrastructure services continuously optimize service to meet with SLA terms. Commodity hardware: Optimus Cloud™ is a multi-cloud platform grid, and may be run on top of multiple utility and cloud computing service providers in order to accommodate Service requirements, scale and resilience. Optimus Cloud Technology is optimal for resource utilization over multi-site multi-cloud services enabling services and applications to the Cloud.