Wednesday, November 12, 2014

Three Reasons You Probably Don’t Need Multi-Data Center Capabilities

This is a guest post by Nikhil Palekar, Systems Architect, FoundationDB

For many organizations that care a lot about strong consistency and low latency, or that haven’t already built a fault-tolerant application tier on top of their database, a multiple data center (MDC) database deployment may add more complexity and unintended consequences than meaningful benefits. Why might that be?

Choosing a database that meets the needs of your application is a significant decision because it’s such a fundamental part of the system you’re building. Without a solid back-end that supports your application and provides meaningful guarantees, the rest of your application is limited in terms of how robust, scalable, and fault-tolerant it can be when stressed.

One way that many developers have started building fault tolerance into their systems is by distributing their database across multiple data centers. “Eventually consistent” systems, which aim to eventually make data consistent across nodes, can be implemented across data centers more easily than databases with “strong consistency” guarantees because they provide limited guarantees about the state of the data they store. However, that decision to distribute the system across multiple, physical locations is not something that should be taken lightly, as it introduces additional complexity into the entire system. If your distributed database is the main database of record for your application, having inconsistent state or lost data probably isn’t a good idea.

Many applications have not been designed with this multi-location, distributed setup in mind at the application level, so they actually don’t need MDC from the database either! There are definitely situations where MDC is helpful and desirable, but there are plenty of situations, some of which are discussed below, in which you may not need MDC capabilities at the database level.

1. Your application tier might not be scalable and fault tolerant

An application stack that fully utilizes the capabilities of an MDC database has fault tolerance built into each level of the system, from the DNS information down to the hardware running the application. At the DNS level, you would want to have multiple “A” records that point to multiple load balancers to ensure that the URL for your application can be resolved to an active load balancer IP even if one of those load balancers failed or can’t be reached. (Obviously fault tolerance at the DNS provider level is helpful too, but as the user, you can’t do too much to ensure that your DNS provider is resilient to failures.)
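
The DNS-level failover described above can be sketched from the client's side. This is a minimal illustration, not a production client: the helper names `resolve_all` and `connect_first_reachable` are hypothetical, and the `resolver` and `connect` parameters exist only so the logic can be exercised without a real network. It resolves every address behind a host name and tries each one until a load balancer answers.

```python
import socket


def resolve_all(host, port, resolver=socket.getaddrinfo):
    """Return each distinct IP address the host name resolves to, in order."""
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in resolver(
        host, port, type=socket.SOCK_STREAM
    ):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_reachable(host, port, timeout=2.0,
                            resolver=socket.getaddrinfo,
                            connect=socket.create_connection):
    """Try each resolved address in turn; return the first successful connection."""
    last_error = None
    for ip in resolve_all(host, port, resolver):
        try:
            return connect((ip, port), timeout=timeout)
        except OSError as exc:
            # This load balancer is down or unreachable; try the next record.
            last_error = exc
    raise ConnectionError(f"no address for {host} was reachable") from last_error
```

Many HTTP clients and browsers already do something similar, so the point is only that multiple records help when clients are willing to move on to the next address.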

At the next level down, you would want to ensure that your load balancers are able to scale and handle failures as well. There are a number of ways to add fault tolerance at this level. One example is an “elastic” load balancer that scales as part of what Amazon calls an “auto scaling” group, which also lets it handle failures of individual server “instances” and removes the load balancer tier as a single point of failure.

Next come the application servers, which receive traffic from the machines acting as load balancers. Just like the load-balancer tier, you would want at least a couple of application servers running your application to ensure that you’re able to service application requests even if one or more application instances experience failures.

Finally, you arrive at the database tier, where there’s another challenge. Even though you need to ensure that you have fault tolerance when facing individual machine failures, you also need to ensure that your database can be an actual “source of truth” for your application, regardless of the node that failed. That’s a fairly hard problem to solve, but many modern distributed databases do have the ability to automatically handle machine failures and ensure that data is durably stored and replicated between machines in the cluster so that no data is lost if a machine fails.

Once you have all of that set up, the next step would be to potentially replicate the entire setup across several data centers or availability zones of your hosting/cloud provider to insulate you from failures in a single location. So at a very high level, you need to take what you just set up and replicate it in multiple locations!

This configuration has many moving parts and can be a challenge to manage, and it’s also far more expensive than running an application in a single data center with more basic fault tolerance. Unless you’re willing to devote resources to setting up and managing this infrastructure, and you have customers demanding that your application not go down during an unexpected outage on a fairly reliable platform like Amazon, the MDC capabilities of your database may not matter all that much, and they won’t provide better service to your customers than a single-data center solution (for your entire application stack) would.

2. Latency matters a lot

Different databases require different amounts of communication between nodes and between data centers in response to database operations. An MDC configuration of a transactional, strongly consistent system requires communication between the database machines in different data centers before a transaction can be committed, so you have to build in that latency cost when evaluating the system’s performance. That latency cost doesn’t have to be huge if the ping time between the data centers is fairly small, but it is an additional cost that impacts the overall performance of the system. If latency is the biggest concern and you need an MDC system, you should consider whether the potential latency savings of an eventually consistent system outweigh the limited data guarantees those systems provide. It’s worth noting that with this approach you’re essentially giving up the ability to say that the data across the multiple nodes and multiple data centers is actually consistent.
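
As a back-of-the-envelope illustration of that cost, the sketch below adds the inter-data center round trip to a local commit time. The function names and all of the numbers (a 2 ms local commit, 1 ms and 150 ms round trips) are illustrative assumptions, not measurements, and a real commit protocol may need more than one round trip.

```python
def sync_commit_latency_ms(local_commit_ms, inter_dc_rtt_ms, round_trips=1):
    """Estimate commit latency when a commit must wait on remote data centers."""
    return local_commit_ms + round_trips * inter_dc_rtt_ms


def max_serial_commits_per_second(commit_latency_ms):
    """Upper bound on back-to-back commits from one client that waits for each."""
    return 1000.0 / commit_latency_ms


if __name__ == "__main__":
    # Hypothetical figures: nearby data centers vs. data centers on two continents.
    for label, rtt_ms in [("nearby", 1.0), ("intercontinental", 150.0)]:
        latency = sync_commit_latency_ms(local_commit_ms=2.0, inter_dc_rtt_ms=rtt_ms)
        rate = max_serial_commits_per_second(latency)
        print(f"{label}: {latency:.1f} ms per commit, {rate:.1f} serial commits/s")
```

The estimate only bounds a single client issuing commits one after another; overall throughput can still be high if many transactions are in flight concurrently.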

That difference in consistency is a big deal for an operational data store that the application relies on as a “source of truth” for the data it processes. Of course, if you’re using a different source of truth for your application, that is another system you have to worry about scaling and breaking. If that database runs on a single system or in a single data center, you now have a single point of failure that could impact the availability and reliability of your entire application.

Running an eventually consistent database across multiple data centers is a design decision that trades lower latency for additional risks and failure modes. Whether that tradeoff is worth it depends on how you prefer to deal with failures and on how important latency is to you.

3. "MDC" means worldwide, synchronously connected distributed data centers

Moving data between data centers across the world by itself isn’t a huge problem. However, when you want to provide a consistent data record across the entire system and provide read and write access to any of those systems concurrently, it gets much more challenging.

For example, imagine that two users connected to geographically distant data centers both want to update the same piece of data in the database. The two transactions can’t both succeed if each talks only to its local data center in total isolation from the others, and any change to the database also needs to be replicated to all data centers in some way before a transaction can be considered “committed”, to ensure that it’s durable and non-conflicting. At that point, you’re paying geographic latency penalties on every commit, and conflict resolution may be slower because transactions processed by the system need to be reconciled before each one can be committed.

Commutative Replicated Data Types (CRDTs, now more often expanded as Conflict-free Replicated Data Types) are one way to approach these types of problems with eventually consistent systems, but they are not a general-purpose solution for arbitrary reads and writes to a database because not all operations can be applied commutatively (and thus don’t work with CRDTs). As mentioned above with respect to latency, if the eventually consistent model fits your application’s requirements, CRDTs can provide some level of conflict resolution and consistency in place of consensus when performing updates across multiple nodes. Alternatively, if you just need local read access from different data centers, asynchronous replication from your transactional, operational data store to distributed, read-only data centers around the globe may fit the bill.
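
To make the commutativity point concrete, here is a small sketch that checks whether a set of updates yields the same result regardless of the order in which replicas receive them. The operations and the helper names (`apply_all`, `converges`) are hypothetical examples, not part of any database API.

```python
from itertools import permutations


def apply_all(initial, operations):
    """Apply the operations to the initial value in the given order."""
    value = initial
    for operation in operations:
        value = operation(value)
    return value


def converges(initial, operations):
    """True if every delivery order of the operations gives the same final value."""
    results = {apply_all(initial, order) for order in permutations(operations)}
    return len(results) == 1


def add_five(balance):
    return balance + 5


def add_ten(balance):
    return balance + 10


def reset_to_zero(balance):
    return 0


if __name__ == "__main__":
    # Two increments commute, so replicas agree whatever order they arrive in.
    print(converges(100, [add_five, add_ten]))
    # An increment and a reset do not commute: the final value depends on order.
    print(converges(100, [add_five, reset_to_zero]))
```

Updates like the reset are the kind that need either coordination between data centers or an explicit rule for resolving the conflict.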

In summary

An MDC database deployment can provide valuable benefits to organizations and developers who need to deal with data center outages, machine failures, and disaster recovery. However, configuring and operating an entire application stack consistently across multiple locations is a non-trivial (and often expensive) task that can introduce more points of failure than a single-data center deployment.

Deploying the entire stack correctly in a fault-tolerant and robust way is what actually provides the most benefit to developers and operators. As with many technology decisions, there are various tradeoffs that have to be weighed, and this decision about MDC deployment is no different.

Reader Comments (1)

Very happy someone is shedding light on the realities of MDC database deployments. It is currently a topic where many people are leaping before looking.

The point that you should add MDC only after your entire app's stack is scalable and fault tolerant is dead on. DataCenter failover should be thought of as a last resort.

The point that a single datacenter deployment serves your customers as well as an MDC deployment ignores the improvement in latencies users experience using a geographically closer datacenter (e.g. on business trips to Asia, I take a break from Facebook because it's so slow it just sucks).

The point about latency mattering a lot is an understatement. For instance, if you do an MDC database deployment, need transactional strong consistency, and are using AWS regions spanning different continents, the latency can be in the seconds for each transaction (e.g. write-write conflict in two transactions: one originating in US-East, the other in Tokyo). I am interested in stories of anyone doing this in practice; it just seems too slow to be feasible for any use case w/ low latency requirements.

Another MDC database deployment trick is to shard your users so that they stick to a certain DataCenter at a given point in time. You pin UserX to DataCenterA and run transactions only affecting UserX's data in DataCenterA and no other DataCenters. This precludes conflicting transactions at other DataCenters, but it turns your RDBMS into a geo-distributed K-V store and pushes geo-sharding logic into the application (e.g. when DataCenterA goes down, you must re-pin UserX to DataCenterB in application logic).
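
A minimal sketch of that application-level pinning logic might look like the following. The `GeoPinRouter` class and its methods are hypothetical names for illustration, and the sketch assumes the user's data has already been replicated to the fallback DataCenter before re-pinning, which is the hard part in practice.

```python
class GeoPinRouter:
    """Routes each user's transactions to the single data center they are pinned to."""

    def __init__(self, data_centers):
        self.data_centers = list(data_centers)  # order doubles as failover preference
        self.healthy = set(self.data_centers)
        self.pins = {}  # user_id -> data center name

    def pin(self, user_id, data_center):
        if data_center not in self.data_centers:
            raise ValueError(f"unknown data center: {data_center}")
        self.pins[user_id] = data_center

    def mark_down(self, data_center):
        self.healthy.discard(data_center)

    def mark_up(self, data_center):
        if data_center in self.data_centers:
            self.healthy.add(data_center)

    def route(self, user_id):
        """Return the data center for this user, re-pinning if the pinned one is down."""
        pinned = self.pins.get(user_id)
        if pinned in self.healthy:
            return pinned
        # Assumption: the user's data is already available at the fallback site.
        for candidate in self.data_centers:
            if candidate in self.healthy:
                self.pins[user_id] = candidate
                return candidate
        raise RuntimeError("no healthy data center available")


if __name__ == "__main__":
    router = GeoPinRouter(["DataCenterA", "DataCenterB"])
    router.pin("UserX", "DataCenterA")
    print(router.route("UserX"))
    router.mark_down("DataCenterA")
    print(router.route("UserX"))
```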

As the author explains, this is why people often use eventually consistent (EC) datastores for MDC database deployments. But out-of-the-box EC solutions usually default to last-writer-wins (LWW) for conflict resolution, and it is up to the developer to implement correct conflict resolution logic in UDFs. Most developers skip this step because it is basically the equivalent of writing a full blown CRDT library :)

LWW is a problem for multi-master writes: LWW throws away ALL losers, which means any updates to the same data done in parallel in different data-centers are subject to being thrown away. LWW conflict resolution is incredibly hard to reason about, because every update is ACKed as successful, but there is ZERO guarantee the update will be globally applied (it may/may-not be discarded during conflict resolution).
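
The behavior is easy to see in a toy last-writer-wins register. The `Write` record and `lww_merge` function below are hypothetical, and breaking timestamp ties by origin name is an assumption made only so that the merge is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float
    origin: str  # data center that accepted (and ACKed) the write


def lww_merge(a, b):
    """Keep the write with the larger (timestamp, origin); discard the other."""
    return max(a, b, key=lambda write: (write.timestamp, write.origin))


if __name__ == "__main__":
    # Both writes were ACKed locally as successful before replication happened.
    from_us_east = Write("shipping=express", timestamp=100.0, origin="us-east")
    from_tokyo = Write("shipping=standard", timestamp=100.5, origin="tokyo")

    winner = lww_merge(from_us_east, from_tokyo)
    loser = from_us_east if winner is from_tokyo else from_tokyo
    print("kept:", winner.value)
    print("silently discarded:", loser.value)
```

Neither client is told that its acknowledged update was dropped, which is what makes the outcome hard to reason about.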

CRDTs do solve the above-mentioned problems, and there are NoSQL datastores w/ CRDT libraries (e.g. Riak has 4-5 data-types and Cassandra has counters). Other NoSQL datastores can be tweaked to support CRDTs by writing your own conflict resolution UDFs (e.g. CouchDB, Couchbase).

The point in this article that not all operations can be applied commutatively, so CRDTs are not general purpose, is (IMHO) wrong. Not all operations are possible in any datastore: can you do update-joins in document-stores? Can you update an RDBMS via a graph traversal in SQL? No, so you hack them in application logic, because no one tool supports all operations. A general purpose datastore means you can do a certain genre of operations reasonably efficiently. CRDTs can fully represent a multitude of data-structures (e.g. documents, graphs, queues, arrays, etc...), which should be a wide enough breadth of operations to consider them as general purpose.

It's also worth noting, reasoning about CRDTs is much easier than reasoning about generic eventually consistent systems. CRDTs are a variant of eventual consistency called Strong Eventual Consistency. SEC gives a much stronger guarantee than EC: in SEC data converges in a predictable/intuitive manner to a consistent state, whereas EC uses less sophisticated algorithms to attain consistency. The simplest illustration of their differences would be pitting a SEC-counter vs an EC-counter: if you initialize both counters to 0 and have 10 different writers incrementing by one in parallel to both counters, the SEC-counter will converge to 10, whereas the EC-counter will most likely converge to 1, but it can converge to anything between 1-10 depending on replication latencies.
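
The counter comparison can be sketched in a few lines. The SEC side below is a grow-only counter (G-Counter), one well-known CRDT; the EC side is a simplified last-writer-wins model in which all ten writers read 0 before any replication happens. The class and function names are hypothetical, and the EC model only covers that fully concurrent case.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per writer, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)

    def value(self):
        return sum(self.counts.values())


def run_sec_counter(writers=10):
    """Each writer increments its own replica once, then all replicas exchange state."""
    replicas = [GCounter(f"writer-{i}") for i in range(writers)]
    for replica in replicas:
        replica.increment()
    for replica in replicas:
        for other in replicas:
            replica.merge(other)
    return [replica.value() for replica in replicas]


def run_lww_counter(writers=10):
    """Each writer reads 0 concurrently and writes back 1; the latest write wins."""
    writes = [(0 + 1, timestamp) for timestamp in range(writers)]
    winning_value, _timestamp = max(writes, key=lambda write: write[1])
    return winning_value


if __name__ == "__main__":
    print("SEC counter values per replica:", run_sec_counter())
    print("LWW counter value:", run_lww_counter())
```

Because the merge is an element-wise maximum, it is commutative, associative, and idempotent, so replicas can exchange state in any order and any number of times and still agree.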

Again, this is a good article on a topic that needs to be discussed. MDC database deployments are a really complicated undertaking full of pitfalls and gotchas, but progress is being made :)

November 12, 2014 | Unregistered Commenter Russell Sullivan
