Skype Plans for PostgreSQL to Scale to 1 Billion Users
Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.
- Database queries are routed by a proxy across a set of database servers. The proxy creates partitions based on a field value, typically a primary key.
- For example, you could partition users across a cluster by hashing based on user name. Each user is slotted into a shard based on the hash.
- Remote database calls are executed using a new PostgreSQL database language called plproxy. An example from Kristo Kaiv's blog:
First, code to insert a user in a database:
CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$
BEGIN
PERFORM 1 FROM users WHERE username = i_username;
IF NOT FOUND THEN
INSERT INTO users (username) VALUES (i_username);
RETURN 'user created';
ELSE
RETURN 'user already exists';
END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
Heres the proxy code to distribute the user insert to the correct partition:
queries=#
CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$
CLUSTER 'queries'; RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
Your SQL query looks normal:
SELECT insert_user("username");
- The result of a query is exactly that same as if was executed on the remote database.
- Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 parition cluster.
- PL/Proxy servers form a scalable and uniform "DB-bus." Proxies are robust because in a redundant configuration if one fails you can just connect to another. And if the proxy tier becomes slow you can add more proxies and load balance between them.
- More partitions can be added to improve performance.
- Only data on a failed partition is unavailable during a failover. All other partitions operate normally.
- Initially Slony1 was used to move data to the other systems, but "as the complexity and loads grew Slony1 started to cause us greater and greater pains."
- To solve this problem Skype developed their on lighter weight queueing and replication toolkit called SkyTools.
The proxy approach is interesting and is an architecture we haven't seen previously. Its power comes from the make another level of indirection school of problem solving, which has advantages:
The downsides are:
It's easy to see how the advantages can outweigh the disadvantages. Without changing your application you can slip in a proxy layer and get a lot of very cool features for what seems like a low cost. If you are a MySQL user and this approach interests you then take a look at MySQL Proxy, which accomplishes something similar in a different sort of way.