More concurrency: Improved locking in PostgreSQL
If you want to build a large scale website, scaling out the webserver is not enough. It is also necessary to cleverly manage the database side. a key to high scalability is locking.
In PostgreSQL we got a couple of new cool features to reduce locking and to speed up things due to improved concurrency.
General recommendations: Before attacking locking, however, it makes sense to check what is really going on on your PostgreSQL database server. To do so I recommend to take a look at pg_stat_statements and to carefully track down bottlenecks. Here is how it works:
Improving SELECT FOR UPDATE
Imagine two people are trying to modify the same row at the same time. Each user will first SELECT the row to inspect its content and then start updating. The nasty thing is: Both users might see the old row and overwrite each other's changes. This is a classical race condition.
In real life this can have nasty consequences: Two people might book the same flight on an aircraft or people might draw more money from their accounts than they actually have in their account. This is far from desirable.
Let us return to the airline example and assume that somebody wants to book a seat on an airliner:
SELECT ...
FROM table
WHERE class = 'economy'
AND empty = true
LIMIT 1
FOR UPDATE
The trouble now is: If somebody else also tries to grab a seat, he will find the seat selected by the first person. But: This row is blocked. The SELECT FOR UPDATE of the second guy has to wait until the first guy has finished his transaction. Remember, the passanger might be happy with any seat on the airliner so there is no point waiting for a specific line.
PostgreSQL 9.5 comes to the rescue. There is a new way to obtain rows:
SELECT ...
FROM table
WHERE class = 'economy'
AND empty = true
LIMIT 1
FOR UPDATE SKIP LOCKED
The beauty here is that PostgreSQL will simply ignore locked rows and return one, which is not blocked by anybody. This makes sense because 100 users checking for a free seat concurrently will get 100 different rows. The consequence is that you are not stuck with 1 CPU but you can nicely scale out to all CPUs in the system. As conflicts cannot happen anymore, nobody has to wait on somebody else.
SELECT FOR SHARE
There is one more thing, which can make PostgreSQL provide you with more concurrency. Consider the following example:
SELECT *
FROM account AS a, currency AS c
WHERE a.currency = c.id
AND a.account = 4711
FOR UPDATE
In this case somebody wants to check his bank account. The main question now is: Which rows are locked? The answer is: The account AND the currency. Blocking an entire currency just because one person wants to draw money from the ATM is clearly not a good idea. Many people should be able to draw money at the same time. But: At this point PostgreSQL has idea which one of those two tables you want to update.
The solution is simple:
FOR UPDATE OF account FOR SHARE OF currency
By telling PostgreSQL what we are planning to do, the database can use a harmless lock on the currency table. Many people can therefore take a look at the same currency at the same time without blocking each other while the account table is still safely protected.
Concurrency is everything
Keep in mind: If you only got a single CPU, concurrency will be a problem. Therefore is essential to do things in a way that many CPUs can do their share of work at the very same time.