More concurrency: Improved locking in PostgreSQL

If you want to build a large scale website, scaling out the webserver  is not enough. It is also necessary to cleverly manage the database  side. a key to high scalability is locking.

In PostgreSQL we got a couple of new cool features to reduce locking and to speed up things due to improved concurrency.

General recommendations: Before attacking locking, however, it makes  sense to check what is really going on on your PostgreSQL database  server. To do so I recommend to take a look at pg_stat_statements and to  carefully track down bottlenecks. Here is how it works:

Improving SELECT FOR UPDATE

Imagine two people are trying to modify the same row at the same  time. Each user will first SELECT the row to inspect its content and  then start updating. The nasty thing is: Both users might see the old  row and overwrite each other's changes. This is a classical race  condition.

In real life this can have nasty consequences: Two people might book  the same flight on an aircraft or people might draw more money from  their accounts than they actually have in their account. This is far  from desirable.

Let us return to the airline example and assume that somebody wants to book a seat on an airliner:

SELECT ...

FROM table

WHERE class = 'economy'

AND empty = true

LIMIT 1

FOR UPDATE

The trouble now is: If somebody else also tries to grab a seat, he  will find the seat selected by the first person. But: This row is  blocked. The SELECT FOR UPDATE of the second guy has to wait until the  first guy has finished his transaction. Remember, the passanger might be  happy with any seat on the airliner so there is no point waiting for a  specific line.

PostgreSQL 9.5 comes to the rescue. There is a new way to obtain rows:

SELECT ...

FROM table

WHERE class = 'economy'

AND empty = true

LIMIT 1

FOR UPDATE SKIP LOCKED

The beauty here is that PostgreSQL will simply ignore locked rows and  return one, which is not blocked by anybody. This makes sense because  100 users checking for a free seat concurrently will get 100 different  rows. The consequence is that you are not stuck with 1 CPU but you can  nicely scale out to all CPUs in the system. As conflicts cannot happen  anymore, nobody has to wait on somebody else.

SELECT FOR SHARE

There is one more thing, which can make PostgreSQL provide you with more concurrency. Consider the following example:

SELECT *

FROM account AS a, currency AS c

WHERE a.currency = c.id

AND a.account = 4711

FOR UPDATE

In this case somebody wants to check his bank account. The main  question now is: Which rows are locked? The answer is: The account AND  the currency. Blocking an entire currency just because one person wants  to draw money from the ATM is clearly not a good idea. Many people  should be able to draw money at the same time. But: At this point  PostgreSQL has idea which one of those two tables you want to update.

The solution is simple:

FOR UPDATE OF account FOR SHARE OF currency

By telling PostgreSQL what we are planning to do, the database can  use a harmless lock on the currency table. Many people can therefore  take a look at the same currency at the same time without blocking each  other while the account table is still safely protected.

Concurrency is everything

Keep in mind: If you only got a single CPU, concurrency will be a  problem. Therefore is essential to do things in a way that many CPUs can  do their share of work at the very same time.