UG9zdGdyZVNRTCBDb25uZWN0aW9uIFBvb2xpbmc6IFBhcnQgMSDigJMgUHJvcyAmIENvbnM=
![PostgreSQL Connection Pooling: Part 1 – Pros & Cons](https://scalegrid.io/blog/wp-content/uploads/2019/09/PostgreSQL-Connection-Pooling-Part-1-Pros-and-Cons.jpg?__SQUARESPACE_CACHEVERSION=1571332279615)
A long time ago, in a galaxy far far away, ‘threads’ were a programming novelty rarely used and seldom trusted. In that environment, the first PostgreSQL developers decided forking a process for each connection to the database is the safest choice. It would be a shame if your database crashed, after all.
Since then, a lot of water has flown under that bridge, but the PostgreSQL community has stuck by their original decision. It is difficult to fault their argument – as it’s absolutely true that:
- Each client having its own process prevents a poorly behaving client from crashing the entire database.
- On modern Linux systems, the difference in overhead between forking a process and creating a thread is much lesser than it used to be.
- Moving to a multithreaded architecture will require extensive rewrites.
However, in modern web applications, clients tend to open a lot of connections. Developers are often strongly discouraged from holding a database connection while other operations take place. “Open a connection as late as possible, close a connection as soon as possible”. But that causes a problem with PostgreSQL’s architecture – forking a process becomes expensive when transactions are very short, as the common wisdom dictates they should be. In this post, we cover the pros and cons of PostgreSQL connection pooling.
![PostgreSQL Architecture Diagram](https://scalegrid.io/blog/wp-content/uploads/2019/09/postgres_architecture.png)
The PostgreSQL Architecture | Source
The Connection Pool Architecture
Using a modern language library does reduce the problem somewhat – connection pooling is an essential feature of most popular database-access libraries. It ensures ‘closed’ connections are not really closed, but returned to a pool, and ‘opening’ a new connection returns the same ‘physical connection’ back, reducing the actual forking on the PostgreSQL side.
![Visual Representation of a Connection Pool](https://scalegrid.io/blog/wp-content/uploads/2019/09/connection-pool-architecture.png)
The architecture of a generic connection-pool
However, modern web applications are rarely monolithic, and often use multiple languages and technologies. Using a connection pool in each module is hardly efficient:
- Even with a relatively small number of modules, and a small pool size in each, you end up with a lot of server processes. Context-switching between them is costly.
- The pooling support varies widely between libraries and languages – one badly behaving pool can consume all resources and leave the database inaccessible by other modules.
- There is no centralized control – you cannot use measures like client-specific access limits.
As a result, popular middlewares have been developed for PostgreSQL. These sit between the database and the clients, sometimes on a seperate server (physical or virtual) and sometimes on the same box, and create a pool that clients can connect to. These middleware are:
- Optimized for PostgreSQL and its rather unique architecture amongst modern DBMSes.
- Provide centralized access control for diverse clients.
- Allow you to reap the same rewards as client-side pools, and then some more (we will discuss these more in more detail in our next posts)!
PostgreSQL Connection Pooler Cons
A connection pooler is an almost indispensable part of a production-ready PostgreSQL setup. While there is plenty of well-documented benefits to using a connection pooler, there are some arguments to be made against using one:
- Introducing a middleware in the communication inevitably introduces some latency. However, when located on the same host, and factoring in the overhead of forking a connection, this is negligible in practice as we will see in the next section.
- A middleware becomes a single point of failure. Using a cluster at this level can resolve this issue, but that introduces added complexity to the architecture.
![Redundant pgBouncer instances to prevent single point of failure](https://scalegrid.io/blog/wp-content/uploads/2019/09/redundancy-in-middleware.png)
Redundancy in middleware to avoid Single-Point-of-Failure | Source
- A middleware implies extra costs. You either need an extra server (or 3), or your database server(s) must have enough resources to support a connection pooler, in addition to PostgreSQL.
- Sharing connections between different modules can become a security vulnerability. It is very important that we configure pgPool or PgBouncer to clean connections before they are returned to the pool.
- The authentication shifts from the DBMS to the connection pooler. This may not always be acceptable.
![PgBouncer Authentication Model](https://scalegrid.io/blog/wp-content/uploads/2019/09/PgBouncer-Authentication-Model.png)
PgBouncer Authentication Model | Source
- It increases the surface area for attack, unless access to the underlying database is locked down to allow access only via the connection pooler.
- It creates yet another component that must be maintained, fine tuned for your workload, security patched often, and upgraded as required.
Should You Use a PostgreSQL Connection Pooler?
However, all of these problems are well-discussed in the PostgreSQL community, and mitigation strategies ensure the pros of a connection pooler far exceed their cons. Our tests show that even a small number of clients can significantly benefit from using a connection pooler. They are well worth the added configuration and maintenance effort.
In the next post, we will discuss one of the most popular connection poolers in the PostgreSQL world – PgBouncer, followed by Pgpool-II, and lastly a performance test comparison of these two PostgreSQL connection poolers in our final post of the series.