Using PGBouncer to improve PHP applications running in the cloud · Blog

PHP is request-based, meaning that all resources are closed at the end of each request. This is great in preventing memory leaks and other side effects between requests. But aparently not great with cloud databases, when we have to reconnect for each request.

Enter PGBouncer, a connection pooler for Postgres. PGBouncer is a transparent proxy, the application talks to it as it would talk to a "real" Postgres database. PGBouncer forwards the queries and replies with the response from the upstream Postgres.
PGBouncer runs permanently, keeping connections to the database open and does connection pooling so that several PHP processes can share one database connection.

We set up PGBouncer to run in the same Docker container as the PHP application, configured it to talk to the actual Postgres database and changed our application to talk to PGBouncer. This means the overhead of constantly opening and closing connections is handled inside the same viurtal machine rather than going through the network infrastructure at the cloud provider.

This helps to scale our application significantly and we did not see any connection failures even under very high load.

Setting up PGBouncer with s6-overlay

Lets look in detail into how we actually set up PGBouncer. For the project, we use s6-overlay, a setup to run the S6 process management in Docker containers. Docker purists will hate it, but for us it was much easier to manage utility services and workers inside the same image, rather than deploy several separate applications for them. The projects' Docker image is built on top of Docker Webstack which comes with S6 fully set up and ready to use.

In our Docker file, we install PGBouncer and configure it. We also change the /init script to rewrite the database connection string environment variable so that the application connects to PGBouncer.

Our configuration file is

; docker/etc/pgbouncer/pgbouncer.ini
[pgbouncer]
pool_mode = session
listen_addr = localhost
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; it would be neat to use a unix socket, but Symfony does not support postgres DATABASE_URL with a socket
; if it ever supports it, set listen_addr to empty to not listen on any network port
; unix_socket_dir = /run/pgbouncer
unix_socket_dir =

%include /etc/pgbouncer/pgbouncer-databases.ini

We also create placeholder files for the referenced files that we will generate at startup

; docker/etc/pgbouncer/pgbouncer-databases.ini
; This file will be overwritten at container startup with the database settings from the DATABASE_URL ENV variable
[databases]

; docker/etc/pgbouncer/userlist.txt
; This file will be overwritten at container startup with the credential settings from the DATABASE_URL ENV variable

And additionally, we provide the S6 startup file docker/services/pgbouncer/run:

#!/usr/bin/with-contenv sh

# Start pgbouncer:
# -u : Switch to the given user on startup
exec /usr/bin/pgbouncer -u php /etc/pgbouncer/pgbouncer.ini 2>&1

and a type file at docker/services/pgbouncer/type with just one word longrun to tell S6 to keep PGBouncer running.

Then we load all these files in the Dockerfile:


# Install and configure PGBouncer
# see: https://www.pgbouncer.org/config.html
RUN set -eux; \
    apk add --no-cache pgbouncer; \
    # Do not allow other users to read/write the pgbouncer configurations
    chmod go-rw /etc/pgbouncer/*;

COPY --link docker/etc/pgbouncer /etc/pgbouncer
COPY --link docker/services/ /etc/s6-overlay/s6-rc.d/

Transparently Configuring PGBouncer

Symfony is configured to take its configuration from the environment variable DATABASE_URL. The connection string contains all parts of the configuration, it typically looks like this: postgresql://app:secret!password@database:5432/app?serverVersion=16&charset=utf8.

We wanted to keep the configuration simple to avoid surprises when changing configuration. The way we solved this was by adding code to the init script of the Docker container. The init script now creates the correct configuration files for PGBouncer on the fly from the environment variables. It also replace the DATABASE_URL variable with one pointing to PGBouncer on localhost, backing up the original variable as DIRECT_DATABASE_URL to help with debugging.

Note that the port is normally omitted when Postgres listens on the default port 5432. But the parsing was complicated enough (yep, shell script is not a great language to write code in), so we decided to specify the port explicitly. The script docker/init-pgbouncer looks like this:

#!/bin/sh -e

# parse DATABASE_URL and rewrite to use pgbouncer, adapted from
# https://stackoverflow.com/a/17287984/146289

# verify expected protocol
case "$DATABASE_URL" in
    postgresql://*) echo "Rewriting DATABASE_URL";;
    *) echo "DATABASE_URL must start with postgresql://" && exit 1;;
esac

# remove the protocol
url=$(echo "$DATABASE_URL" | sed -e s,postgresql://,,g)

# extract the user and password (if any)
userpass="$(echo "$url" | grep @ | cut -d@ -f1)"
pass=$(echo "$userpass" | grep : | cut -d: -f2)
if [ -n "$pass" ]; then
    user=$(echo "$userpass" | grep : | cut -d: -f1)
else
    echo "Database connection must have username and password"
    exit 1
fi

# extract the host -- updated
hostport=$(echo "$url" | sed -e s,"$userpass"@,,g | cut -d/ -f1)
port=$(echo "$hostport" | grep : | cut -d: -f2)
if [ -n "$port" ]; then
    host=$(echo "$hostport" | grep : | cut -d: -f1)
else
    echo "Database connection must have host and port"
    exit 1
fi

# extract the path (if any)
pathoptions="$(echo "$url" | grep / | cut -d/ -f2-)"
options="$(echo "$pathoptions" | grep '?' | cut -d? -f2-)"
if [ -n "$options" ]; then
    path=$(echo "$pathoptions" | grep '?' | cut -d? -f1)
else
    path=$pathoptions
fi

printf "[databases]\n%s = host=%s port=%s dbname=%s\n" "$path" "$host" "$port" "$path" > /etc/pgbouncer/pgbouncer-databases.ini
echo "\"$user\" \"$pass\"" > /etc/pgbouncer/userlist.txt

export DIRECT_DATABASE_URL="$DATABASE_URL"
export DATABASE_URL="postgresql://$user:$pass@localhost:6432/$pathoptions"

And finally, we load this additional startup script into the docker image with the following Dockerfile segment:

COPY --link docker/init-pgbouncer /init-pgbouncer
RUN tail -n +2 /init >> /init-pgbouncer && mv /init-pgbouncer /init

With version 1 of S6 overlay, we simply changed the Docker entry point to our custom script and called exec /init at the end of that script. But since version 2, S6 must run as process with id 1 or it will error with s6-overlay-suexec: fatal: can only run as pid 1, hence we have to concat the init script to have one single file.

Conclusion

We now transparently use PGBouncer, moving the overhead of connecting and deconnecting to the database into the Docker container. This works much better to our prior setup when each PHP request had to create a new database connection.

Depending on your stack, you could also try to use persistent PDO connections in PHP, but that comes with its own can of worms. PGBouncer seems to protect us from the drawbacks of persistent PDO connections, and the overhead is extremly small.

Do you have a question, a comment, or just feeling inspired? Mention us or share this article on Mastodon or LinkedIn.