A pg_basebackup Wish List

pg_basebackup was introduced in Postgres 9.1 as a simple way to copy the data directory of a running database cluster. It was a great addition to the small group of PostgreSQL Client Applications.

The pg_basebackup approach differs from the standard pg_start_backup(), rsync (or other file-system copy), pg_stop_backup() routine in that it uses the replication protocol over a standard Postgres connection to make the base backup. A few highlights:

  • A completely standalone backup can be created easily by using the --xlog-method argument (the stream option here is particularly nice so that you don’t have to muck with the wal_keep_segments Postgres setting).
  • Thanks to the cascading replication machinery in 9.2, pg_basebackup can take a backup from an active hot standby server (where the basic pg_start_backup() approach fails).
  • As of 9.3 pg_basebackup will even create a recovery.conf file for you (with the --write-recovery-conf switch), so it’s basically a one-liner to get a standby with replication set up!
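
For instance, getting a new streaming standby going can look something like this (the host name and data directory are hypothetical, and the primary is assumed to already accept replication connections):
postgres@standby:~$ pg_basebackup -h primary.example.com -D /var/lib/postgresql/9.3/main \
    --xlog-method=stream --write-recovery-conf --progress
Start Postgres on that data directory and it should come up as a streaming standby.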

There’s a lot to like, but I’m greedy....

Wish #1: Parallelization

If “idle hands are the devil’s playthings,” I can only imagine what the devil would do with idle CPUs and network bandwidth. It would be great if pg_basebackup were able to parallelize creation of the base backup and make use of more of the available resources. In the past I’ve written a script to parallelize rsync by first processing its --itemize-changes output and then spawning multiple rsync processes in parallel to perform the copies; that parallelization can really cut down on the time it takes to get a base backup copied. There are times when getting something done as fast as possible trumps other concerns.
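
The rough shape of that rsync trick, sketched from memory rather than lifted from the original script (the /pgdata paths, the chunk count of eight, and GNU-specific options like split -n are illustrative, and filenames containing spaces would need more careful handling):
# 1. Dry run to list everything that would be copied
rsync -a --dry-run --itemize-changes /pgdata/ standby:/pgdata/ | awk '{print $2}' > /tmp/changed-files
# 2. Split the file list into eight roughly equal chunks
split -n l/8 /tmp/changed-files /tmp/chunk.
# 3. Launch one rsync per chunk, eight at a time
ls /tmp/chunk.* | xargs -P8 -I{} rsync -a --files-from={} /pgdata/ standby:/pgdata/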

Wish #2: Compression Options

Sometimes network bandwidth is in short supply. pg_basebackup has a --gzip switch to enable compression, but the compression is actually done on the client machine rather than on the server. Certainly there is the option of running pg_basebackup on the originating server itself and using netcat (or scp, etc.) to get the data copied to the destination, provided we use pg_basebackup’s --format=tar (-Ft) option. For example, on the originating server:
nierman@source:~$ pg_basebackup -D - -Ft -x | pbzip2 -p8 | nc -l 1201
(in an actual use case I’d also insert pv in the pipeline, as shown below, to get nice status/progress updates at various stages), and on the destination server:
nierman@destination:~$ nc source 1201 > newbasedir/base.tar.bz2
This process is a bit more complicated than a single command from the client/destination server, but more importantly, we lose out on things like streaming WAL files when using the tar format with pg_basebackup. Note that I’ve also used pbzip2 above (a parallelized version of bzip2). It would be great to have the option of using external programs like pbzip2, pigz, etc. for pg_basebackup compression. Let’s call that Wish #2b.
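
And for completeness, here’s the source-side pipeline with pv dropped in (the -brt flags, for byte count, transfer rate, and elapsed time, are just one reasonable choice):
nierman@source:~$ pg_basebackup -D - -Ft -x | pv -brt | pbzip2 -p8 | nc -l 1201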

Wish #3: Don’t Die on Me Now!

Here’s some output from a recent pg_basebackup session:
pg_basebackup: starting background WAL receiver
2052690927/2052718105 kB (100%), 1/1 tablespace
pg_basebackup: could not get transaction log end position from server:
FATAL:  could not open file "./#postgresql.conf#": Permission denied
pg_basebackup chugged through almost 2TB of data and then it died just before finishing. Ack. Why? At some point someone had edited the postgresql.conf file as root and their editor saved the backup copy seen above (in the Postgres data directory, of course!); the Postgres backend couldn’t read that root-owned file when creating the base backup. I would have loved to see a simple error message and have pg_basebackup continue on its merry way. Certainly an initial ls -l in the data directory is a good quick check that can be done manually, although that wouldn’t save me from people who hide their cat photos deep inside the Postgres base directory.
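
For what it’s worth, a slightly more thorough pre-flight check along these lines (run as the postgres user, assuming $PGDATA points at the data directory and GNU find’s -readable test is available) would have flagged that stray root-owned file before the 2TB copy started:
postgres@source:~$ find "$PGDATA" ! -readable -o ! -user postgres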