The pg_basebackup approach differs from the standard pg_start_backup(), rsync (or other file system copy), pg_stop_backup() approach in that it uses the replication protocol over a standard Postgres connection to make the base backup. A few highlights:
There’s a lot to like, but I’m greedy....
If “idle hands are the devil’s playthings” I can only imagine what the devil would do with idle CPUs and network bandwidth. It would be great if pg_basebackup were able to parallelize creation of the base backup and make use of more available resources. In the past I’ve written a script to parallelize rsync by first processing its --itemize-changes output and then spawning multiple rsync processes in parallel to perform the copies; this parallelization can really cut down on the time to get a base backup copied. There are times when getting something done as fast as possible trumps other concerns.
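The script itself isn't reproduced here, but the general shape of that parallelization is simple enough to sketch (paths, hostname, and job count below are all made up for illustration):

```shell
#!/bin/sh
# Sketch: parallelize an rsync base-backup copy.
# 1. Dry-run rsync to enumerate the files that need copying.
# 2. Split that list into N chunks.
# 3. Run one rsync per chunk in parallel, each restricted via --files-from.
SRC=/var/lib/postgresql/9.2/main/          # hypothetical source data dir
DST=standby:/var/lib/postgresql/9.2/main/  # hypothetical destination
JOBS=8

rsync -an --out-format='%n' "$SRC" "$DST" > /tmp/changed.list
split -n "l/$JOBS" /tmp/changed.list /tmp/chunk.
for chunk in /tmp/chunk.*; do
    rsync -a --files-from="$chunk" "$SRC" "$DST" &
done
wait   # all chunks copied; WAL replay still covers changes made meanwhile
```

Splitting by file count is crude (one chunk may land all the big files); the itemize-and-dispatch script I mention balances the work better, but even this naive fan-out saturates far more of the available bandwidth than a single rsync.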
nierman@source:~$ pg_basebackup -D - -Ft -x | pbzip2 -p8 | nc -l 1201

(In an actual use case I'd also insert pv in the pipeline to get nice status/progress updates at various stages.) And on the destination server:
nierman@destination:~$ nc source 1201 > newbasedir/base.tar.bz2

This process is a bit more complicated than a single command run from the client/destination server, but more importantly we lose out on things like streaming WAL files when using the tar format with pg_basebackup. Note that I've also used pbzip2 above (a parallelized version of bzip2). It would be great to have the option to use external programs like pbzip2, pigz, etc. with pg_basebackup compression. Let's call that Wish #2b.
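For completeness, here's a sketch of unpacking that tarball on the destination (newbasedir is the directory from the nc command above). pbzip2 parallelizes decompression as well; plain bzip2 produces the identical result on a single core, which is what the fallback below covers:

```shell
#!/bin/sh
# Sketch: unpack the received base backup in the new data directory.
# Falls back to plain bzip2 if pbzip2 isn't installed (same format, one core).
BUNZIP=$(command -v pbzip2 || command -v bzip2)
"$BUNZIP" -dc newbasedir/base.tar.bz2 | tar -xf - -C newbasedir
```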
pg_basebackup: starting background WAL receiver
2052690927/2052718105 kB (100%), 1/1 tablespace
pg_basebackup: could not get transaction log end position from server: FATAL: could not open file "./#postgresql.conf#": Permission denied

pg_basebackup chunked through almost 2TB of data and then it died just before finishing. Ack. Why? At some point someone had edited the postgresql.conf file as root, and their editor saved the backup copy seen above (in the postgres data directory, of course!); the Postgres backend couldn't read that root-owned file when creating the base backup. I would have loved to see a simple error message and have pg_basebackup continue on its merry way. Certainly an initial ls -l in the data directory is a good quick check that can be done manually, although that wouldn't save me from people who hide their cat photos deep inside the Postgres base directory.
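A pre-flight check along these lines would have caught the problem before 2TB of copying, not after (a sketch; the PGDATA default and owner name are assumptions for a typical install):

```shell
#!/bin/sh
# Sketch: before kicking off a multi-TB base backup, list anything under
# the data directory that is NOT owned by the postgres user -- root-owned
# stragglers like editor backup files will abort pg_basebackup mid-run.
OWNER="${OWNER:-postgres}"
PGDATA="${PGDATA:-/var/lib/postgresql/9.2/main}"
find "$PGDATA" ! -user "$OWNER" -ls
```

Empty output means every file is owned by the expected user; anything listed deserves a look (or a chown) before starting the backup.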