Mammoth
  • |
  • Contact
  • |
CMD | Command Prompt, Inc. - PostgreSQL Solutions, Support & Hosting
  • |
  • |
  • |
  • |
  • |
Updating the .Org docs on backups
Posted Thursday May 21st, 2015 10:48am
by Joshua Drake
| Permalink

Follow cmdpromptinc on Twitter


I spent a great deal of time working through the SQL DUMP portion of the 9.5devel docs this past week. Below is the current text of what I have and it would be great if my readers would take a look and offer some thoughtful feedback. What would you like to see added? What would you like to see changed? Please note that this is reference documentation not tutorial documentation.

This is just the straight HTML dump that is generated from Docbook but since it is inline the links won't work. The current -devel docs are here and the updated version I am working is below:

24.1. SQL Dump

PostgreSQL provides the program pg_dump for generating a backup file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. The basic usage of pg_dump is:

pg_dump -C -F p -f outfile dbname

The use of -C ensures that the dump file will contain the requisite CREATE DATABASE command within the dump file. The use of -Fp ensures that you are using the plain text format and the use of -f allows you to specify the name of the file the dump will be written to. It is also possible for pg_dump to create files in other formats that allow for parallelism and fine-grained control of object backup or restoration. For more details on all options available to pg_dump please refer to the pg_dump reference page.

The pg_dump application requires read access to all objects within the database that it will be operating with. This generally requires database super-user access. It is possible for any database user to use pg_dump to backup the objects that they own regardless of super-user access. This can be achieved using options such as -n schema or -t table.

The primary advantage of using pg_dump over the other backup methods described is that pg_dump output is architecture independent. A backup made with pg_dump can generally be moved between operating systems and different architectures (32bit, 64bit, Sparc, Intel). Whereas file-level backups and continuous archiving are both server-version-specific.

The text files created by pg_dump are internally consistent, meaning, the dump represents a snapshot of the database at the time pg_dump began running. pg_dump does not block other operations on the database while it is working. (Exceptions are those operations that need to operate with an exclusive lock, such as most forms of ALTER TABLE.)

Note: Like any other PostgreSQL client application, pg_dump will by default connect with the database user name that is equal to the current operating system user name. To override this, either specify the -U option or set the environment variable PGUSER.

24.1.1. Advanced pg_dump

The pg_dump application provides other formats. The most notable are the use of -F c or -F d. The use of the custom format (-F c) is an excellent option for smaller databases when you need fine grained control of the objects you chose to restore. The use of the directory format (-F d) allows for parallel connection based backups. If you are performing a backup with many objects and using pg_dump then the directory format will be the most efficient. This option also allows for fine grained control of the objects you chose to restore.

If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively.

Example 24-1. Backup a single table

  pg_dump -U user -h host1 -F c -f outfile -t table1 dbname
  

Example 24-2. Using wildcards with table list

  pg_dump -U user -h host1 -F c -f outfile -t table* dbname
  

Example 24-3. Using parallelism and a wildcard table list

  pg_dump -U user -h host1 -F d -f outfile -t table* -j 8 dbname
  

Note: The use of the custom or directory pg_dump formats requires the use of pg_restore and will not work with psql. There is more information on using pg_restore in section Section 24.1.3.

24.1.2. Restoring the Dump

The psql application is the default client that ships with PostgreSQL. It is also the default application used when restoring text based dumps created by the pg_dump application. For details information on psql please see psql. For the purposes of restoring a dump the basic usage is:

psql -f infile -d dbname 

Note: If you omitted -C when executing pg_dump the CREATE DATABASE command will not be in the text file. You will need to create the database yourself from template0 before the executing the restore (e.g., with createdb -T template0 dbname).

Warning

pg_dump does not backup users, roles and other global objects. To properly backup global objects you must use pg_dumpall with the -g parameter. If you do not restore the globals before the text based dump, the database will implicitly restore all objects as the owner passed by -U username. If -U is not passed then the operating system user executing psql will be used.

Important: The dumps produced by pg_dump are relative to template0. This means that any languages, procedures, etc. added via template1 will also be dumped by pg_dump. As a result, when restoring, if you are using a customized template1, you must create the empty database from template0, as in the example above.

After restoring a backup, one should execute ANALYZE on each database so the query optimizer has useful statistics; see Section 23.1.3 and Section 23.1.6 for more information. For more advice on how to load large amounts of data into PostgreSQL efficiently, refer to Section 14.4.

24.1.3. Advanced restore

Example 24-4. Using pipes to restore to new server

   pg_dump -h host1 -d dbname | psql -h host2 -d dbname
  

If one is using the custom, directory or tar formats the restore command is pg_restore. The pg_restore program has many benefits over the use psql including fine grained object restore and parallelism.

Example 24-5. Extracting a text dump from a custom format backup

The following will extract the backup to the standard output. The use of -F is optional as pg_restore should be able to detect the format.

   pg_restore -F c infile 
  

Example 24-6. Restoring a single table

   pg_restore -U username -h host1 -d dbname -t table infile
  

Example 24-7. Using parallelism to restore databases

The use of parallelism will normally allow databases to restore much faster than a single connection based restore. The restore will only execute as quickly as it can restore your largest table but for databases with many objects it is the fastest pg_dump based restore.

   pg_restore -U username -h host1 -d dbname -t table -j 8 infile
  

24.1.4. Using pg_dumpall

pg_dump dumps only a single database at a time, and it does not dump information about roles or tablespaces (because those are cluster-wide rather than per-database). To support convenient dumping of the entire contents of a database cluster, the pg_dumpall program is provided. pg_dumpall backs up each database in a given cluster, and also preserves cluster-wide data such as role and tablespace definitions. The basic usage of this command is:

pg_dumpall > outfile

The resulting dump can be restored with psql:

psql -f infile postgres

It is necessary to have database superuser access when using a pg_dumpall dump. The superuser acess is required to restore the role and tablespace information.

pg_dumpall works by emitting commands to re-create roles, tablespaces, and empty databases, then invoking pg_dump for each database. This means that while each database will be internally consistent, the snapshots of different databases are not sychronized.

Cluster-wide data can be dumped alone using the pg_dumpall --globals-only option. This is necessary to fully backup the cluster if running the pg_dump command on individual databases.

Note: If you use tablespaces, make sure that the tablespace paths in the dump are appropriate for the new installation.

24.1.5. Handling Large Databases

The act of backing up a normal sized database is relatively simple. The act of backing up a large database (>500GB) can be challenging. Fortunately, PostgreSQL is very flexible in its ability to provide a reliable backup. Here is a list of things you might want to consider when backing up a large database.

  1. Use the directory format and the -j NUM option. This will ensure the quickest and most flexible pg_dump style backup.

  2. Use continuous archiving as described in Section 24.2. You can then backup the replica without putting load on the master.

  3. Use pg_basebackupas described in Section 24.2.2.

Warning

The pg_dump methods utilize at least one connection if not many connections (via -j). They also utilize long running transactions. This can cause problems with maintenance. If you find that your database contains a lot of growing bloat consider using a backup method on the master that does not require pg_dump.



Categories: OpenSource, PostgreSQL, Python, SQL

blog comments powered by Disqus

Copyright © 2000-2015 Command Prompt, Inc. All Rights Reserved. All trademarks property of their respective owners.