Chapter 4. Database Management

This chapter covers several topics associated with managing a PostgreSQL database system. There is also a section devoted to the topic of backing up and restoring data from a database.

4.1. Maintaining a Database

Database maintenance is a broad subject. This section covers the physical maintenance of the system (pertaining to its disk usage), analytical maintenance (to increase performance), and database object documentation (to add to the maintainability and clarity of the schema).

Warning

On any database that modifies data, the VACUUM process is not optional. It must be performed, either in a scheduled or automated way. If you do not use VACUUM you will eventualy loose data.

The three actively used commands to maintain a PostgreSQL database are:

VACUUM primary purpose is to protect from XID wrapaound failures by removing or marking resuable deleted tuples caused by rollbacks or other statements such as DELETE or UPDATE. It also marks tuples as resuable to help keep the relations from growing out of proportion.

Note

The UPDATE statement does not actually update a tuple in place. It instead marks the exsting tuple as deleted and the appends the updated tuple to the end of the page.

ANALYZE is used to update statistics for query planner use. Up to date statistics allow the planner to choose the best plan possible for a particular query. Specifically decisions such as whether or not to use an index for a particular query are directly related to the use of ANALYZE and statistics gathered by the process.

REINDEX is used to rebuild a particular index or series of indexes within the database. In version of PostgreSQL pre 7.4 this command was the only way rebalance indexes. In versions 7.4 and above the VACUUM command handles the majority of this work. Still, REINDEX is a neccessary command in case your index becomes unbalanced or corrupted.

Note

In general your indexes should not get corrupted unless you are using a index that is not logged by WAL or you have faulty hardware. An example index type that is not logged by WAL is hash.

In PostgreSQL versions prior to 8.1.x it was good practice to perform a VACUUM at least nightly on a production database. While it can be run at the same time data is accessed, doing so will typically decrease the performance of the server during the time that the VACUUM is running. As such, if you are going to use scheduled vacuums it was generally preferable to schedule it at a time when you do not expect a great deal of database activity. As of 8.1 VACUUM has been integrated into the backend enabling users a relatively maintenance free scenario when using PostgreSQL.

Note

Any time an exceptionally large number of records are updated or deleted, it is prudent to perform a VACUUM to mark the deleted tuples reusable by PostgreSQL.

4.1.1. Using VACUUM

The syntax for the VACUUM SQL command is as follows:

  VACUUM [ FULL | FREEZE ] [ VERBOSE ] [ table ]
  VACUUM [ FULL | FREEZE ] [ VERBOSE ] ANALYZE [ table [ (column [, ...] ) ] ]

Used without any of the optional keywords or identifiers, a VACUUM statement will clean up each table in the presently connected database, one at a time, deleting temporary data and recovering disk space. This use of VACUUM is primarily to mark deleted tuples as resuable.

FULL

The use of the FULL keyword is used to remove deleted tuples and reclaim the space used by those tuples. The use of the FULL keyword will cause an ACCESS EXCLUSIVE lock on the relation being modified. During normal operation a VACUUM FULL should not be required.

A VACUUM FULL will be required if you overrun your max_fsm_pages parameter.

FREEZE

A FREEZE is similar to FULL. However when using FREEZE if there are no other transactions happening within the database, all tuples within the database will be protected from a XID wraparound failure. This option is generally used in conjunction with the creation of template databases.

VERBOSE

VERBOSE, as the keyword suggests executes a VACUUM in a VERBOSE manner. A large amount of output will be displayed during the operation. This command should be run at least once a week on active databases to allow administrators to determine if their max_fsm_pages and max_fsm_relations are set to an appropriate level.

ANALYZE

The ANALYZE keyword is used to integrate the use of ANALYZE into a single process with VACUUM. Running ANALYZE with VACUUM is no different than running ANAYLZE independantly except that the ANALYZE will happen in conjunction with VACUUM.

table

The optional table parameter is used to VACUUM a specific table within the database. If you specify a column you will need to specify the ANALYZE keyword as well. When this is done the column's statistics will be updated as well as the table being vacuumed.

Example 4-1. Executing a VACUUM with the option table parameter.

      booktown=# VACUUM ANALYZE authors(id);
      VACUUM
      

Example 4-2. Executing a VACUUM VERBOSE

booktown=# VACUUM VERBOSE;
[...]
INFO:  free space map contains 84 pages in 104 relations
DETAIL:  A total of 1664 page slots are in use (including overhead).
1664 page slots are required to track all free space.
Current limits are:  20000 page slots, 1000 relations, using 182 KB.
LOG:  transaction ID wrap limit is 1073887371, limited by database "booktown"
LOG:  transaction ID wrap limit is 1073887371, limited by database "booktown"
VACUUM   
      

The important lines within this example are the INFO and DETAIL lines. The INFO line is telling you that your free space map contains 84 pages and 104 relations. The DETAIL line is stating that your database has 1664 page slots in use, which includes all associated overhead. The number of pages and relations in use directly correlates to the max_fsm_pages and max_fsm_relations parameters as mentioned in Section 3.1. The number of pages and relations that are in use must always be less than the current limits as set in the postgresql.conf file. If at any point you overrun the free space map you will have to execute a VACUUM FULL to reclaim the space.

4.1.2. Using vacuumdb

The VACUUM command has a command-line executable equivalent called vacuumdb. The vacuumdb script provides two significant added functions to the normal use of the VACUUM SQL statement.

  • The vacuumdb takes connection argument and thus can easily be executed on machines that are not the database server.

  • Secondly, the vacuumdb command can easily be run as part of a scheduler such as cron.

Here is the syntax for vacuumdb:

  vacuumdb [ OPTION ] [ DBNAME ]

Like the createdb and dropdb scripts, vacuumdb accepts both single-dashed and GNU-style double-dashed arguments from the command line. The only required option is the dbname (unless you specify - -all), which describes that all databases are to be vacuumed. The OPTION parameters describe which mode the VACUUM command should be invoked in. The following are the available options for the vacuumdb script:

-a, - -all

Applies the VACUUM command, with specified options, to all databases in the system catalog.

-d DBNAME , - -dbname=DBNAME

Explicitly specifies the name of the database to perform the VACUUM statement on. This option is mutually exclusive to the - -all option.

-t ' TABLE [ ( column [, ...] ) ]', - -table=' TABLE [ ( column [, ...] ) ]'

Targets a specific table TABLE (or specific columns within that table) to be affected. The - -analyze option is required to describe specific columns.

-f, - -full

Equivalent to the FULL keyword for the VACUUM SQL command.

-z, - -analyze

Equivalent to the ANALYZE keyword for the VACUUM SQL command. Updates stored statistics about the data allocation between columns, which are used by the query optimizer to help guide internal query planning.

-e, - -echo

Accepts no parameters, and causes the query sent to PostgreSQL to be displayed to the screen as it is executed by vacuumdb.

-q, - -quiet

Accepts no parameters, and causes no output to be sent to stdout (though any errors will still be sent to stderr).

-v, - -verbose

Equivalent to the VERBOSE keyword for the VACUUM SQL command. Causes a detailed internal report of the processing performed to be displayed.

-h HOSTNAME , - -host=HOSTNAME

Specifies that you are connected to HOSTNAME, rather than the localhost. Use this option when vacuuming a remote database.

-p PORT , - -port=PORT

Specifies that the database connection is made on port PORT, rather than the default port (usually 5432, though it may have been configured differently when PostgreSQL was compiled, by the - -with-pgport flag).

-U USERNAME , - -username=USERNAME

Specifies that the username USERNAME is the user who connects to PostgreSQL (rather than the name of the system user executing vacuumdb).

- -help

Displays the help output for the vacuumdb command.

- -version

Displays the version of the vacuumdb command being executed.

Example 4-3 demonstrates the use of the vacuumdb script from the Book Town server. The -U flag specifies that the connection should use the manager user to connect, while the - -all flag causes all databases in the system catalog to be affected in sequence.

Example 4-3. Using vacuumdb on all databases

[jd@booktown ~]$ vacuumdb -U manager - -all
vacuumdb: vacuuming database "postgres"
VACUUM
vacuumdb: vacuuming database "template1"
VACUUM
vacuumdb: vacuuming database "booktown"
VACUUM

As mentioned, because of the connectivity options available to the vacuumdb script, it can be easily executed from a remote server. Example 4-4 shows a similar process to the command used in Example 4-3, but with the addition of the -h flag to specify a remote server named booktown.commandprompt.com. Example 4-4 also targets the booktown database specifically, rather than all databases.

Example 4-4. Using vacuumdb on a remote database

[jworsley@cmd ~]$ vacuumdb -h booktown.commandprompt.com -U manager booktown
VACUUM

4.1.3. Using the new integrated autovacuum

In all version of PostgreSQL prior to 8.1 when vacuuming you had to configure a manual vacuum process via a scheduler or run the commands by hand. There was a contributed module called pg_autovacuum that gained popularity as of version 7.3 and gained significant footing in 7.4 and 8. With the release of 8.1 the pg_autovacuum daemon is now deprecated in favor of the new integrated autovacuum.

The use of the integrated autovacuum requires that the postgresql.conf parameters, stats_start_collector and stats_row_level are enabled. You will also want to ensure that you have an extra connection available in super_user_reserved_connections.

Note

The enabling of stats can cause a performance penalty. Typically this is in the range of 5%, however it will vary on the performance of your system. For most users this is acceptable and stats should be enabled.

The specific configuration parameters for autovacuum are, vacuum_cost_delay, vacuum_cost_page_hit vacuum_cost_page_miss, vacuum_cost_page_dirty, vacuum_cost_limit and they are discussed in .

4.1.4. Documenting a Database

4.1.4.1. Using COMMENT

PostgreSQL offers a non-standard SQL command called COMMENT, which allows for documentation of any database object. By using COMMENT on a table, function, operator, or other database object, you can provide description that is stored in the pg_description system table. Descriptions can be easily retrieved through a set of extended psql slash commands.

Most standard objects in the database have a default description, which can be perused (along with any user-added descriptions) with the \dd slash command within psql.

Here is the syntax for COMMENT:

 COMMENT ON
{
  TABLE object_name |
  COLUMN table_name.column_name> |
  AGGREGATE agg_name (agg_type) |
  CAST (sourcetype AS targettype) |
  CONSTRAINT constraint_name ON table_name |
  CONVERSION object_name |
  DATABASE object_name |
  DOMAIN object_name |
  FUNCTION func_name ( [ [ argmode ] [ argname ] argtype [, ...] ] ) |
  INDEX object_name |
  LARGE OBJECT large_object_oid |
  OPERATOR op (leftoperand_type, rightoperand_type) |
  OPERATOR CLASS object_name USING index_method |
  [ PROCEDURAL ] LANGUAGE object_name |
  RULE rule_name ON table_name |
  SCHEMA object_name |
  SEQUENCE object_name |
  TRIGGER trigger_name ON table_name |
  TYPE object_name |
  VIEW object_name
} IS 'text'

In this syntax, object_name is the name of the database object that you wish to add a comment of description to. The keywords for the major database objects are optional, but if you intend to place a comment on a column, function, aggregate function, operator, or trigger etc..., you must specify the preceding keyword so that PostgreSQL knows what kind of syntax to expect, and where to look for the object name specified. You should always comment any custom objects that you create.

Note<!- - title>Scope of Comments ->

Any user can see an object comment. Only a super user can change comments on an object they don't own. Because any user can see the comments for an object do not place sensistive information in a comment.

The description string constant following the IS keyword is the literal comment to be placed in the database. Example 4-5 demonstrates placing a simple description on the id column of the booktown database. To delete a comment use a description string of NULL.

Example 4-5. Commenting the books table

booktown=# COMMENT ON COLUMN books.id
booktown-#         IS 'An Internal Book Town Identifier';
COMMENT

The COMMENT server message returned in Example 4-5 indicates that the comment was successfully placed on the column.

Example 4-6. Removing the books table comment

booktown=# COMMENT ON COLUMN books.id
booktown-#         IS NULL;
COMMENT

4.1.4.2. Retrieving comments

You may retrieve comments from the database easily by using the psql slash-plus commands. For a complete list of slash-plus commands please see Chapter 8. Some of the more common slash-plus commands follow:

\d+

Displays the same information as the standard \d command (displaying all tables, views, sequences, and indices in the currently connected database), but adds a column for the comments as well.

\l+

Displays comments on all databases.

\df+ [ pattern ]

Displays descriptions for each function in the currently connected database (as well as the language and source of the function). You may wish to view this slash command in expanded mode for readability by first initiating the \x slash command (see Chapter 8 for more about this). You can optionally supply a regular expression pattern to compare against existing function names, allowing you to limit the number of functions displayed.

\dt+

Displays comments on all tables in the currently connected database.

\di+

Displays comments on all indices in the currently connected database.

\ds+

Displays comments on all sequences in the currently connected database.

\dv+

Displays comments on all views in the currently connected database.

\dS+

Displays comments on system tables. Note that comments placed on system tables are still tied to a particular database, even though they are accessible from all databases, and will therefore not be visible when performing a \dS+ slash command from another database connection.

\dd

Displays all descriptions for all database objects.

Example 4-7 demonstrates the retrieval of the comment that was placed on the books table in Example 4-5 with the use of the psql \d+ command.

Example 4-7. Retrieving a comment

booktown=# \d+ books
                           Table "books"
 Attribute  |  Type   | Modifier |           Description
- -- -- -- -- -- -+- -- -- -- --+- -- -- -- -- -+- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
 id         | integer | not null | An Internal Book Town Identifier
 title      | text    | not null |
 author_id  | integer |          |
 subject_id | integer |          |
Index: books_id_pkey

The COMMENT SQL command provides a very simple way of internally documenting your objects, from tables to functions. This can be of great help when working with large or complicated database schema. Even the best of naming conventions do not always result in database objects whose applications are self-evident; this is especially the case when working with multiple developers.