Chapter 3. Configuration of PostgreSQL

The following chapter discusses the configuration of PostgreSQL. The primary configuration files for PostgreSQL are the postgresql.conf and the pg_hba.conf file. The postgresql.conf file is the global configuration file for a particular PostgreSQL catalog. It sets parameters such as max_connections.

The pg_hba.conf file is the global authentication file. This file determines who and how a user within the PostgreSQL catalog can connect to a database.

3.1. The postgresql.conf

The following section lists each postgresql.conf parameter and also include discussion on what settings are appropriate for the option. The end of this section contains a table listing each parameter and the valid options for the parameter.

3.1.1. File locations

The following parameters offer the ability to change where a particular configuration file resides.

data_directory

Used to specify the working directory for configuration files. Specifically the postgresql.conf, pg_hba.conf and pg_identd.conf . When using this directive you would specify the directory in which the postgresql.conf file resides instead of the main data directory.

Example 3-1. Using the data_directory option

If your configuration files were in /usr/local/pgsql/etc you could start PostgreSQL with the following pg_ctl command:

   /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/etc start
   

Note

The data_directory option only applies to the pg_ctl start command. When using pg_ctl stop you will want to pass the -Doption with the data directory.

Example 3-2. Using pg_ctl with the data_directory directive

-bash-3.00$ bin/pg_ctl -D /usr/local/pgsql/etc start
postmaster starting
-bash-3.00$ LOG:  database system was shut down at 2005-05-06 13:11:18 PDT
LOG:  checkpoint record is at 0/A32B20
LOG:  redo record is at 0/A32B20; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 544; next OID: 17230
LOG:  database system is ready
    
    
-bash-3.00$ bin/pg_ctl -D /usr/local/pgsql/etc stop
pg_ctl: PID file "/usr/local/pgsql/etc/postmaster.pid" does not exist
Is postmaster running?

-bash-3.00$ bin/pg_ctl -D /usr/local/pgsql/data2 stop
LOG:  received smart shutdown request
LOG:  shutting down
waiting for postmaster to shut down....LOG:  database system is shut down
 done
postmaster stopped
-bash-3.00$

   
hba_file

If you do not wish to have the pg_hba.conf file located in the default location you would use this directive. The default location of the pg_hba.conf file is the location you specified with the data_directory directive.

Note

The pg_hba.conf file is required for operation of PostgreSQL.

ident_file

If you do not wish to have the pg_indent.conf file in the default location you would use this directive. The default location of the pg_ident.conf file is the location you specified with the data_directory directive.

Note

The pg_ident.conf file is not required for operation of PostgreSQL.

external_pid_file

You can use this directive to create an additional process-id file. This file is in addition to the file that is checked by the pg_ctl command on startup of the PostgreSQL catalog. The pg_ctl command expects to find a postmaster.pid in $PGDATA/. The advantage to this directive is that the pid file that is written only contains the parent PID. If you need to write a script to store the PID in a universal location (such as /var/run) then use this option.

Warning

Do not arbitrarily remove the postmaster.pid that is found in $PGDATA as it is used to determine if PostgreSQL is actually running. If you accidently start to PostgreSQL isntances on top of the same $PGDATA location you will corrupt your catalog.

3.1.2. Connections and Authentication

The following options are specific to how one would be able to connect to PostgreSQL.

listen_addresses

Use this directive to restrict the ethernet interface a particular PostgreSQL daemon listens on. The default is localhost. This is a good option to use if you run multiple postgresql daemons on a single machine or if you want to limit access to the postgresql daemon to only the localhost.

port

The TCP port that the PostgreSQL daemon will listen on. The default is 5432.

max_connections

The maximum number of concurrent connections to the PostgreSQL daemon. The default is 100. You should increase or decrease this parameter per your individual application needs. However, this option does effect the memory usage of the PostgreSQL process. Be wary of keeping it at a high number without reason.

Note

The max_connections parameter is also effected by the super_user_reserved_conenctions. When in doubt apply the following formula:

(max_connections - superuser_reserved_connections) = user_available_connections

superuser_reserved_connections

A very handy directive, this allows a PostgreSQL DBA to insure that he may access the PostgreSQL daemon regardless of the number of connections currently in use. If your max_connections is set to 100 and your superuser_reserved_connectionsis set to 5, then the max_connections allowed from non-superusers will be 95.

unix_socket_directory

Specifies the directory the PostgreSQL daemon should look for its socket connection. The default is /tmp. If your application or middleware resides on the same machine as the database server it may be a good idea to use unix sockets. They are traditionally faster than TCP/IP.

Note

If you change this parameter you will have to use the -h flag with the psql client.

unix_socket_group

Allows the DBA to set the group owner of the socket. By default it will be the group of the postgresql superuser.

unix_socket_permissions

The file access permissions that are to be placed on the unix socket. The default is 0777, thus anyone can connect. If you only wish the socket owner to connect you could use 0700 or if you only want people who are a member of the unix_socket_group you could use 0770.

Example 3-3. unix_socket_permission permission examples

   0200 - Minimum permissions for the owner to use the socket
   0777 - Anyone can connect, they will still be subject to PostgreSQL authentication
   0700 - Only the owner of the socket can connect
   0770 - The owner and anyone a member of the unix_socket_group can connect
   

bonjour_name

The broadcast name used on a network that supports Rendezvous. Bonjour is also known as Rendezvous and zeroconf.

authentication_timeout

The maximum time allowed for a client to authenticate once connected. The parameter is in seconds and defaults to 600.

ssl

Informs the backend to enable SSL based connections. PostgreSQL must be compiled with the --with-openssl option. You will also need to create a certificate using openssl.

password_encryption

If set to true PostgreSQL will automatically encrypt passwords when using the alter user and create user commands. This should always be true.

db_user_namespace

A odd solution to the problem of having users that are specific to a particular database within the PostgreSQL catalog. By default all users within PostgreSQL are global and thus can connect (unless the pg_hba.conf dictates otherwise) to any database within the catalog. If this options i enabled, you can create users as username@dbname. When the username is passed via connecting client, the @ and the database name is appended to the user name to create the autentication token. You will have to quote the entire username when you create a user using this method.

Example 3-4. Create a user with the db_user_namespace option

   template1=# SELECT * FROM pg_user;
 usename | usesysid | usecreatedb | usesuper | usecatupd |  passwd  | valuntil | useconfig
---------+----------+-------------+----------+-----------+----------+----------+-----------
 jd@bar  |    16388 | t           | t        | t         | ******** |          |
(1 row)

template1=# CREATE USER "jd@foo"
CREATE ROLE
template1=# SELECT * FROM pg_user;
 usename | usesysid | usecreatedb | usesuper | usecatupd |  passwd  | valuntil | useconfig
---------+----------+-------------+----------+-----------+----------+----------+-----------
 jd@bar  |    16388 | t           | t        | t         | ******** |          |
 jd@foo  |    16389 | f           | f        | f         | ******** |          |
 pickes@foo |    16411 | f           | f        | f         | ******** |          |
(3 rows)

template1=# \q
bash-3.00$ bin/psql -h localhost -U pickes foo

Welcome to psql 8.1devel, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

foo=>
   

Notice than when we connected to the database using the user pickes it automatically appended the database foo per our command line and allowed the user to connect. Using this method alone will not be enough to protect against cross database pollution. You will also have to edit the pg_hba.conf to restrict users to a particular database.

krb_server_keyfile

The location of the kerberos key file. This is only used if you are utilizing Kerberos for your authentication.

krb_caseins_users

If set to on Kerberos user names will be treated as case insensitive.

krb_srvname

Configures the name of the service to be used by Kerberos.

3.1.3. Resource utilization

The following section will discuss options in the postgresql.conf that direct effect resource utilization such as memory and I/O.

shared_buffers

Sets the number of shared memory buffers userd by PostgreSQL. The minimum is 2x the max_connections parameter. The suggested value for this parameter greatly varies depending on your hardware. A good rule of thumb is anywhere from 1000 to 50000 but never more than 33% of your total system ram. Each buffer allocates 8k of shared memory and your operating system will probably have to have its kernel parameters adjusted to address more than the default value of 1000.

In linux the kernel parameter to increase shared memory is /proc/sys/kernel/shmmax and can be adjusted either by increasing the parameter within the file directly or if you distribution supports it by using the sysctl command.

temp_buffers

The number of buffers to be used for temp tables. Similar to shared_buffers but local to the backend (connection) instead of shared across the server. They are also used for new relations, in the transaction in which the relation is being created. The default value is 1000 and each buffer is 8k.

max_prepared_transactions

Sets the number of transactions that may be simultaneously in a prepared state. If set to to zero the option is disabled. This option will cause PostgreSQL to use more shared memory. You will want to adjust the shared memory available to your operating system accordingly. Specifically a prepared transaction is a transaction that has passed the "first phase" of the commit process and is waiting for the second phase.

work_mem

In version of PostgreSQL below 8.x this parameter was known as sort_mem. The name change is reflective of the more general nature of the option. The parameter literally means working memory. If you have a lot of queries that return large data sets you will want to increase this parameter. However, if you have a lot of transactional activity with many concurrent users you may want to adjust it to a lower setting.

Getting the value accurate is difficult as it relies not only on the resources available but also the way the resources are being used. This parameter may be set at runtime, meaning that when a user makes a connection you may set the parameter to a needed value for that connection and the operations that are going to be performed.

If you are unsure if you have the correct setting you can monitory $PGDATA/base/DB_OID/pgsql_tmp . The DB_OID is the oid of the database you are monitoring.

Warning

This setting is per connection. Thus if you set work_mem to equal 64Mb you are giving PostgreSQL the ability to use 64Mb per connection to the database. If you have 100 connections that means that PostgreSQL could use up to 6.4Gb. Be careful with this parameter.

maintenance_work_mem

The amount of memory allocated for maintenance type tasks within the database. For example VACUUM, CREATE INDEX, and ALTER TABLE. Generally the default of 16Mb is too low. If you have the memory available try increasing this setting for better VACUUM and CREATE INDEX performance.

max_stack_depth

The maximum safe depth of the server execution stack. The default is 2Mb. This setting is bound directly to the operating system stack limit. Many operating systems have a limit as low as 8Mb. Unless you specifically run into a problem with this setting it should be left alone.

max_fsm_pages

One of the more important parameters to configure. The max_fsm_pages is directly tied to how often one would have to VACUUM and/or VACUUM FULL. This number should always be higher then the actual pages needed during a VACUUM.

Example 3-5. Knowing the max_fsm_pages needed

   foo=> VACUUM VERBOSE;
INFO:  vacuuming "information_schema.sql_languages"
INFO:  "sql_languages": found 0 removable, 2 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_10729"
INFO:  index "pg_toast_10729_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
[...]
INFO:  free space map contains 17 pages in 12 relations
DETAIL:  A total of 192 page slots are in use (including overhead).
192 page slots are required to track all free space.
Current limits are:  20000 page slots, 1000 relations, using 182 KB.
VACUUM
foo=>
   

If you review the last couple of lines you will see INFO: free space map contains 17 pages in 12 relations, this is stating that the free space map has 17 pages and 12 relations. The pages is the direct correlation to the max_fsm_pages parameter. Your max_fsm_pages must always be higher then the pages value in that line. If it is not, you will have to VACUUM FULL and adjust your pages to an appropriate size. The Current limits are: 20000 page slots, 1000 relations, using 182 KB. lets you know what the current settings the catalog has.

max_fsm_relations

The maximum number of relations (for example tables) that free space will be tracked in the shared free-space map. The typical user does not need to adjust this parameter. Like max_fsm_pages you can review the settings and need of the catalog by running VACUUM VERBOSE.

max_files_per_process

Sets the maximum number of files a backend can open at any given time. This parameter is typically not required for systems such as Linux but can be of use on *BSD. If you ever receive the messages Too many open files try increasing this parameter.

preload_libraries

Used to preload libraries on startup of the server. Originally considered a potential performance improvement it typically will not provide any measurable performance improvement.

vacuum_cost_delay

The length of time in milliseconds that the vacuum process will sleep when the vacuum_cost_limit has been reached. This is a very helpful option for the performance of vacuum on large tables. By default vacuum will use as much resources, including I/O that are made available to it. On smaller tables this isn't typically an issue because vacuum will finish quicker. On larger tables it may be an issue because vacuum will tie up a lot of resource for an extended period of time. Setting this delay will allow vacuum to sleep whenever the vacuum_cost_limit is reached. This will help balance the load generated by the vacuum process. Reasonable delay settings are between 50ms and 200ms.

vacuum_cost_page_hit

Determines the estimated cost to vacuum a buffer in the shared buffer cache. It is typically better to adjust vacuum_cost_limit instead of this parameter.

vacuum_cost_page_miss

Determines the estimated cost to vacuum a buffer that has to be read from disk (versus shared buffer cache). It is typically better to adjust vacuum_cost_limit instead of this parameter.

vacuum_cost_page_dirty

Determines the estimated cost when vacuum modifies a block that was previous clean. Basically, it signifies the extra I/O required to flush a dirty block to the disk. It is suggested that you modify the vacuum_cost_limit instead.

vacuum_cost_limit

The mother of the vacuum cost parameters. Adjust this high or low depending on your performance requirements. The lower the setting the more often vacuum will sleep during the process.

bgwriter_delay

The background writer is a new feature as of 8.x. It specifies the delay in activity for the background writer. It was designed to help reduce checkpoint spikes. There is no official word on the best settings for this option. The setting is in milliseconds and your mileage may vary.

bgwriter_lru_percent

The maximum percentage of dirty buffers that will be written in a given round of activity of the background writer. This parameter is used to allow the background writer to write out buffers that are likely to be recycled in the near future. This is done to reduce the chance that a server process will need to issue their own writes.

bgwriter_lru_maxpages

The maximum pages of dirty buffers that the background writer will write in a given round of activity.

bgwriter_all_percent

The percentage of buffers scanned in each round of activity.

bgwriter_all_maxpages

The maximum number of buffers written in each round of activity.

3.1.4. Write Ahead Log

The Write Ahead Log or WAL is an important feature of PostgreSQL. It insures that in the event of a crash that PostgreSQL can recover all valid transactions and keep a consistent state for your data. It is a standard approach to transaction logging in most enterprise databases. Another name for WAL is roll-forward recovery or REDO. Unfortunately it may require a significant portion of your I/O resources if not configured correctly.

fsync

An on/off setting, if you are running a production database this should be left on. Fsync is the standard method on Unix and Linux to sync all data that is in memory to disk. If you do not leave fsync on there is a potential to loose data in the event of a database or operating system crash. This is especially true when using IDE or SATA drives without a battery backed cache.

wal_sync_method

The type of sync that is going to be called. Common options are fsync, fdatasync, fsync_writethrough, open_sync and open_datasync. The default is fsync because it will work on most platforms. However on Linux this is not a good option due to the overhead it causes. A better option for Linux is open_sync. If you are curious about which fsync methods your OS supports you can use the test_fsync command which is available in the PostgreSQL source under postgresql-8.1dev/src/tools/fsync.

Example 3-6. Compiling and using test_fsync

   bash-3.00$ make
cc -c  -O test_fsync.c
cc -o test_fsync  -O test_fsync.o
bash-3.00$ ./test_fsync
Simple write timing:
        write                    0.016128

Compare fsync times on write() and non-write() descriptor:
(If the times are similar, fsync() can sync data written
 on a different descriptor.)
        write, fsync, close      1.062226
        write, close, fsync      1.661505

Compare one o_sync write to two:
        one 16k o_sync write     0.489219
        two 8k o_sync writes     0.650214

Compare file sync methods with one 8k write:
        (o_dsync unavailable)
        open o_sync, write       0.312525
        write, fdatasync         0.423692
        write, fsync,            0.358816

Compare file sync methods with 2 8k writes:
        (o_dsync unavailable)
        open o_sync, write       0.660938
        write, fdatasync         0.483137
        write, fsync,            0.488169
   

With the above output, the lower the number the better.

full_page_writes

Tells PostgreSQL to whether or not to use full_page_writes. If you value your data leave this option on. This option can be useful when doing bulk loading to a development server where if a system crash insues you can just reload.

commit_delay

A number in microseconds to delay before using fsync to commit transactions to disk. This can be useful on a high transaction system that has many but small transactions. When tuned correctly it allows PostgreSQL to commit multiple transactions to disk at the same time thus minimizing the impact of calling a sync(). Requires the use of commit_siblings as well.

commit_siblings

The minimum number of open transactions that must be present to inact the use of commit_delay.

checkpoint_segments

If your data set receives a lot of write traffic. That is traffic that is primarily comprised of inserts, updates and deletes this setting can be a large performance factor. The default value of 3 is typically only good on very small applications. A better setting will be upwards of 10 or more. There is a downside in that each segment uses 16 megabytes of hard disk space. The checkpoing_segment is kept in $PGDATA/pg_xlog.

For performance reasons it is also a good idea to keep the segments on their own array. You will want to use an array that is efficient for writes but also redundant such as RAID 10.

Example 3-7. Watching for the checkpoint notice.

If you ever see a log message similar to:

   checkpoints are occurring too frequently (10 seconds apart)
   

You need to increase your checkpoint_segments.

checkpoint_timeout

The maximum time between checkpoints in seconds. If you are going to insert a log of data you might consider increasing this to help eliminate checkpoints during the data load. The default seems to work for most people but on heavily loaded systems it can be useful to checkpoint at longer intervals.

checkpoint_warning

A useful setting for determining if you need to increase your checkpoint_segments. The parameter is the number of seconds between a checkpoint that is caused by a too low checkpoint_segments setting. The error would look simiar to the one mentioned in Example 3-7

archive_command

Used by this options allows postgrsql to archive out via copy or other means checkpoint_segments.

Example 3-8. Using cp for to archive segments

   archive_command = 'cp -i %p /mnt/server/archivedir/%f  </dev/null'
   

The above command will copy from the absolute path of the checkpoint_segment which is located in $PGDATA/pg_xlog/%f to /mnt/serverlog/archivedir/%f. The %p is the absolute path and %f is the filename of the segment.

3.1.5. Query tuning

This sections covers options specific to tuning PostgreSQL for the query set you are using.

enable_bitmapscan, enable_hashjoin, enable_indexscan, enable_mergejoin, enable_nestloop, enable_seqscan, enable_sort, enable_tidscan

All of the above the options listed above deal specifically with how the planner is going to manipulate your query to return the data you are looking for. These options should not be modified via the postgresql.conf file. Instead use the set option from an SQL query to tune a particular query.

Example 3-9. Using set to disable/enable query tuning options.

   template1=# set enable_seqscan = off;
   SET
   template1=# set enable_seqscan = on;
   SET
   template1=#
   

effective_cache

Helps the planner assume what may or may not be in disk cache for PostgreSQL. Before turning off seq_scan try increasing this parameter. Generally a good setting is somewhere around 66% of your available system ram. The value is calculated in 8k chunks to be careful not to set it too high.

random_page_cost

Determines the planners estimated cost of a nonsequentailly fetched disk page. The higher the value the more likely the planner will choose to use a seq_scan versus a index scan. Set this lower if you have a decent RAID system with lots of RAM and IO. In general this setting should never be below 1.5.

cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost

Determines the planners estimated cost of processing a tuple, index fetch or where clause (respectively). This is calculated as a fraction of the systems sequential page fetch. These settings have never been proved to help one way or the other. It is best if they are left alone.

3.1.6. Genetic Estimate Query Optimizer (GEQO)

The GEQO was introduced in PostgreSQL 6.5 to help the planner be smarter about queries that would other exhaust available CPU resources to plan. Due to todays very fast CPUs GEQO will generally be slower to use than the standard methods of query optimization. One way to avoid using GEQO is to write your queries with explicit join orders.

geqo

Turns the Genetic Estimate Query Optimizer on or off.

geqo_threshold

This value will determine when the GEQO optimizer will launch. It is set per the number of from or join clauses you have within your query. Each join clause is considered one from clause.

Example 3-10. Counting from clauses

   SELECT * FROM foo
      JOIN baz on (baz.id = foo.id)
      JOIN bing on (bing.id = baz.id);
   

The above would equate to three from clauses. On a machine with very fast CPUs such as AMD Opterons it can be advisable to increase this number from the default of 11 to perhaps 14. However your mileage will vary and you should test extensively.

geqo_selection_bias, geqo_pool_size, geqo_effort, geqo_generations, geqo_random_seed

These are the various tuning parameters for the GEQO. However there isn't any evidence that changing these will help or hinder your query performance. For more information on these parameters please see the PostgreSQL documentation.

3.1.7. Other Query Modifiers

The following three options are typically ignored by the unknowning. These three options can greatly increase performance of PostgreSQL which used correctly.

default_statistics_target

The default_statistics_target is the number used to control the level of detail of table statistics collected. One way to know if this value needs to be increased is to run a EXPLAIN ANALYZE on a query. If the number of rows that the planner thinks are going to be returned versus the actual number of rows that are returned greatly differ, you should try increasing this parameter.

This parameter is global within the postgresql.conf and therefore should be adjusted cautiously. It is better to utilize the command to set the value on a per column basis. There will be more discussion on the use of this parameter in the chapter.

Example 3-11. Showing the row count mismatch

   foo=# explain analyze select count(id) from bar;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=155181.07..155181.07 rows=1 width=4) (actual time=7512.420..7512.421 rows=1 loops=1)
   ->  Seq Scan on bar  (cost=0.00..147665.05 rows=3006405 width=4) (actual time=0.044..5177.690 rows=2070775 loops=1)
 Total runtime: 7512.546 ms
(3 rows)
   
   

The first rows= is the number of rows the planner thinks are going to be returned by the query. The second rows= value indicates the actual number of rows returned by the query. As you can see there is a large disparity between the two. This can be adjusted by modifying the default_statistics_target or by using .

from_collapse_limit

You should only change this at query run-time if your query requires it. The planner will merge sub-queries into the upper query FROM list if the resulting FROM list will not have greater than this number of from clauses. You should keep this number below the GEQO_THRESHOLD.

join_collapse_limit

You should only change this at query run-time if your query requires it. The plammer will flatten explicit INNER JOINS into the upper FROM clauses if the resulting FROM list will be smaller than this number.

3.1.8. Logging and Debugging Options

For debugging problems with PostgreSQL there are few things more useful than good logging. The following section discusses all of the various logging routines that PostgreSQL offers.

log_destination

Where you would like the logging information to be sent. Depending on the platform you have STDERR (Standard Error), syslog (System Logger), or eventlog (Win32 only evenlog).

redirected_stderr

Use this option to send information to STDERR and a logfile. It is a good idea to use this as not all error messages passed from PostgreSQL will use STDERR.

log_directory

The directory used to store log files with redirected_stderr is used.

log_filename

The name of the file that will be used with redirected_stderr and log_directory. This value is treated as a strftime() pattern which means it takes the same arguments as the Linux date command. If no arguments are passed the suffix to the filename will have an appended EPOCH stamp.

Example 3-12. Different logfile data formats.

   postgresql-%a
   
The above would yield a log file name of postgresql-Mon if the day was Monday.

log_rotation_age

Used in conjunction with redirected_stderr. This number specifies the maximum number of minutes before a log rotation takes place. Set to zero to disable. You can also use system wide log rotation mechanisms such as logrotate to enable this facility.

log_rotation_size

Used in conjunction with redirected_stderr. This number specifies the maximum size in kilobytes that a logfile may be. The value of zero disables this feature.

log_truncate_on_rotation

Used in conjunction with redirected_stderr. When set to true, PostgreSQL will truncate the logfile on file rotation.

syslog_facility

The syslog facility to use. Only works if logging to syslog is enabled. See your operating system documentation for syslog facilities available to you.

syslog_ident

When used with syslog this value will be prepended to messages sent to the log. The default is postgres

client_min_messages, log_min_messages, log_min_error_statement

Controls the level of detail that is logged. The highest being debug5 and the lowest being panic. During normal operation leaving this at error is typically enough. The levels available are: debug5, debug4, debug3, debug2, debug1, info, notice, warning, error, log, fatal and panic.

Example 3-13. Debug2 output

LOG:  database system was shut down at 2005-07-19 13:48:57 PDT
LOG:  checkpoint record is at 0/337D20
LOG:  redo record is at 0/337D20; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 578; next OID: 10794
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484144, limited by database "postgres"
DEBUG:  forked new backend, pid=31433 socket=7
DEBUG:  checkpoint starting
DEBUG:  checkpoint complete; 0 transaction log file(s) added, 0 removed, 0 recycled
LOG:  transaction ID wrap limit is 2147484144, limited by database "postgres"
   
log_error_verbosity

Controls the verbosity of logging that is performed, TERSE, DEFAULT, or VERBOSE. It is safe to leave this as the default.

log_min_duration_statement

Helpful for diagnosing slow queries. If set, the query statement will only be logged if the duration of execution is higher than this number. The value is in milliseconds.

silent_mode

Set this to true to force the server into the background. Do not use this option unless syslog or redirect_stderr is enabled.

debug_print_parse, debug_print_rewritten, debug_print_plan, debug_pretty_print

Sends voluminous amounts of information to the logs. Useful but usually only to the very scientific minded. In general use EXPLAIN VERBOSE or EXPLAIN VERBOSEinstead.

log_connections, log_disconnections

Used to log connections or disconnections.

log_hostname

If you want to log the hostname of the client connecting to the database set this to true.

log_statement

A very useful option. When set to true each query that is sent to the backend will be logged. When used in conjunction with log_pid and log_duration it can be an invaluable tool to tracking down slow queries.

Example 3-14. log_statement

LOG:  database system was shut down at 2005-07-21 15:35:13 PDT
LOG:  checkpoint record is at 0/3383F8
LOG:  redo record is at 0/3383F8; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 585; next OID: 16385
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484144, limited by database "postgres"
DEBUG:  forked new backend, pid=1413 socket=7
LOG:  statement: select * from pg_database;
   
log_duration

When set to true this option will log the duration in milliseconds each operation takes.

log_line_prefix

A printf style prefix that will be prepended to each line in the postgresql log.

Example 3-15. log_line_prefix

From the postgresql.conf

   
    %u = user name
    %d = database name
    %r = remote host and port
    %h = remote host
    %p = PID
    %t = timestamp (no milliseconds)
    %m = timestamp with milliseconds
    %i = command tag
    %c = session id
    %l = session line number
    %s = session start timestamp
    %x = transaction id
    %q = stop here in non-session processes
    %% = '%'
      
   

3.1.9. Statistics Logging

The following section discusses options for statistics logging. An example of statistics logging is how many index scans have there been on a particular table. The statistics collector is a good way to find out the types of resources you are utlizing with PostgreSQL. It includes the ability to track currently active queries, how many queries are using index scans and other such information. It is also required for the use of the autovacuum feature.

log_parser_stats, log_planner_stats, log_executor_stats, log_statement_stats

Enables various statistics for write performance. There currently isn't any simple tool to analyze the information that is provided.

stats_start_collector

This paramter specifies wether or not to start the statistics collector. This is required if you are going to use the autovacuum features.

stats_reset_on_server_start

If set to True then all stats collected will be reset on each server restart.

stats_command_string

It set to True this option allows you to track current running queries via the pg_stat_activity table.

stats_row_level

Provides useful information about table and index use. It is also required if you are going to use the autovacuum feature.

stats_block_level

Useful for monitoring I/O and cache hits for tuning of variables and hardware. Should only be used when the initial load testing is being performed or if you start to see unexpected load on your server.

3.1.10. Autovacuum parameters

In versions previous to 8.1, PostgreSQL required that VACUUM be run either manually or via the pg_autovacuum userspace daemon. In 8.1 VACUUM was integrated into the backend. The following parameters are specific to only the integrated version of VACUUM.

autovacuum

Set to true if you wish to use this feature.

autovacuum_naptime

The time in seconds between autovacuum runs.

autovacuum_vacuum_threshold

The minimum number of tuples that are updated before autovacuum will execute a VACUUM.

autovacuum_analyze_threshold

The minimum number of tuples that are updated before autovacuum will execute an ANALYZE.

autovacuum_vacuum_scale_factor

This number is used in conjunction with autovacuum_vacuum_threshold to help determine when a autovacuum will launch. If score = X_base_threshold + X_scale_factor * reltuples is greater than the number of dead tuples a vacuum will launch.

autovacuum_analyze_scale_factor

This number is used in conjunction with autovacuum_analyze_threshold to help determine when an autovacuum will launch an ANALYZE. If score = X_base_threshold + X_scale_factor * reltuples is greater than the number of dead tuples an ANALYZE will launch.

3.1.11. Client Connection Defaults

This section includes parameters for items specific to a connecting client.

search_path

Sets the default search_path for a connecting user. The default is $user,public. The search path is used in conjunction with schemas. For example if this value was set to foo, then when a user logged in the would automatically be placed in the schema foo for the databaes they were connecting to.

default_tablespace

Used to set the default table space. Use ALTER DATABASE instead.

check_function_bodies

When set to false, functions will not be checked to see if they are valid before creation. It is generally not a good idea to use this option.

default_transaction_isolation

Sets the default isolation level for PostgreSQL. This is generally kept at read committed. However during certain phases of replication with Slony-I and Mammoth Replicator the isolation level will be modified to be serializable.

default_transaction_read_only

Makes all transactions read only for non-temporary tables. Fairly useless as a global parameter but useful for certain specific connection types. Use the SET command instead to modify this on a as needed basis.

statement_timeout

The value in milliseconds to wait for a statement to return before it is terminated.

3.1.12. Locale and Formatting

This section is used to help configure various formatting options in regards to timezones and locales.

datestyle, timezone, australian_timezones

Configures the display format for dates. If australian_timezones is set to true then CST, EST, and SAT are interpreted as Australian instead of US timezones.

extra_float_digits

You can use this option to help determine the amount of specifity you wish to have with floating point data types.

client_encoding, lc_messages, lc_monetary, lc_numeric, lc_time

Sets the locale for their respective options such as numeric and error messages.

client_encoding

Configures the client side encoding for multibyte character sets.

explain_pretty_print

Makes the EXPLAIN output a little nicer.

dynamic_library_path

If a fully qualified directory path is not provided when using the LOAD or CREATE FUNCTION commands, this path is prepended to the location of the library.

3.1.13. Lock Management

deadlock_timeout

Used to set the amount of time in millisecnds to wait on a lock before checking if a deadlock as occured.

max_locks_per_transaction

Used to determine how many tables may require a lock at any given time per transaction. If you have a large database with many lookup tables, you may want to increase this. Otherwise it is probably best to leave it as the default.

3.1.14. Version Platform Compatibility

This section offers parameters to make your PostgreSQL act like older version of PostgreSQL. It is usually a better idea to just fix your code but it can be useful when there are time and budget constraints.

add_missing_from

A non-SQL standard option that allows the planner to add tables to the plan that the query writer forgot. This can cause problems and should be left false.

regex_flavor

Determines the level of regex you wish to support. This should be left at advanced unless you need older PostgreSQL version compatibility.

sql_inheritance

Only needed if you use inheritance and you are upgrading from PostgreSQL 7.0.

default_with_oids

Controls whether or not the CREATE TABLE statement will automatically add the system OID column. Leave this as false.

Warning

Using an OID column can increase your chances of a OID wrap around failure. Don't use them.

escape_string_warning

transform_null_equals

Used to help in query writing with a NULL.

Example 3-16. NULLs when tranform is on

		SELECT * FROM foo WHERE bar = NULL;
		

Will silently transform to:

		SELET * FROM foo WHERE bar IS NULL;
		

3.1.15. PostgreSQL.Conf quick reference

The following is a quick reference table of all of the parameters, valid options and defaults with the postgresql.conf. The Configured At column will be one of Runtime, Startup, Superuser, Reload. If Runtime you can modify the command with the SET command. If set to Startup the option can only be changed at the startup of the database. If set to Superuser the command can be changed at Runtime but only by a PostgreSQL Superuser. if set to Reload then the setting can be changed in the postgresql.conf and then made active by sending a HUP signal to the postgresql backend.

Table 3-1.

Conf SettingRangeDefaultConfigured At
data_directorydirectoryConfigDirStartup
hba_filefilenameConfigDir/pg_hba.confStartup
ident_filefilenameConfigDir/pg_ident.confStartup
external_pid_filefilenamenoneStartup
listen_addresseslocalhostStartup 
port129 to 327685432Startup
max_connections 2 to 262143100Startup
superuser_reserved_connections0 To max_connections - 12Startup
unix_socket_directory ''Startup 
unix_socket_group''Startup 
unix_socket_permissions 0777Startup 
rendezvous_name''Startup 
authentication_timeout 1-600 sec600Reload
sslTrue, FalseFalseStartup
password_encryptionTrue, FalseTrueRuntime
krb_server_keyfile''Startup 
db_user_namespace True, FalseFalseReload
shared_buffers16 to 2621431000Startup
work_mem64 to Int Max1024Runtime
maintenance_work_mem1024 to Int Max8192Runtime
Free Space Map   
max_fsm_pages 1000 to Int Max20000Startup
max_fsm_relations10 to Int Max1000Startup
max_stack_depth   
max_files_per_process 25 to Int Max1000 Startup
preload_librariesFile pathEmptyStartup
vacuum_cost_delay0Runtime 
vacuum_cost_page_hit1Runtime 
vacuum_cost_page_miss10Runtime 
vacuum_cost_page_dirty20Runtime 
vacuum_cost_limit200Runtime 
bgwriter_delay200Startup 
bgwriter_percent1Startup 
bgwriter_maxpages100Startup 
fsync True, FalseTrueStartup
wal_sync_methodfsync, fdatasync, open_sync, open_datasyncVaries by platformStartup
wal_buffers4 to Int Max8Startup
commit_delay0-1000000Runtime
commit_siblings1 -10005Runtime
checkpoint_segments 1 to Int Max3Startup
checkpoint_timeout 30 to 3600300Startup
checkpoint_warning0 to Int Max0Startup
Archiving   
archive_commandshell command''Startup
enable_bitmapscan, enable_hashagg, enable_hashjoin, enable_indexscan, enable_mergejoin, enable_nestloop, enable_seqscan, enable_sort,enable_tidscanTrue,FalseTrueRuntime
effective_cache_size 0 to Double1000Runtime
random_page_cost0 to Double4Runtime
cpu_tuple_cost0 to Double0.01Runtime
cpu_index_tuple_cost0 to Double0.001Runtime
cpu_operator_cost0 to Double0.0025Runtime
geqoTrue, falseTrueRuntime
geqo_threshold2 to Int Max11Runtime
geqo_selection_bias 1.5-2.02.0Runtime
geqo_pool_size1.5-2.00Runtime
geqo_effort1.5-2.01Runtime
geqo_generations1.5-2.00Runtime
geqo_random_seed1.5-2.0-1Runtime
default_statistics_target 1 -100010Runtime
from_collapse_limit0 to Int Max8Runtime
join_collapse_limit1 to Int Max8Runtime
log_destinationstderr, syslog, eventlogstderrStartup
redirect_stderr   
log_directorydirectorypg_logStartup
log_filenamespecialpostgresql-%Y-%m-%d_%H%M%S.logStartup
log_rotation_age0 to Int Max1440Startup
log_rotation_size0 to Int Max10240Startup
log_truncate_on_rotationTrue, FalseFalseStartup
syslog_facilityLOCAL#LOCAL0Startup
syslog_identpostgresStartup 
client_min_messagesdebug5, debug4, debug3, debug2, debug1, info, notice, warning, error, log, fatal, panicnoticeRuntime, Superuser, Superuser
log_min_messagesdebug5, debug4, debug3, debug2, debug1, info, notice, warning, error, log, fatal, panicnoticeRuntime, Superuser, Superuser
log_min_error_statementdebug5, debug4, debug3, debug2, debug1, info, notice, warning, error, log, fatal, panicpanicRuntime, Superuser, Superuser
log_error_verbosityterse, default, verbosedefaultSuperuser
log_min_duration_statement-1 to Int Max-1Superuser
silent_modeTrue, FalseFalseStartup
debug_print_parse, debug_print_rewritten, debug_print_plan, debug_pretty_print True, falsefalse 
log_connections log_disconnectionsTrue, falseFalseStartup
log_disconnectionsTrue, falseFalseStartup
log_hostnameTrue, FalseFalseStartup
log_statementNone, DDL, Mod, AllFalseSuperuser
log_durationTrue, FalseFalseSuperuser
log_line_prefixSpecial''Superuser
log_parser_stats, log_planner_stats, log_executor_stats, log_statement_stats True,FalseFalseSuperuser
stats_start_collectorTrue, FalseTrueStartup
stats_reset_on_server_startTrueStartup 
stats_command_stringFalseStartup 
stats_row_levelFalseStartup 
stats_block_levelFalseStartup 
search_pathpath'$user,public'Runtime
default_tablespace''Runtime 
check_function_bodiesTrue, FalseTrueRuntime
default_transaction_isolation read committed, serializable'read committed'Runtime
default_transaction_read_onlyTrue, FalseFalseRuntime
statement_timeout0 to Int Max0Runtime
datestyle timezone australian_timezones'iso, us'unknownfalseRuntime 
extra_float_digits-14 to 20Runtime
lc_messages, lc_monetary, lc_time, lc_numericSystem-dependantSpecialRuntime
client_encodingOS-dependantsql_ascii Startup
explain_pretty_print True,FalseFalseRuntime
dynamic_library_path path'$libdir'Superuser
deadlock_timeout 1 to Int Max1000Startup
max_locks_per_transaction10 to Int Max64Startup
add_missing_fromTrue, FalseTrueRuntime
regex_flavoradvanced, extended, basicadvancedRuntime
sql_inheritance True, FalseTrueRuntime
default_with_oidsTrue, FalseTrueRuntime
transform_null_equals True, falsefalseRuntime
Customized Options