2.5. Initializing the Filesystem

Before you can create a database in a database cluster, the filesystem must first be initialized. Before version 8 of PostgreSQL there were two ways to initialize the filesystem for use with PostgreSQL; you could either use the initdb application to create an entirely new database cluster (as was used to prepare your initial database system, in Chapter 2), or you could use the initlocation application to prepare a secondary data directory. With release 8 and above the initlocationoption has been deprecated in favor of tablespaces.

A database cluster (also termed catalog) represents several associated databases. A single instance of the postmaster process can only access one database cluster at a time. Alternatively, through tablespaces, you can create other locations to place specific databases or tables that are part of an existing cluster.

The following sections cover these two approaches.

2.5.1. Initializing a Database Cluster

Use the initdb program to create and initialize a new database cluster within your filesystem. Again, a database cluster is the framework upon which PostgreSQL databases are created. You should already have one cluster in the data directory which was initialized in Chapter 2.

You may use initdb to initialize a new data directory for a database cluster, and instruct postmaster to start up using that data cluster instead of the default. Alternatively, you may have two postmaster processes running at the same time with different database clusters, provided that they are configured to listen on different TCP/IP ports.

After you use initdb to create a new database cluster, that new cluster's filesystem will be owned by whichever operating system user you were logged in as when issuing the command.

Note

You can not run the initdb command as the root user. If you are using the Win32 version of PostgreSQL an Administrator may not initialize the database with initdb. This is for security purposes.

Here is the syntax for initdb:

Usage:
  initdb [OPTION]... [DATADIR]

Options:
 [-D, --pgdata=]DATADIR     location for this database cluster
  -E, --encoding=ENCODING   set default encoding for new databases
  --locale=LOCALE           initialize database cluster with given locale
  --lc-collate, --lc-ctype, --lc-messages=LOCALE
  --lc-monetary, --lc-numeric, --lc-time=LOCALE
                            initialize database cluster with given locale
                            in the respective category (default taken from
                            environment)
  --no-locale               equivalent to --locale=C
  -A, --auth=METHOD         default authentication method for local connections
  -U, --username=NAME       database superuser name
  -W, --pwprompt            prompt for a password for the new superuser
  --pwfile=FILE             read password for the new superuser from file
  -?, --help                show this help, then exit
  -V, --version             output version information, then exit

Less commonly used options:
  -d, --debug               generate lots of debugging output
  -s, --show                show internal settings
  -L DIRECTORY              where to find the input files
  -n, --noclean             do not clean up after errors

If the data directory is not specified, the environment variable PGDATA
is used.

The following are the valid options for initdb:

-D,  -pgdata=]DATADIR

The directory that you wish to initialize a new database cluster within. If you do not specify a directory name, the command will look at the PGDATA environment variable. If you the PGDATA variable is not set the command will fail.

-E,  -encoding=ENCODING ]

The name of the multi-byte encoding type of the template database within this cluster. Whatever type you set here will become the default type for any databases you later create using this cluster. This is only relevant if you have enabled multi-byte encoding in PostgreSQL. During subsequent database creations you may specify a different encoding to use.

- -no-locale ]

Specify that the cluster should be created with locale C. In general this is best practice unless you have a specific purpose to explicitly set the locale.

Warning

The setting of locales other than C can cause queries that use LIKE to be unable to use an index. This option will also effect the use of ORDER BY and to_char family of functions.

- -lc-collate=locale, - -lc-ctype=locale, - -lc-messages=locale, - -lc-monetary=locale, - -lc-numeric=locale, - -lc-time=locale

Similar to - -locale but specific to particular needs such as a monetary or time format.

-U USERNAME | - -username=USERNAME

The superuser that will be created on initialization of the cluster. This defaults to the UID of the sytem user executing the command.

-W | - -pwprompt

Prompt for a password during cluster creation for the PostgreSQL superuser. It is always a good idea to use this option.

- -pwfile=filename

Read the superuser password from filename. In general just use -W

-d | - -debug

The debug switch, which causes debugging information from the creation of the catalog tables to be displayed.

-L directory

Normally not required but can be used to specify a different location of source files for initdb to use in creation of the cluster. If initdb needs this option, initdb will inform you upon execution.

-n | - -noclean

By default if initdb fails it will remove everything it created. If you specify this option it won't. This is a good option to use if you are trying to diagnose the failure of initdb to finish sucessfully.

If the command completes successfully, initdb will have created a database cluster in the specified data directory; this cluster can then be used by the backend to store its databases.

Example 2-22 initializes a new database cluster in the /usr/local/pgsql/booktown directory:

Example 2-22. Initializing a New Database Cluster

[postgres@booktown ~]$ initdb /usr/local/pgsql/booktown
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory data ... ok
creating directory data/global ... ok
creating directory data/pg_xlog ... ok
creating directory data/pg_xlog/archive_status ... ok
creating directory data/pg_clog ... ok
creating directory data/pg_subtrans ... ok
creating directory data/pg_twophase ... ok
creating directory data/pg_multixact/members ... ok
creating directory data/pg_multixact/offsets ... ok
creating directory data/base ... ok
creating directory data/base/1 ... ok
creating directory data/pg_tblspc ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in data/base/1 ... ok
initializing pg_authid ... ok
enabling unlimited row size for system tables ... ok
initializing dependencies ... ok
creating system views ... ok
loading pg_description ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

    bin/postmaster -D data
or
    bin/pg_ctl -D data -l logfile start

2.5.2. Table spaces

If you are not interested in creating a new database cluster, but simply wish to store a particular database, table or index in a different physical location, use tablespaces.

Table spaces can be used to to spread your cluster across multiple storage providers. For example you may have your main system catalogs on your default partition. You would then place your main application database on a new RAID array that is mounted under /app/booktown. Lastly you could even have a third RAID that was dedicated to the most active tables within your database. The third RAID location may be mounted under /app/most_active.

Where your tablespaces exist by default is specified in but you can create arbitrary table spaces by using the CREATE TABLESPACE command.

Example 2-23.

      template1=# \h CREATE TABLESPACE
      Command:     CREATE TABLESPACE
      Description: define a new tablespace
      Syntax:
      CREATE TABLESPACE tablespacename [ OWNER username ] LOCATION 'directory'
      template1=#
      template1=# CREATE TABLESPACE books OWNER jd LOCATION '/tmp/books';
      CREATE TABLESPACE
      template1=#
      

Note

The directory that you choose for a table space must exist before you execute the CREATE TABLESPACE command and the user that PostgreSQL is running as must own the directory. The [ OWNER username ] is an option parameter that specifies the PostgreSQL user that will own the tablespace. The LOCATION is the absolute path to the location of the directory that will contain the table space.

Table Spaces exist to give PostgreSQL users greater flexibility in the physical design of their database and allow users easily deal with space and performance considerations as they arrive. For example if you have a very large table that is starting to cause capacity concerns you can create a new table space and move that table to the new table space using the ALTER TABLE command. You can also perform this feature on an index (e.g., Such as GIST indexes which can be quite large) to allow for quicker access to the index based on a faster RAID volume.