PostgreSQL: The definitive guide

Chapter 2. Getting Started

This chapter focuses on the requirements and steps involved in installing and configuring PostgreSQL. Many PostgreSQL capabilities are not enabled by default. For example, support for the Tcl language is a feature that must be explicitly requested at compile time. Because many other features are also not configured by default, we will cover the various flags and options you may use to enable them when compiling PostgreSQL. It is important that you carefully read through all the steps in this process before beginning installation.

This chapter will walk you through the installation steps on a Linux/UNIX-style platform. Our installation platform is Linux, but these instructions should be compatible with most current UNIX platforms.

As of version 8.0, PostgreSQL runs natively on the Win32 platform. This book will cover how to install PostgreSQL for Win32 using the community and Mammoth installers. For source installation, we will use Linux as the base platform.

2.1. Choosing the right hardware

It is important to consider what PostgreSQL will be doing for you when selecting hardware. If you are just a hobbyist or a single developer doing testing, a standard PC with a reasonable amount of RAM (512 megabytes) should do.

If you are deploying for a production application, it is imperative that you pick the right hardware from the beginning. A bad choice of hardware can cost you in terms of performance and reliability when your application goes live. Databases typically suffer first from a lack of I/O bandwidth to the hard drives, then from a lack of RAM, and finally from a lack of processing power.

2.1.1. Choosing the right CPU

Picking the correct hardware can sometimes be a bit of a religious war. You have people on all sides, including the "Intel is king," "AMD rocks," and "Power5 is the only real CPU" crowds. Regardless, the real question is what hardware will work for you.

Intel and AMD CPUs both perform very well with PostgreSQL. It is also known to work on almost every other processor in widespread use, including Sparc, Power, PowerPC, MIPS, and PA-RISC. Among x86 processors, the AMD Opteron is generally considered the better choice for PostgreSQL because of its on-die memory channels and excellent memory bandwidth.

When selecting a server for PostgreSQL, keep in mind that PostgreSQL can execute only a single query per CPU at a time. If you are looking to purchase new hardware, it is therefore a good idea to look for either an SMP machine or a dual-core design so that multiple queries can execute concurrently. Hyperthreaded machines may help with this as well.

2.1.2. How much physical memory do I need?

This is a tough question. The size of your database, the number of connections, and the number of transactions that will be executed are all factors in deciding how much RAM you need. Small databases with only a few users can easily run in 256 megabytes of RAM. Larger databases that serve 500 connections will need multiple gigabytes of RAM. Later, in Chapter 3, we will discuss optimization and how to decide how much RAM you need.

2.1.3. Choosing the right hard drive system

SCSI offers excellent performance, good reliability, and typically a long life. SCSI also tends to be better in highly concurrent environments. However, SCSI can be cost-prohibitive for some users.

SATA offers reasonable performance, decent reliability, and typically a long life. SATA tends to be better in space-constrained environments. However, because of the nature of SATA, and specifically its write-back cache, it is not wise to use SATA with a database unless you have a battery-backed RAID controller.

2.1.4. RAID types

There are many different types of RAID. The most common RAID levels are 0, 1, 5, and 10. Other popular levels are 50 and 60. Each level has its pros and cons, and all except RAID 0 are suitable for use with PostgreSQL.

RAID 0: RAID 0, also known as striping, offers excellent read and write performance at the cost of reliability. When using RAID 0, if any drive within the array fails you will lose all of your data. RAID 0 requires at least two volumes.
RAID 1: RAID 1, also known as mirroring or duplexing, offers good read performance but has a write penalty, as all the data must be written to each volume to keep the mirror consistent. When using RAID 1 you will have 50% capacity across volumes. In other words, if you have two 100-gigabyte drives in RAID 1, the total capacity will be 100 gigabytes instead of 200 gigabytes.
RAID 5: RAID 5, also known as block-interleaved distributed parity, offers excellent read performance, but its write performance is limited. At least three volumes are required to create a RAID 5 array. When using RAID 5 you can lose any single volume within the array and continue to operate. To calculate the capacity of a RAID 5 array, use the total capacity of all volumes but one; for a three-volume array, that is the capacity of two of the three volumes.
RAID 10: RAID 10, also known as striped mirrors, offers excellent read and write performance at the cost of capacity. At least four volumes are required to create a RAID 10 array. When using RAID 10 you have the same redundancy level as RAID 1 with the added performance benefit of striping. To calculate capacity with RAID 10, use the total capacity of all volumes divided by two. RAID 10 is generally considered the best RAID level to implement for databases. However, the extra volume requirements will increase the total cost of the array.
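
The capacity rules above can be sketched as a small Python helper. This is only an illustration, not part of PostgreSQL or any RAID tool: the function name usable_capacity_gb and its parameters are hypothetical, all volumes are assumed to be the same size, RAID 1 is assumed to mirror every volume in the array, and RAID 5 is assumed to give up one volume's worth of space to parity regardless of array size (which matches the two-of-three rule for a three-volume array).

```python
def usable_capacity_gb(level, volumes, size_gb):
    """Estimate usable capacity (in GB) of an array of equally sized volumes.

    Hypothetical helper for illustration only; 'level' is the RAID level
    (0, 1, 5, or 10), 'volumes' the number of drives, 'size_gb' the size
    of each drive.
    """
    minimum = {0: 2, 1: 2, 5: 3, 10: 4}
    if level not in minimum:
        raise ValueError("unsupported RAID level: %r" % level)
    if volumes < minimum[level]:
        raise ValueError(
            "RAID %d requires at least %d volumes" % (level, minimum[level])
        )

    if level == 0:
        # Striping only: every volume contributes its full capacity.
        return volumes * size_gb
    if level == 1:
        # Assumption: all volumes mirror each other, so only one
        # volume's worth of space is usable.
        return size_gb
    if level == 5:
        # One volume's worth of space is consumed by distributed parity.
        return (volumes - 1) * size_gb
    # RAID 10: striped mirrors, so half of the raw capacity is usable.
    if volumes % 2 != 0:
        raise ValueError("RAID 10 requires an even number of volumes")
    return volumes * size_gb // 2


if __name__ == "__main__":
    for level, volumes in [(0, 2), (1, 2), (5, 3), (10, 4)]:
        print("RAID", level, "with", volumes, "x 100 GB:",
              usable_capacity_gb(level, volumes, 100), "GB usable")
```

With 100-gigabyte drives, for instance, a two-drive RAID 1 array yields 100 gigabytes, while a three-drive RAID 5 array and a four-drive RAID 10 array each yield 200 gigabytes.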

Almost all transactional databases suffer from a lack of hard drive bandwidth first and a lack of RAM second. When in doubt, add as many hard drives to a RAID array as your budget allows.
