Citus Configuration for PostgreSQL Sharding

Welcome back to our Citus blog series! Previously, we introduced Citus, reviewed its use cases and structure, and installed Citus alongside PostgreSQL 14. This post focuses on the configuration needed to ensure Citus can communicate between all instances.

Multi-Node Configuration

Now that we have Citus and PostgreSQL installed per our previous blog post, let’s configure our servers to communicate with one another and transfer data. The following steps should be repeated on the coordinator node and each worker node. Let’s begin by configuring postgresql.conf:

sudo vi /etc/postgresql/14/main/postgresql.conf

First, we will instruct the database server to listen on all IP interfaces, not just the localhost:

listen_addresses = '*'

Since Citus uses logical replication to move shards, we set the wal_level to logical:

wal_level = logical

We need to preload the Citus extension. Find the line with shared_preload_libraries, and modify it to look like so:

shared_preload_libraries = 'citus'
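
If you want to double-check these edits before moving on, a quick grep (assuming the default Debian config path used above) should show all three settings once they are uncommented and set:

sudo grep -E "^(listen_addresses|wal_level|shared_preload_libraries)" /etc/postgresql/14/main/postgresql.conf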

That completes the postgresql.conf changes. Next, let’s edit the host-based authentication file to allow access to the cluster. We can open the file in a text editor with:

sudo vi /etc/postgresql/14/main/pg_hba.conf

Once we have the file open in the editor, scroll down to the bottom and make sure the following lines are present to ensure PostgreSQL is able to communicate between all local and private IP addresses:

host all all 10.0.0.0/8 scram-sha-256
host all all 127.0.0.1/32 scram-sha-256
host all all ::1/128 scram-sha-256

If your network does not fall within the 10.0.0.0/8 range, you can replace that line with the following, which matches any address in any subnet the server is directly connected to:

host all all samenet scram-sha-256
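
As an optional sanity check, PostgreSQL exposes the parsed contents of pg_hba.conf through the pg_hba_file_rules view, which reads the file on disk and reports any syntax problems in its error column:

sudo -iu postgres psql -c "SELECT line_number, type, database, user_name, address, auth_method, error FROM pg_hba_file_rules;"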

Now that we have made these changes to postgresql.conf and pg_hba.conf, it is important for us to restart PostgreSQL in order for the changes to take effect. We can do so with:

sudo systemctl restart postgresql
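
After the restart, it is worth confirming that the new settings are active. These SHOW commands should return '*', 'logical', and 'citus' respectively:

sudo -iu postgres psql -c "SHOW listen_addresses;"
sudo -iu postgres psql -c "SHOW wal_level;"
sudo -iu postgres psql -c "SHOW shared_preload_libraries;"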

Next up is adding the Citus extension to every database that will be used in the cluster. We’ll begin by creating the demo database:

sudo -iu postgres psql -c "CREATE DATABASE demo;"

Next, we add the Citus extension to the demo database:

sudo -iu postgres psql -d demo -c "CREATE EXTENSION citus;"

We can confirm that Citus has been loaded into our demo database with:

sudo -iu postgres psql -d demo -c "SELECT * FROM pg_extension;"

And it should print out something like this:

oid | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+----------------+----------+--------------+----------------+------------+-----------+--------------
13781 | plpgsql | 10 | 11 | f | 1.0 | |
16385 | citus_columnar | 10 | 11 | f | 11.1-1 | |
16441 | citus | 10 | 11 | f | 11.1-1 | |
(3 rows)

The rows with citus and citus_columnar showing extversion 11.1-1 confirm that Citus has been successfully loaded into our demo database.
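
As an alternative quick check, Citus also provides a citus_version() function that reports the loaded version directly:

sudo -iu postgres psql -d demo -c "SELECT citus_version();"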

Setting Up .pgpass

To increase security for our replication process, we will store PostgreSQL user passwords in the .pgpass file. We want this file to be located in the postgres user’s home directory, which is /var/lib/postgresql on Debian-based systems and /var/lib/pgsql on Red Hat systems. We want to become the postgres user for the following series of commands:

sudo -iu postgres

Before we store a password in the .pgpass file, we first need to set one for the postgres database user. Launch psql:

psql

Now, set the postgres user’s password using the \password meta-command:

\password

This prompts you to enter a password and repeat it to confirm. We are now going to exit psql and return to the postgres user’s shell:

exit

Let’s create the .pgpass file with:

touch .pgpass

Since we created the .pgpass file as the postgres user, it is already owned by postgres. For security purposes, we will restrict access so only the postgres user can read and write the file. We can change the mode of the .pgpass file with the following command:

chmod 600 .pgpass
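
A quick listing confirms the ownership and the restricted permissions:

ls -l .pgpass

The mode column should read -rw------- and the owner should be postgres.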

Now that the file is properly protected, let’s open .pgpass with:

vi .pgpass

You will see that the lines in this file follow the format:

hostname:port:database:username:password

For increased security, you can be explicit and name the hostname, port, and database. When you use more specific parameters, be sure to include a line in .pgpass for each instance with unique parameters so every connection is covered. For the simplicity of this tutorial, we will use wildcards (*) in the hostname, port, and database fields, meaning the username and password apply to any hostname, port, or database the user has access to. We set the username to postgres and the password to postgres, since that is the password we set with \password on each instance we will be accessing. If you set a different password for the postgres user on any instance, that instance will need its own line in .pgpass with specific parameters. Let’s add this line to .pgpass:

*:*:*:postgres:postgres

Now let’s save and exit the file.
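
Before moving on, it is worth confirming that password-less connections work between the nodes. Assuming the worker addresses used later in this post (10.2.2.2 and 10.3.3.3), a quick test from the coordinator should return a row without prompting for a password:

psql -h 10.2.2.2 -U postgres -d demo -c "SELECT 1;"
psql -h 10.3.3.3 -U postgres -d demo -c "SELECT 1;"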

Coordinator Node Configuration

In addition to the previous steps we performed for all nodes, we need to include some important information on the coordinator node so it can properly communicate with the worker nodes. As the postgres user, launch psql and move to the demo database:

psql demo

Let’s begin this portion by setting the coordinator host with:

SELECT citus_set_coordinator_host('10.1.1.1', 5432);

Next, we will add the worker nodes from the coordinator:

SELECT * FROM citus_add_node('10.2.2.2', 5432);
SELECT * FROM citus_add_node('10.3.3.3', 5432);

Let’s verify that worker configuration has been set correctly with:

SELECT * FROM citus_get_active_worker_nodes();

The output should look something like this:

node_name | node_port
-------------+-----------
10.2.2.2 | 5432
10.3.3.3 | 5432
(2 rows)
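
If you also want to see the coordinator itself, the pg_dist_node catalog table lists every node in the cluster along with its role:

SELECT nodeid, nodename, nodeport, noderole FROM pg_dist_node;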

Summary

We have covered multi-node Citus configuration in postgresql.conf, pg_hba.conf, .pgpass, and node assignment. We reviewed configuration settings that need to be set for all instances where we will be using Citus, as well as the assignment of nodes from the coordinator. The next blog post in this series will cover distributing data across the Citus nodes.