Using a .json File to Define and Deploy a Remote Cluster

In this tutorial, we'll walk you through using the pgEdge Platform on a console host to create and modify a .json file that specifies your cluster details, and then deploy that cluster on remote hosts. The deployment validates the SSH connection to each node, creates the replication sets and subscriptions, and confirms that communication between the nodes works as expected.

Create a Cluster Configuration File

After installing the pgEdge Platform, use the pgedge cluster json-create command to create a .json file that describes the configuration of your replication cluster. Move into the pgedge directory and invoke the command:

./pgedge cluster json-create cluster_name db_name node_count db_superuser password

The cluster json-create command runs a script that prompts you for details about each cluster node; provide detailed information about your cluster when prompted.
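
For example, the following command creates a definition file for the three-node demo cluster used throughout this tutorial, with a database named lcdb owned by the superuser lcusr (the password shown is only a placeholder; substitute your own):

./pgedge cluster json-create demo lcdb 3 lcusr secure_password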

Validate the .json File

You can use the pgedge cluster json-validate command to verify that your .json file is structured properly. The syntax is:

    ./pgedge cluster json-validate cluster_name

Note: The validate command checks only the structure of the .json file; it does not validate the values you provide.
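
For example, to check the structure of the demo cluster definition used in this tutorial:

    ./pgedge cluster json-validate demo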

Deploy the Remote Cluster

After updating and validating the cluster_name.json file, you can invoke the pgedge cluster init command on the console host to deploy the defined cluster. The syntax is:

    ./pgedge cluster init cluster_name

The command installs pgEdge Platform and PostgreSQL on each node, creates the initial database, installs the spock and snowflake extensions, and creates the default replication set and subscriptions. For example, the following command deploys the sample demo.json file created above:

./pgedge cluster init demo

The CLI displays confirmation of node and subscription creation as it deploys the cluster.

## Loading cluster 'demo' json definition file
July 11, 2024, 14:56:04: 127.0.0.1 : n1 - Checking ssh on 127.0.0.1                                 [OK]
July 11, 2024, 14:56:05: 127.0.0.1 : n2 - Checking ssh on 127.0.0.1                                 [OK]
July 11, 2024, 14:56:05: 127.0.0.1 : n3 - Checking ssh on 127.0.0.1                                 [OK]
 
 
################################################################################
#              REPO: https://pgedge-upstream.s3.amazonaws.com/REPO
#      Cluster Name: demo
#         Node Type: Primary
# Primary Node Name:  
#              Name: n1
#              Host: 127.0.0.1
#              Port: 6432
#              Path: /home/ec2-user/work/platform_test/nc/pgedge/cluster/demo/n1
#          Database: lcdb
#     Database User: lcusr
#   Number of Nodes: 3
################################################################################
July 11, 2024, 14:56:15: 127.0.0.1 : n1 - Installing pgedge                                         [OK]
July 11, 2024, 14:57:03: 127.0.0.1 : n1 - Setup pgedge                                              [OK]
 
 
################################################################################
#              REPO: https://pgedge-upstream.s3.amazonaws.com/REPO
#      Cluster Name: demo
#         Node Type: Primary
# Primary Node Name:  
#              Name: n2
#              Host: 127.0.0.1
#              Port: 6433
#              Path: /home/ec2-user/work/platform_test/nc/pgedge/cluster/demo/n2
#          Database: lcdb
#     Database User: lcusr
#   Number of Nodes: 3
################################################################################
July 11, 2024, 14:57:22: 127.0.0.1 : n2 - Installing pgedge                                         [OK]
July 11, 2024, 14:58:10: 127.0.0.1 : n2 - Setup pgedge                                              [OK]
 
 
################################################################################
#              REPO: https://pgedge-upstream.s3.amazonaws.com/REPO
#      Cluster Name: demo
#         Node Type: Primary
# Primary Node Name:  
#              Name: n3
#              Host: 127.0.0.1
#              Port: 6434
#              Path: /home/ec2-user/work/platform_test/nc/pgedge/cluster/demo/n3
#          Database: lcdb
#     Database User: lcusr
#   Number of Nodes: 3
################################################################################
July 11, 2024, 14:58:29: 127.0.0.1 : n3 - Installing pgedge                                         [OK]
July 11, 2024, 14:59:18: 127.0.0.1 : n3 - Setup pgedge                                              [OK]
July 11, 2024, 14:59:21: 127.0.0.1 : n1 - Creating node n1                                          [OK]
July 11, 2024, 14:59:24: 127.0.0.1 : n2 - Creating node n2                                          [OK]
July 11, 2024, 14:59:26: 127.0.0.1 : n3 - Creating node n3                                          [OK]
July 11, 2024, 14:59:28: 127.0.0.1 : n1 - Creating subscriptions sub_n1n2                           [OK]
July 11, 2024, 14:59:29: 127.0.0.1 : n1 - Creating subscriptions sub_n1n3                           [OK]
July 11, 2024, 14:59:30: 127.0.0.1 : n2 - Creating subscriptions sub_n2n1                           [OK]
July 11, 2024, 14:59:31: 127.0.0.1 : n2 - Creating subscriptions sub_n2n3                           [OK]
July 11, 2024, 14:59:32: 127.0.0.1 : n3 - Creating subscriptions sub_n3n1                           [OK]
July 11, 2024, 14:59:33: 127.0.0.1 : n3 - Creating subscriptions sub_n3n2                           [OK]
July 11, 2024, 14:59:34: 127.0.0.1 : n3 - Listing spock nodes                                       [OK]
 
 
[
  {
    "node_id": 673694252,
    "node_name": "n1"
  },
  {
    "node_id": 560818415,
    "node_name": "n2"
  },
  {
    "node_id": 193995617,
    "node_name": "n3"
  }
]

When cluster deployment completes, you can create one or more tables in the database, and use the pgedge spock repset-add-table command to add the tables to the default replication set and start replication.

Add Tables to the default Replication Set

For replication to begin, you need to add tables to the default replication set; for this example, we'll use pgbench to create those tables. When you invoke pgbench or psql, specify the port number and database name to ensure you're working on the correct node. Before starting, source the PostgreSQL environment variables on each node to add pgbench and psql to your OS PATH; for example, on n1:

source cluster/demo/n1/pgedge/pg16/pg16.env
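
The environment files for the other demo nodes follow the same pattern; assuming the layout shown in the deployment output above (adjust the paths if your nodes live elsewhere), you would run:

source cluster/demo/n2/pgedge/pg16/pg16.env
source cluster/demo/n3/pgedge/pg16/pg16.env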

Then, use pgbench to set up a simple four-table database. At the OS command line, create the pgbench tables in your database (db_name); you must create the tables on each node in your replication cluster:

pgbench -i --port=port_number db_name
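
For example, with the demo cluster deployed above (database lcdb, listening on ports 6432, 6433, and 6434), you would initialize the tables on each node with:

pgbench -i --port=6432 lcdb
pgbench -i --port=6433 lcdb
pgbench -i --port=6434 lcdb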

Then, connect to each node with the psql client:

psql -p port_number -d db_name
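
For example, to connect to node n1 of the demo cluster:

psql -p 6432 -d lcdb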

Once connected, alter the numeric columns to make them conflict-free delta-apply columns; this ensures that the value replicated is the delta of the committed changes (the old value plus or minus any change) for a given record. If your cluster is configured to use Spock extension 4.0 or later, use the commands:

ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (log_old_value=true, delta_apply_function=spock.delta_apply);

If you're using an older version of the Spock extension (prior to 4.0), use the commands:

ALTER TABLE pgbench_accounts ALTER COLUMN abalance SET (log_old_value=true);
ALTER TABLE pgbench_branches ALTER COLUMN bbalance SET (log_old_value=true);
ALTER TABLE pgbench_tellers ALTER COLUMN tbalance SET (log_old_value=true);

Then, exit psql:

db_name=# \q

On the OS command line for each node, use the pgedge spock repset-add-table command to add the tables to the system-created replication set (named default); the command is followed by your database name (db_name):

./pgedge spock repset-add-table default 'pgbench_*' db_name
Adding table public.pgbench_accounts to replication set default.
Adding table public.pgbench_branches to replication set default.
Adding table public.pgbench_history to replication set default.
table pgbench_history cannot be added to replication set default
⚠ DETAIL:  table does not have PRIMARY KEY and given replication set is configured to replicate UPDATEs and/or DELETEs
Adding table public.pgbench_tellers to replication set default.

The fourth table, pgbench_history, is excluded from the replication set because it does not have a primary key.
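
If you do want pgbench_history to replicate as well, one option (not required for this tutorial; the column name below is only illustrative) is to add a primary key to the table on each node and then re-run the repset-add-table command:

ALTER TABLE pgbench_history ADD COLUMN hid BIGSERIAL PRIMARY KEY;  -- hypothetical surrogate key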

Check the Configuration

On the psql command line, check the configuration on each node with the following SQL statements:

db_name=# SELECT * FROM spock.node;
  node_id  | node_name | location | country | info 
-----------+-----------+----------+---------+------
 193995617 | n3        |          |         | 
 673694252 | n1        |          |         | 
 560818415 | n2        |          |         | 
(3 rows)
 

and:

db_name=# SELECT sub_id, sub_name, sub_slot_name, sub_replication_sets  FROM spock.subscription;
   sub_id   | sub_name |    sub_slot_name     |         sub_replication_sets          
------------+----------+----------------------+---------------------------------------
 3293941396 | sub_n1n2 | spk_lcdb_n2_sub_n1n2 | {default,default_insert_only,ddl_sql}
 1919895280 | sub_n1n3 | spk_lcdb_n3_sub_n1n3 | {default,default_insert_only,ddl_sql}
(2 rows)

The sub_replication_sets column shown above displays the system-created replication sets. You can add custom replication sets with the spock repset-create and spock sub-add-repset commands.
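
As a minimal sketch, creating a custom replication set and attaching it to an existing subscription might look like the following (the argument order shown is an assumption; check the CLI help for spock repset-create and spock sub-add-repset on your installation):

./pgedge spock repset-create my_repset db_name
./pgedge spock sub-add-repset sub_n1n2 my_repset db_name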

Test Replication

Now, if you update a row on n1, you should see the update to the same row on n2 and n3.

On n1:

db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler 
-----+-----+----------+--------
   1 |   1 |        0 | 
(1 row)

db_name=# UPDATE pgbench_tellers SET filler = 'test' WHERE tid = 1;
UPDATE 1

Check n2 and n3:

db_name=# SELECT * FROM pgbench_tellers WHERE tid = 1;
 tid | bid | tbalance | filler                                           
-----+-----+----------+--------------------------------------------------
   1 |   1 |        0 | test                                             
(1 row)

You can also use pgbench to exercise replication; exit psql:

db_name=# exit

Then, run the following command on all nodes at the same time to generate one minute of pgbench activity.

pgbench -R 100 -T 60 -n --port=port_number db_name
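
With the demo cluster used throughout this tutorial, that means starting the following commands at roughly the same time, one per node:

pgbench -R 100 -T 60 -n --port=6432 lcdb
pgbench -R 100 -T 60 -n --port=6433 lcdb
pgbench -R 100 -T 60 -n --port=6434 lcdb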

When you connect with psql and check the results, you'll see that the sum of the tbalance column matches in the pgbench_tellers table on every node. Without the conflict-free delta-apply columns, each conflict would have been resolved by accepting the first-in update, potentially leading to sums that do not match between nodes.

n1:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum   
---------
 -101244
(1 row)

n2:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum   
---------
 -101244
(1 row)

n3:

db_name=# SELECT SUM(tbalance) FROM pgbench_tellers;
   sum   
---------
 -101244
(1 row)