1.3. Creating a Postgres-XL cluster
https://www.postgres-xl.org/documentation/tutorial-createcluster.html
As mentioned in the architectural fundamentals, Postgres-XL is a collection of multiple components. It can be a bit of work to come up with your initial working setup. In this tutorial, we will show how one can start with an empty configuration file and use the pgxc_ctl utility to create your Postgres-XL cluster from scratch.
A few pre-requisites are necessary on each node that is going to be a part of the Postgres-XL setup.

- Password-less ssh access is required from the node that is going to run the pgxc_ctl utility (see the example after this list).
- The PATH environment variable should have the correct Postgres-XL binaries on all nodes, especially while running a command via ssh.
- The pg_hba.conf entries must be updated to allow remote access. Variables like coordPgHbaEntries and datanodePgHbaEntries in the pgxc_ctl.conf configuration file may need appropriate changes.
- Firewalls and iptables may need to be updated to allow access to ports.
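For instance, password-less ssh from the machine that runs pgxc_ctl to every other node is usually arranged with an ssh key pair. A minimal sketch, assuming the cluster owner is postgres and datanode-host-1 is one of your nodes (both names are placeholders, not taken from the tutorial):

$ ssh-keygen -t rsa                       # generate a key pair if one does not exist yet
$ ssh-copy-id postgres@datanode-host-1    # repeat for every node in the cluster
$ ssh postgres@datanode-host-1 'echo ok'  # must succeed without a password prompt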
The pgxc_ctl utility should be present in your PATH. If it is not there, it can be compiled from source.
$ cd $XLSRC/src/bin/pgxc_ctl
$ make install
We are now ready to prepare our template configuration file. The pgxc_ctl utility allows you to create three types of configuration. We will choose the empty configuration, which will allow us to create our Postgres-XL setup from scratch. Note that we also need to set up the dataDirRoot environment variable properly for all future invocations of pgxc_ctl.
$ export dataDirRoot=$HOME/DATA/pgxl/nodes
$ mkdir $HOME/pgxc_ctl
$ pgxc_ctl
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Installing pgxc_ctl_bash script as /Users/postgres/pgxc_ctl/pgxc_ctl_bash.
Reading configuration using /Users/postgres/pgxc_ctl/pgxc_ctl_bash --home /Users/postgres/pgxc_ctl --configuration /Users/postgres/pgxc_ctl/pgxc_ctl.conf
Finished reading configuration.
   ******** PGXC_CTL START ***************
Current directory: /Users/postgres/pgxc_ctl
PGXC$ prepare config empty
PGXC$ exit
The empty configuration file is now ready. You should now make changes to pgxc_ctl.conf. At a minimum, pgxcOwner should be set correctly. The configuration file does contain USER and HOME environment variables to allow easy defaults for the current user.
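As an illustration only, the relevant lines of pgxc_ctl.conf might end up looking like the following. The values shown are hypothetical examples for a localhost setup, not taken from the tutorial; only pgxcOwner, coordPgHbaEntries and datanodePgHbaEntries are mentioned above, the rest depend on your environment:

pgxcOwner=postgres                      # owner of the Postgres-XL cluster (set this at a minimum)
pgxcUser=$pgxcOwner                     # database superuser name
coordPgHbaEntries=(127.0.0.1/32)        # networks allowed to reach the coordinators
datanodePgHbaEntries=(127.0.0.1/32)     # networks allowed to reach the datanodes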
The next step is to add the GTM master to the setup.
$ pgxc_ctl
PGXC$ add gtm master gtm localhost 20001 $dataDirRoot/gtm
Use the "monitor" command to check the status of the cluster.
$ pgxc_ctl
PGXC$ monitor all
Running: gtm master
Let us now add a couple of coordinators. When the first coordinator is added, it just starts up. When another coordinator is added, it connects to any existing coordinator node to fetch the metadata about objects.
PGXC$ add coordinator master coord1 localhost 30001 30011 $dataDirRoot/coord_master.1 none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
PGXC$ add coordinator master coord2 localhost 30002 30012 $dataDirRoot/coord_master.2 none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Let us now move on to adding a couple of datanodes. When the first datanode is added, it connects to any existing coordinator node to fetch global metadata. When a subsequent datanode is added, it connects to any existing datanode for the metadata.
PGXC$ add datanode master dn1 localhost 40001 40011 $dataDirRoot/dn_master.1 none none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
PGXC$ add datanode master dn2 localhost 40002 40012 $dataDirRoot/dn_master.2 none none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Your Postgres-XL setup is ready now and you can move on to the next "Getting Started" topic.
Read on only if you want a quick crash course on the various commands you can try out with Postgres-XL. It is strongly recommended to go through the entire documentation for more details on every command that we touch upon below.
Connect to one of the coordinators and create a test database.
$ psql -p 30001 postgres
postgres=# CREATE DATABASE testdb;
CREATE DATABASE
postgres=# \q
Look at the pgxc_node catalog. It should show all the configured nodes. It is normal to have negative node_id values; this will be fixed soon.
$ psql -p 30001 testdb
testdb=# SELECT * FROM pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
coord1 | C | 30001 | localhost | f | f | 1885696643
coord2 | C | 30002 | localhost | f | f | -1197102633
dn1 | D | 40001 | localhost | t | t | -560021589
dn2 | D | 40002 | localhost | f | t | 352366662
(4 rows)
Let us now create a distributed table, distributed on the first column by HASH.
testdb=# CREATE TABLE disttab(col1 int, col2 int, col3 text) DISTRIBUTE BY HASH(col1);
CREATE TABLE
testdb=# \d+ disttab
                         Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              |
 col2   | integer |           | plain    |              |
 col3   | text    |           | extended |              |
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES
Also create a replicated table.
testdb=# CREATE TABLE repltab (col1 int, col2 int) DISTRIBUTE BY REPLICATION;
CREATE TABLE
testdb=# \d+ repltab
                    Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              |
 col2   | integer |           | plain   |              |
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES
Now insert some sample data in these tables.
testdb=# INSERT INTO disttab SELECT generate_series(1,100), generate_series(101, 200), 'foo';
INSERT 0 100
testdb=# INSERT INTO repltab SELECT generate_series(1,100), generate_series(101, 200);
INSERT 0 100
OK, so the distributed table should have 100 rows.
testdb=# SELECT count(*) FROM disttab;
 count
-------
   100
(1 row)
And they must not all be on the same node. xc_node_id is a system column which shows the originating datanode for each row. Note that the distribution can be slightly uneven because of the HASH function.
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)
For replicated tables, we expect all rows to come from a single datanode (even though the other node has a copy too).
testdb=# SELECT count(*) FROM repltab;
 count
-------
   100
(1 row)

testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -560021589 |   100
(1 row)
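If you want to convince yourself that the second datanode really holds its own copy, you can query it directly with EXECUTE DIRECT (the same mechanism is used later in this tutorial). This extra check is a sketch that is not part of the original walkthrough:

testdb=# EXECUTE DIRECT ON(dn2) 'SELECT count(*) FROM repltab';
 count
-------
   100
(1 row)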
Now add a new datanode to the cluster.
PGXC$ add datanode master dn3 localhost 40003 40013 $dataDirRoot/dn_master.3 none none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3
Note that during cluster reconfiguration, all outstanding transactions are aborted and sessions are reset. So you would typically see errors like these on open sessions:
testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request   <==== pgxc_pool_reload() resets all sessions and aborts all open transactions
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)
Note that existing tables are not affected by the addition of the new datanode. The distribution information now explicitly shows the older datanodes:
testdb=# \d+ disttab
                         Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              |
 col2   | integer |           | plain    |              |
 col3   | text    |           | extended |              |
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: dn1, dn2

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# \d+ repltab
                    Table "public.repltab"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 col1   | integer |           | plain   |              |
 col2   | integer |           | plain   |              |
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: dn1, dn2
Let us now try to redistribute the tables so that they can take advantage of the new datanode:
testdb=# ALTER TABLE disttab ADD NODE (dn3);
ALTER TABLE
testdb=# \d+ disttab
                         Table "public.disttab"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              |
 col2   | integer |           | plain    |              |
 col3   | text    |           | extended |              |
Has OIDs: no
Distribute By: HASH(col1)
Location Nodes: ALL DATANODES

testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)
Let us now add a third coordinator.
PGXC$ add coordinator master coord3 localhost 30003 30013 $dataDirRoot/coord_master.3 none none
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: coordinator master coord3
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT * FROM pgxc_node;
 node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id
-----------+-----------+-----------+-----------+----------------+------------------+-------------
 coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 dn2       | D         |     40002 | localhost | f              | t                |   352366662
 dn3       | D         |     40003 | localhost | f              | f                |  -700122826
 coord3    | C         |     30003 | localhost | f              | f                |  1638403545
(6 rows)
We can try some more ALTER TABLE commands to delete a node from a table's distribution and add it back:
testdb=# ALTER TABLE disttab DELETE NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
  352366662 |    42
 -700122826 |    58
(2 rows)

testdb=# ALTER TABLE disttab ADD NODE (dn1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -700122826 |    32
  352366662 |    32
 -560021589 |    36
(3 rows)
You could also alter a replicated table to make it a distributed table. Note that even though the cluster now has 3 datanodes, the table will continue to use only the 2 datanodes on which it was originally replicated.
testdb=# ALTER TABLE repltab DISTRIBUTE BY HASH(col1);
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -560021589 |    42
  352366662 |    58
(2 rows)

testdb=# ALTER TABLE repltab DISTRIBUTE BY REPLICATION;
ALTER TABLE
testdb=# SELECT xc_node_id, count(*) FROM repltab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -560021589 |   100
(1 row)
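If you did want repltab to use the new datanode as well, the same ADD NODE clause shown for disttab above would apply. The statement below is an illustrative sketch only and is not executed in this tutorial; the remaining steps assume repltab stays on dn1 and dn2:

testdb=# ALTER TABLE repltab ADD NODE (dn3);   -- not run here; would replicate repltab onto dn3 as well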
Now remove the coordinator added previously. You can use the "clean" option to remove the corresponding data directory as well.
PGXC$ remove coordinator master coord3 clean
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
Running: datanode master dn3

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
 16397 | dn3       | D         |     40003 | localhost | f              | f                |  -700122826
(5 rows)
Let us try to remove a datanode now. NOTE: Postgres-XL does not employ any additional checks to ascertain whether the datanode being dropped has data from tables that are replicated or distributed. It is the responsibility of the user to ensure that it is safe to remove a datanode. You can use the query below to find out if the datanode being removed has any data on it. Do note that this only shows tables from the current database; you might want to run the same check on all databases before going ahead with the datanode removal. Use the OID of the datanode that is to be removed in the query below:
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets |     nodeoids
---------+---------------+----------+-----------------+---------------+-------------------
   16388 | H             |        1 |               1 |          4096 | 16385 16386 16397
(1 row)

testdb=# ALTER TABLE disttab DELETE NODE (dn3);
ALTER TABLE
testdb=# SELECT * FROM pgxc_class WHERE nodeoids::integer[] @> ARRAY[16397];
 pcrelid | pclocatortype | pcattnum | pchashalgorithm | pchashbuckets | nodeoids
---------+---------------+----------+-----------------+---------------+----------
(0 rows)
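For reference, the OID used above (16397 for dn3) can be read off the pgxc_node listing shown earlier, or looked up directly; the query below is a small sketch that is not part of the original tutorial:

testdb=# SELECT oid, node_name FROM pgxc_node WHERE node_name = 'dn3';
  oid  | node_name
-------+-----------
 16397 | dn3
(1 row)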
Ok, it is safe to remove datanode "dn3" now.
PGXC$ remove datanode master dn3 clean
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2

testdb=# SELECT oid, * FROM pgxc_node;
ERROR:  canceling statement due to user request
testdb=# SELECT oid, * FROM pgxc_node;
  oid  | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred |   node_id
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
 11197 | coord1    | C         |     30001 | localhost | f              | f                |  1885696643
 16384 | coord2    | C         |     30002 | localhost | f              | f                | -1197102633
 16385 | dn1       | D         |     40001 | localhost | t              | t                |  -560021589
 16386 | dn2       | D         |     40002 | localhost | f              | t                |   352366662
(4 rows)
The pgxc_ctl utility can also help in setting up slaves for datanodes and coordinators. Let us set up a slave for a datanode and see how failover can be performed in case the master datanode goes down.
PGXC$ add datanode slave dn1 localhost 40101 40111 $dataDirRoot/dn_slave.1 none $dataDirRoot/datanode_archlog.1
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode slave dn1
Running: datanode master dn2

testdb=# EXECUTE DIRECT ON(dn1) 'SELECT client_hostname, state, sync_state FROM pg_stat_replication';
 client_hostname |   state   | sync_state
-----------------+-----------+------------
                 | streaming | async
(1 row)
Add some more rows to test failover now.
testdb=# INSERT INTO disttab SELECT generate_series(1001,1100), generate_series(1101, 1200), 'foo';
INSERT 0 100
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
 xc_node_id | count
------------+-------
 -560021589 |    94
  352366662 |   106
(2 rows)
Let us simulate datanode failover now. We will first stop the datanode master "dn1" for which we configured a slave above. Note that since the slave is connected to the master we will use "immediate" mode for stopping it.
PGXC$ stop -m immediate datanode master dn1
Since a datanode is down, queries will fail, though a few queries may still work if the failed node is not required to run the query; that is determined by the distribution of the data and the WHERE clause being used.
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR:  Failed to get pooled connections
testdb=# SELECT xc_node_id, * FROM disttab WHERE col1 = 3;
 xc_node_id | col1 | col2 | col3
------------+------+------+------
  352366662 |    3 |  103 | foo
(1 row)
We will now perform the failover and check that everything is working fine afterwards.
PGXC$ failover datanode dn1
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
ERROR: canceling statement due to user request
testdb=# SELECT xc_node_id, count(*) FROM disttab GROUP BY xc_node_id;
xc_node_id | count
------------+-------
-560021589 | 94
352366662 | 106
(2 rows)
The pgxc_node catalog should now have updated entries. In particular, the failed-over datanode's node_host and node_port should have been replaced with the slave's host and port values.
testdb=# SELECT oid, * FROM pgxc_node;
oid | node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
-------+-----------+-----------+-----------+-----------+----------------+------------------+-------------
11197 | coord1 | C | 30001 | localhost | f | f | 1885696643
16384 | coord2 | C | 30002 | localhost | f | f | -1197102633
16386 | dn2 | D | 40002 | localhost | f | t | 352366662
16385 | dn1 | D | 40101 | localhost | t | t | -560021589
(4 rows)
PGXC$ monitor all
Running: gtm master
Running: coordinator master coord1
Running: coordinator master coord2
Running: datanode master dn1
Running: datanode master dn2
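When you are finished experimenting, the whole cluster can be shut down from the same prompt. The command below is a sketch assuming the standard pgxc_ctl stop command; it is not part of the original tutorial:

PGXC$ stop all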