인디노트

Postgresql 10 Logical replication, Pgpool-II 본문

개발 플랫폼 및 언어/DB 기술

Postgresql 10 Logical replication, Pgpool-II

인디개발자 2019. 1. 10. 11:09

시작 전

PgPool을 처음 접하는 거라면 이글은 필요없고 여기부터 시작해서 차근차근 정독하는게 하는게 좋다.

진심이다.. 이 글 보다 저 글을 읽자..

그럼 이 글은 왜 썼냐고? 누가.. 쓰.. 라.. 읍.. 읍… 읍…

상황

Kibana도 AWS Athena도 사용중이긴 하지만, 대부분 OLAP을 위한 DB로 PostgreSQL을 사용중이다.

28억건쯤 들어있고, 하루에 1천만건 가량 row가 쌓이고 3.5TB쯤 되는 용량이다. 크고 느리고 무겁고 그지같다.

Citus로 마이그레이션하면 좋겠지만, 일단 Scale-up해야 하는 상황이다. 야호~ 신난다.

만약을 위한 HA구성도 따로 해야한다. 하지만 HA구성은 아직 해보지 않은 상황!

뭘 골라야 하나?

어떤 방식을 사용할지 개념잡기엔 공식사이트에 복제 관련 솔루션들 비교 페이지가 좋겠다.

아래는 솔루션이 갖는 특징들 목록이다. 아마도 모든 특징은 아니겠지만 주로 언급되고 사용된다. 어떤 솔루션들을 선택했을때 하나 이상의 특징을 갖는다 할 수 있겠다.

Shared Disk Failover

공유 디스크를 이용해 Failover를 하는건 디스크가 하나인데 여러개의 Postgres 인스턴스를 사용하겠다는 이야기다. 그러니 복제에 드는 비용이 없다

주 서버가 망가지면 대기하던 서버가 해당 디스크를 마운트해서 사용하기 때문에 데이터를 잃는데에 대한 부담도 없고 빠르게 Failover할 수 있다.

당연히 디스크가 하나라 이게 망가지면 끝이다.

그리고 Full POSIX behavor해야 한다는데 자세한건 여기를참고 하자.

또 동시에 인스턴스를 두개 돌리면 서로 싸우다가 망가지니 주의해야된단다.

File System (Block Device) Replication

파일 시스템 레벨의 복제다.

서로 다른 시스템을 미러링하는데 Distributed Replication Block Device 같은걸 이야기한다.

Write-Ahead Log Shipping

WAL(트랜잭션 로그)를 스트리밍이나 파일 기반으로 전달해서 대기 서버를 동기화 할 수 있다.

대기중인 서버를 핫 스탠바이로 만들어 추가적으로 활용 할 수 있다.

Logical Replication

논리 복제는 변경된 데이터를 스트림으로 다른 서버로 보낸다.

WAL(트랜잭션 로그)로 논리적인 데이터 변경을 스트림으로 만든다. 테이블 단위로 리플리케이션 될 수도 있다.

Logical replication doesn’t require a particular server to be designated as a master or a replica but allows data to flow in multiple directions.

또, 마스터나 리플리카를 구분해서 디자인된게 아니라 양방향으로 스트림될 수 있다.

Trigger-Based Master-Standby Replication

데이터 변경을 주 서버가 받아서 처리하고 비동기적으로 복제 서버에 보낸다. 비동기 복제는 주 서버와 복제 서버간의 데이터가 다를 수 있다고 생각해야 한다.

테이블 단위로 컨트롤 하거나, 핫 스탠바이처럼 복제 서버를 활용 할 수 있고 여러개의 복제 서버를 둘 수 있다.

Sloy-I가 Trigger-Based로 작동 할 수 있다.

State-Based Replication Middleware

Query 구문을 중간에 가로챈다. 미들웨어에서 쓰기를 모든 인스턴스에 적용하고 읽기는 하나만 필요하니 한군데서만 가져다 쓰기도한다.

복제를 Postgresql에 맡기고 주 서버에만 쓰기를 하기도 한다. Pgpool-II나 Continuent Tungsten같은 놈들이다.

Asynchronous Multimaster Replication

비동기 멀티 마스터 복제는 시간차를 두고 서로 통신하게 된다. 데이터가 충돌이 나면 사용자나, 룰을통해 문제를 해결한다.

Bucardo같은 놈이다.

Synchronous Multimaster Replication

동기 멀티 마스터 복제는 쓰기 속도를 희생해서 모든 서버가 OK할때 까지 기다린다. 하지만 데이터 정합성이 좋아진다.

Commercial Solutions

상업 솔루션들은 보통 이런것들을 언급하는것 같다. Postgresql를 주름잡는 EDB와 2ndQuadrant.

아래는 HIGH-AVAILABILITY-MATRIX의 매트릭스와 같은 내용이다.

기능Shared Disk FailoverFile System ReplicationWrite-Ahead Log ShippingLogical ReplicationTrigger-Based Mster-Standby ReplicationState-Based Replication MiddlewareAsynchronous Multimaster ReplicationSynchronous Multimaster Replication
일반적인 구현체NASDRBDPostgresql의 기본 스트리밍 복제Postgresql의 기본 논리 복제, pglogicalLundiste, SlonyPgpool-IIBucardo
통신 방법shared diskdisk blocksWALlogical decodingtable rowsSQLtable rowstable rows and row locks
추가 하드웨어 필요한가?OXXXXXXX
멀티 마스터 가능한가?XXXOXOOO
마스터 서버에 오버헤드가 없나?OXOOXOXX
멀티 마스터를 위해 동기할 시간이 필요한가?OXsync 옵션 끄면 됨sync 옵션 끄면됨OXOX
마스터가 망가졌을때 데이터 유실이 안되게 할 수 있나?OOsync 옵션 키면 됨sync 옵션 키면 됨XOXO
슬레이브 읽기 전용 쿼리 가능한가?XXhot stanby 설정 키면 됨OOOOO
테이블별로 복제 가능한가?XXXOOXOO
충돌났을때 해결할 필요가 없나?OOOXOXXO

고르기

가능하면 복제는 Postgresql의 native 복제를 쓰고 싶다.

이번 10에서 추가된 Logical replication이 추가되었으니 가능하면 이걸로.

Hot standby설정을 해서 슬레이브도 읽기 전용 쿼리를 받아주도록 하게 하고 이를 로드밸런싱하고 싶다.

그리고 Failover 기능도 있어야 한다.

  • Slony는 Trigger 기반 복제니 탈락.
  • Repmgr은 미들웨어가 아니라 로드밸런싱을 어플리케이션에서 해줘야 하니 탈락.
  • pgBouncer는 커넥션 풀링만 해주니 이것도 안녕.
  • PgPool-II는 원하는 기능이 모두 있다.

Google trends로 보여지는것도 그리 나쁜 생각은 아닌것 같다.

compre-pg-middleware

 

Pgpool-II

pgpool-II는 connection pool을 하는 Tatsuo Ishii의 개인 프로젝트로 시작해서 기능들이 늘어가면서 지금은 Pgpool development Group의 프로젝트까지 왔다.

처음 시작이 pool 기능만 있어서 pgpool로 출발했다고 한다. 지금은 pgpool-II다.

Pgpool-II가 제공하는 기능은 크게 5로 나와있다.

  • Connection Pooling, PostgreSQL은 MySQL에 비해 연결 오버헤드가 있다고 하던데, 이런 pooling을 통해 연결을 재사용해 오버헤드를 줄일 수 있다.
  • Load Balancing, 복제되는 서버가 있다면 SELECT 쿼리를 복제 서버들에 보내 전체적인 성능 향상을 꾀 할 수 있다. 당연히 쓰기 성능이 올라가진 않는다.
  • Automated fail over, 문제가 생긴 PostgreSQL 서버를 제거해 장애 상황을 피하게 할 수 있다.
  • Replication, Pgpool-II의 복제 기능(Native replication)이나 PostgreSQL의 streaming replication을 사용 할 수 있다.
  • Limiting Exceeding Connections, 과도하 연결에 대해 제어 할 수있다.

그리고 Pgpool Admin을 통해 브라우저로 현재 상태를 아래처럼 볼 수 있다.

pgpooladmin

Pgpool-II가 제공하는 복제 기능(Native replication)을 사용할 경우 문제가 되는 함수들이 있으니 잘 체크해야 겠다.

역시나 공식 문서에 잘 나와 있다.

아키텍처

process-diagram
그림 출처

Pgpool-II는 기본적으로 PostgreSQL과 사용자(Client)사이에 놓이는 미들웨어다.

그리고 Frontend/Backend Protocol을 이용하기 때문에 여러 PostgreSQL 버전을 이용할 수 있다고 하니 특정한 상황에서 이점들이 있겠다.

PCP(Pgpool Control Process)와 Watchdog도 볼 수 있는데 PCP를 통해 Pgpool-II의 상태나 node들을 붙이거나 떼어내는 액션도 할 수 있다.

Watchdog에 대한 내용은 아래 그림에서 더 잘 설명한다.

cluster
그림 출처

위 그림은 3대의 Pgpool-II서버(osspc16, osspc17, osspc18), 복제 구성이된 PostgreSQL서버 2대(마스터, 슬레이브) 그리고 4대의 AP 서버들로 그러져 있다.

Active상태인 osspc16이 Virtual IP를 할당 받아 AP 서버들로부터 Query를 받아주고 있는데 상황에 따라 Active가 슬레이브였던 osspc17이나 osspc18이 될 수 있다.

Active인 서버는 Virtual IP를 가져가기 때문에 AP 서버의 설정이 변경될 필요는 없다.

그리고 마스터(osspc19)가 이상이 생기면 슬레이브(osspc20)을 promote시키기도 한다.

쿼리 메모리 캐시를 사용 할 수 있는데 shmem이나 memcached를 선택 할 수 있다.

시나리오

어플리케이션과 PostgreSQL을 동시에 실행시키던 서버 1대에서 PostgreSQL 마스터로 사용할 서버가 1대 늘었다.

어플리케이션이 DBMS와 따로 있는것이 관리면에서 유리하기 때문에 어플리케이션이 실행될 서버를 하나 더 마련했다.

총 3대의 베어메탈 서버인데,

  1. Pgpool-II를 DBMS와 같이 2개를 올려 Watchdog을 설정하거나
  2. 어플리케이션서버와 Pgpool-II를 같이 두고 사용할지 선택해야 한다.

    두가지 다 어플리케이션 서버 1대이기 때문에 SFOP인데, 개인적으로 매한가지라고 생각된다.

    둘중 하나의 PostgreSQL이 죽게되더라도 어플리케이션은 정상 동작하게 된다.

    2의 경우 Pgpool-II가 하나이기 때문에 해당 프로세스가 죽게되도 SPOF이지만 잘 안죽겠지? ^^;

    이렇게 변명을 하고 보다 설정이 간단할 2를 만들어보자.

테스트

적당히 설정파일을 건드려주고 디버그 모드로 실행해보면 아래처럼 찍히는 로그를 볼 수 있다.

postgres@pg1:/pgpool$ pgpool -f /pgpool/etc/pgpool.conf -a /pgpool/etc/pool_hba.conf -d --dont-detach
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  adding regex pattern for "black_function_list" pattern: ^currval$
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  adding regex pattern for "black_function_list" pattern: ^lastval$
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  adding regex pattern for "black_function_list" pattern: ^nextval$
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  adding regex pattern for "black_function_list" pattern: ^setval$
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_period" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_period" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_timeout" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_timeout" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_user" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_user" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_password" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_password" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_database" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_database" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_max_retries" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_max_retries" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "health_check_retry_delay" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "health_check_retry_delay" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: DEBUG:  setting value no index value for parameter "connect_timeout" source = 1
2017-12-21 06:01:50: pid 22995: DEBUG:  parameter "connect_timeout" is an array type variable and allows index-free value as well
2017-12-21 06:01:50: pid 22995: INFO:  unrecognized configuration parameter "enable_multiple_failover_requests_from_node"
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  num_backends: 2 total_weight: 2.000000
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  backend 0 weight: 1073741823.500000 flag: 0000
2017-12-21 06:01:50: pid 22995: DEBUG:  initializing pool configuration
2017-12-21 06:01:50: pid 22995: DETAIL:  backend 1 weight: 1073741823.500000 flag: 0000
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
2017-12-21 06:01:50: pid 22995: DEBUG:  ProcessInfo: num_init_children (32) * sizeof(ProcessInfo) (32) = 1024 bytes requested for shared memory
2017-12-21 06:01:50: pid 22995: DEBUG:  Request info are: sizeof(POOL_REQUEST_INFO) 5264 bytes requested for shared memory
2017-12-21 06:01:50: pid 22995: DEBUG:  Recovery management area: sizeof(int) 4 bytes requested for shared memory
2017-12-21 06:01:50: pid 22995: LOG:  Setting up socket for 0.0.0.0:9999
2017-12-21 06:01:50: pid 22995: LOG:  Setting up socket for :::9999
2017-12-21 06:01:50: pid 22996: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22997: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23004: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23005: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22998: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22999: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23000: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23001: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23013: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23002: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23014: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23015: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23011: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22995: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
2017-12-21 06:01:50: pid 22995: LOG:  find_primary_node: checking backend no 0
2017-12-21 06:01:50: pid 23007: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23012: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23016: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23006: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22995: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:01:50: pid 23018: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23010: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23017: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_read: read 317 bytes from backend 0
2017-12-21 06:01:50: pid 22995: DEBUG:  authenticate kind = 0
2017-12-21 06:01:50: pid 22995: DEBUG:  authenticate backend: key data received
2017-12-21 06:01:50: pid 22995: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_write: to backend: 0 kind:Q
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_flush_it: flush size: 32
2017-12-21 06:01:50: pid 23019: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23008: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23020: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23009: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23003: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23021: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23022: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23024: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23023: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23026: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23027: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23025: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_read: read 75 bytes from backend 0
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: kind: 'T'
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: received ROW DESCRIPTION ('T')
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: row description: num_fileds: 1
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: kind: 'D'
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: received DATA ROW ('D')
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: kind: 'C'
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: received COMMAND COMPLETE ('C')
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: kind: 'Z'
2017-12-21 06:01:50: pid 22995: DEBUG:  do_query: received READY FOR QUERY ('Z')
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_write: to backend: 0 kind:X
2017-12-21 06:01:50: pid 22995: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:01:50: pid 22995: LOG:  find_primary_node: primary node id is 0
2017-12-21 06:01:50: pid 23029: DEBUG:  I am PCP child with pid:23029
2017-12-21 06:01:50: pid 23030: DEBUG:  I am 23030
2017-12-21 06:01:50: pid 23032: DEBUG:  I am health check process pid:23032 DB node id:1
2017-12-21 06:01:50: pid 23032: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23032: DEBUG:  health check: clearing alarm
2017-12-21 06:01:50: pid 22995: LOG:  pgpool-II successfully started. version 3.7RC1 (amefuriboshi)
2017-12-21 06:01:50: pid 23031: DEBUG:  I am health check process pid:23031 DB node id:0
2017-12-21 06:01:50: pid 23031: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23031: DEBUG:  health check: clearing alarm
2017-12-21 06:01:50: pid 23030: DEBUG:  initializing backend status
2017-12-21 06:01:50: pid 23031: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:01:50: pid 23031: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:01:50: pid 23031: DEBUG:  pool_read: read 317 bytes from backend 0
2017-12-21 06:01:50: pid 23031: DEBUG:  authenticate kind = 0
2017-12-21 06:01:50: pid 23031: DEBUG:  authenticate backend: key data received
2017-12-21 06:01:50: pid 23031: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:01:50: pid 23031: DEBUG:  health check: clearing alarm

그리고 노드들이 잘 붙었는지 pcp 커맨드로 확인하거나 Pgpool admin을 통해서 확인하면 된다.

Pgool admin은 php 로 만들어져 있으니 apache, mod-php 설치해서 파일만 복사해서 브라우저를 통해 확인하면된다.

패키지로 설치하게 되면 php와 PostgreSQL가 실행되는 사용자가 달라 접근 문제가 있을 수 있으니 적당히 맞추자.

postgres@pg1:/home/ubuntu$ pcp_node_count
Password:
2
postgres@pg1:/home/ubuntu$ pcp_node_info 0
Password:
10.0.3.30 5432 2 0.500000 up master
postgres@pg1:/home/ubuntu$
postgres@pg1:/home/ubuntu$ pcp_node_info 1
Password:
10.0.3.31 5432 2 0.500000 up slave

pgpool-admin-1

pgpool-admin-2

아래 처럼 Pgpool-II에 직접 접속해서 상태를 볼 수도 있다.

postgres@pg1:/home/ubuntu$ psql -h localhost -p 9999
psql (10.1)
Type "help" for help.

postgres=# show pool_version;
     pool_version
-----------------------
 3.7RC1 (amefuriboshi)
(1 row)

postgres=# show pool_nodes;
 node_id | hostname  | port | status | lb_weight |  role  | select_cnt | load_balance_node | replication_delay
---------+-----------+------+--------+-----------+--------+------------+-------------------+-------------------
 0       | 10.0.3.30 | 5432 | up     | 0.500000  | master | 2          | false             | 0
 1       | 10.0.3.31 | 5432 | up     | 0.500000  | slave  | 0          | true              | 0
(2 rows)

postgres=# \pset pager off
Pager usage is off.
postgres=# show pool_processes;
 pool_pid |     start_time      | database | username |     create_time     | pool_counter
----------+---------------------+----------+----------+---------------------+--------------
 22996    | 2017-12-21 06:01:50 |          |          |                     |
 22997    | 2017-12-21 06:01:50 |          |          |                     |
 22998    | 2017-12-21 06:01:50 |          |          |                     |
 22999    | 2017-12-21 06:01:50 |          |          |                     |
 23000    | 2017-12-21 06:01:50 |          |          |                     |
 ...
 23021    | 2017-12-21 06:01:50 |          |          |                     |
 23022    | 2017-12-21 06:01:50 |          |          |                     |
 23023    | 2017-12-21 06:01:50 | postgres | postgres | 2017-12-21 06:31:26 | 1
 23024    | 2017-12-21 06:01:50 |          |          |                     |
 23025    | 2017-12-21 06:01:50 |          |          |                     |
 23026    | 2017-12-21 06:01:50 |          |          |                     |
 23372    | 2017-12-21 06:19:17 |          |          |                     |
(32 rows)

postgres=# show pool_pools;
 pool_pid |     start_time      | pool_id | backend_id | database | username |     create_time     | majorversion | minorversion | pool_counter | pool_backendpid | pool_connected
----------+---------------------+---------+------------+----------+----------+---------------------+--------------+--------------+--------------+-----------------+----------------
 22996    | 2017-12-21 06:01:50 | 0       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 0       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 1       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 1       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 2       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 2       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 3       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22996    | 2017-12-21 06:01:50 | 3       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22997    | 2017-12-21 06:01:50 | 0       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 22997    | 2017-12-21 06:01:50 | 0       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 ...
 23021    | 2017-12-21 06:01:50 | 3       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 0       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 0       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 1       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 1       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 2       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 2       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 3       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23022    | 2017-12-21 06:01:50 | 3       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23023    | 2017-12-21 06:01:50 | 0       | 0          | postgres | postgres | 2017-12-21 06:31:26 | 3            | 0            | 1            | 23710           | 1
 23023    | 2017-12-21 06:01:50 | 0       | 1          | postgres | postgres | 2017-12-21 06:31:26 | 3            | 0            | 1            | 21177           | 1
 23023    | 2017-12-21 06:01:50 | 1       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23023    | 2017-12-21 06:01:50 | 1       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 ...
 23372    | 2017-12-21 06:19:17 | 2       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23372    | 2017-12-21 06:19:17 | 2       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23372    | 2017-12-21 06:19:17 | 3       | 0          |          |          |                     | 0            | 0            | 0            | 0               | 0
 23372    | 2017-12-21 06:19:17 | 3       | 1          |          |          |                     | 0            | 0            | 0            | 0               | 0
(256 rows)

postgres=#

임의로 마스터를 정지시키면 아래 처럼 failover도 일어나게 된다.

2017-12-21 06:42:01: pid 23032: DEBUG:  health check: clearing alarm
2017-12-21 06:42:01: pid 23031: DEBUG:  health check: clearing alarm
2017-12-21 06:42:01: pid 23031: LOG:  failed to connect to PostgreSQL server on "10.0.3.30:5432", getsockopt() detected error "Connection refused"
2017-12-21 06:42:01: pid 23031: ERROR:  failed to make persistent db connection
2017-12-21 06:42:01: pid 23031: DETAIL:  connection to host:"10.0.3.30:5432" failed
2017-12-21 06:42:01: pid 23031: DEBUG:  health check: clearing alarm
2017-12-21 06:42:01: pid 23031: DEBUG:  health check: clearing alarm
2017-12-21 06:42:01: pid 23031: LOG:  health check failed on node 0 (timeout:0)
2017-12-21 06:42:01: pid 23031: LOG:  received degenerate backend request for node_id: 0 from pid [23031]
2017-12-21 06:42:01: pid 23031: DEBUG:  sending signal:10 to the parent process with PID:22995
2017-12-21 06:42:01: pid 22995: DEBUG:  Pgpool-II parent process received SIGUSR1
2017-12-21 06:42:01: pid 22995: LOG:  Pgpool-II parent process has received failover request
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler called
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kind: 1 flags: 1 node_count: 1 index:0
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  starting to select new master node
2017-12-21 06:42:01: pid 22995: LOG:  starting degeneration. shutdown host 10.0.3.30(5432)
2017-12-21 06:42:01: pid 23032: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:42:01: pid 23032: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:42:01: pid 22995: DEBUG:  failover/failback request details: STREAM: 0 reqkind: 1 detail: 1 node_id: 0
2017-12-21 06:42:01: pid 22995: LOG:  Restart all children
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:22996
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:22997
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:22998
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:22999
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23000
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23001
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23002
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23003
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23004
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23005
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23006
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23007
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23008
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23009
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23010
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23011
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23012
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23013
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23014
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23015
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23016
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23017
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23018
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23019
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23020
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23021
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23022
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23830
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23024
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23025
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23026
2017-12-21 06:42:01: pid 22995: DEBUG:  failover handler
2017-12-21 06:42:01: pid 22995: DETAIL:  kill process with PID:23372
2017-12-21 06:42:01: pid 22995: LOG:  execute command: /var/lib/postgresql/bin/failover.sh 0 0 1
2017-12-21 06:42:01: pid 23032: DEBUG:  pool_read: read 317 bytes from backend 1
2017-12-21 06:42:01: pid 23032: DEBUG:  authenticate kind = 0
2017-12-21 06:42:01: pid 23032: DEBUG:  authenticate backend: key data received
2017-12-21 06:42:01: pid 23032: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:42:01: pid 23032: DEBUG:  health check: clearing alarm
2017-12-21 06:42:01: pid 23032: DEBUG:  health check: clearing alarm
2017-12-21 06:42:01: pid 23032: DEBUG:  pool_write: to backend: 1 kind:X
2017-12-21 06:42:01: pid 23032: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:42:01: pid 22995: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
2017-12-21 06:42:01: pid 22995: LOG:  find_primary_node: checking backend no 0
2017-12-21 06:42:01: pid 22995: LOG:  find_primary_node: checking backend no 1
2017-12-21 06:42:01: pid 22995: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_read: read 317 bytes from backend 1
2017-12-21 06:42:01: pid 22995: DEBUG:  authenticate kind = 0
2017-12-21 06:42:01: pid 22995: DEBUG:  authenticate backend: key data received
2017-12-21 06:42:01: pid 22995: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_write: to backend: 1 kind:Q
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_flush_it: flush size: 32
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_read: read 75 bytes from backend 1
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: kind: 'T'
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: received ROW DESCRIPTION ('T')
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: row description: num_fileds: 1
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: kind: 'D'
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: received DATA ROW ('D')
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: kind: 'C'
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: received COMMAND COMPLETE ('C')
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: kind: 'Z'
2017-12-21 06:42:01: pid 22995: DEBUG:  do_query: received READY FOR QUERY ('Z')
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_write: to backend: 1 kind:X
2017-12-21 06:42:01: pid 22995: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:42:01: pid 22995: DEBUG:  find_primary_node: 1 node is standby
2017-12-21 06:42:01: pid 22995: DEBUG:  find_primary_node: no primary node found
2017-12-21 06:42:01: pid 22995: DEBUG:  Pgpool-II parent process received SIGUSR1
2017-12-21 06:42:01: pid 22995: DEBUG:  reaper handler
2017-12-21 06:42:01: pid 22995: DEBUG:  reaper handler: exited due to switching
2017-12-21 06:42:02: pid 22995: LOG:  find_primary_node: checking backend no 0
2017-12-21 06:42:02: pid 22995: LOG:  find_primary_node: checking backend no 1
2017-12-21 06:42:02: pid 22995: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_read: read 317 bytes from backend 1
2017-12-21 06:42:02: pid 22995: DEBUG:  authenticate kind = 0
2017-12-21 06:42:02: pid 22995: DEBUG:  authenticate backend: key data received
2017-12-21 06:42:02: pid 22995: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_write: to backend: 1 kind:Q
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_flush_it: flush size: 32
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_read: read 75 bytes from backend 1
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: kind: 'T'
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: received ROW DESCRIPTION ('T')
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: row description: num_fileds: 1
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: kind: 'D'
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: received DATA ROW ('D')
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: kind: 'C'
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: received COMMAND COMPLETE ('C')
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: kind: 'Z'
2017-12-21 06:42:02: pid 22995: DEBUG:  do_query: received READY FOR QUERY ('Z')
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_write: to backend: 1 kind:X
2017-12-21 06:42:02: pid 22995: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:42:02: pid 22995: DEBUG:  find_primary_node: 1 node is standby
2017-12-21 06:42:02: pid 22995: DEBUG:  find_primary_node: no primary node found
2017-12-21 06:42:03: pid 22995: LOG:  find_primary_node: checking backend no 0
2017-12-21 06:42:03: pid 22995: LOG:  find_primary_node: checking backend no 1
2017-12-21 06:42:03: pid 22995: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_read: read 317 bytes from backend 1
2017-12-21 06:42:03: pid 22995: DEBUG:  authenticate kind = 0
2017-12-21 06:42:03: pid 22995: DEBUG:  authenticate backend: key data received
2017-12-21 06:42:03: pid 22995: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_write: to backend: 1 kind:Q
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_flush_it: flush size: 32
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_read: read 75 bytes from backend 1
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: kind: 'T'
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: received ROW DESCRIPTION ('T')
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: row description: num_fileds: 1
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: kind: 'D'
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: received DATA ROW ('D')
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: kind: 'C'
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: received COMMAND COMPLETE ('C')
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: kind: 'Z'
2017-12-21 06:42:03: pid 22995: DEBUG:  do_query: received READY FOR QUERY ('Z')
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_write: to backend: 1 kind:X
2017-12-21 06:42:03: pid 22995: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:42:03: pid 22995: DEBUG:  find_primary_node: 1 node is standby
2017-12-21 06:42:03: pid 22995: DEBUG:  find_primary_node: no primary node found
2017-12-21 06:42:04: pid 22995: LOG:  find_primary_node: checking backend no 0
2017-12-21 06:42:04: pid 22995: LOG:  find_primary_node: checking backend no 1
2017-12-21 06:42:04: pid 22995: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_read: read 317 bytes from backend 1
2017-12-21 06:42:04: pid 22995: DEBUG:  authenticate kind = 0
2017-12-21 06:42:04: pid 22995: DEBUG:  authenticate backend: key data received
2017-12-21 06:42:04: pid 22995: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_write: to backend: 1 kind:Q
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_flush_it: flush size: 32
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_read: read 75 bytes from backend 1
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: kind: 'T'
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: received ROW DESCRIPTION ('T')
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: row description: num_fileds: 1
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: kind: 'D'
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: received DATA ROW ('D')
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: kind: 'C'
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: received COMMAND COMPLETE ('C')
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: kind: 'Z'
2017-12-21 06:42:04: pid 22995: DEBUG:  do_query: received READY FOR QUERY ('Z')
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_write: to backend: 1 kind:X
2017-12-21 06:42:04: pid 22995: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:42:04: pid 22995: DEBUG:  find_primary_node: 1 node is standby
2017-12-21 06:42:04: pid 22995: DEBUG:  find_primary_node: no primary node found
2017-12-21 06:42:05: pid 22995: LOG:  find_primary_node: checking backend no 0
2017-12-21 06:42:05: pid 22995: LOG:  find_primary_node: checking backend no 1
2017-12-21 06:42:05: pid 22995: DEBUG:  SSL is requested but SSL support is not available
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_flush_it: flush size: 41
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_read: read 317 bytes from backend 1
2017-12-21 06:42:05: pid 22995: DEBUG:  authenticate kind = 0
2017-12-21 06:42:05: pid 22995: DEBUG:  authenticate backend: key data received
2017-12-21 06:42:05: pid 22995: DEBUG:  authenticate backend: transaction state: I
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: extended:0 query:"SELECT pg_is_in_recovery()"
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_write: to backend: 1 kind:Q
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_flush_it: flush size: 32
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_read: read 75 bytes from backend 1
p2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: kind: 'T'
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: received ROW DESCRIPTION ('T')
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: row description: num_fileds: 1
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: kind: 'D'
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: received DATA ROW ('D')
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: kind: 'C'
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: received COMMAND COMPLETE ('C')
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: kind: 'Z'
2017-12-21 06:42:05: pid 22995: DEBUG:  do_query: received READY FOR QUERY ('Z')
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_write: to backend: 1 kind:X
2017-12-21 06:42:05: pid 22995: DEBUG:  pool_flush_it: flush size: 5
2017-12-21 06:42:05: pid 22995: LOG:  find_primary_node: primary node id is 1
2017-12-21 06:42:05: pid 22995: LOG:  failover: set new primary node: 1
2017-12-21 06:42:05: pid 22995: LOG:  failover: set new master node: 1
2017-12-21 06:42:05: pid 24041: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24042: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24043: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24044: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24045: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24046: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24047: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24048: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24049: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24056: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24057: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24055: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24058: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24059: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24060: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24061: DEBUG:  initializing backend status
2017-12-21 06:42:05: pid 24054: DEBUG:  initializing backend status
failover done. shutdown host 10.0.3.30(5432)2017-12-21 06:42:05: pid 23030: LOG:  worker process received restart request
2017-12-21 06:42:05: pid 22995: LOG:  failover done. shutdown host 10.0.3.30(5432)
2017-12-21 06:42:05: pid 24062: DEBUG:  initializing backend status

아래 recovery 버튼은 ONLINE RECOVERY 섹션의 설정을 채워주어야 활성화 된다.

pgpool-admin-3

그리고 중지된 마스터를 복구하면 아래 처럼 다시 사용하게 하는 버튼도 생긴다.
pgpool-admin-4

물론 pcp 커맨드로 failover를 테스트 할 수 있다. 자세한 내용은 메뉴얼을 참고하자.

설정 파일

아래 처럼 보통 컴파일을 하면 $prefix/etc에 기본 설정 템플릿이 있다.

그러니 맘에 드는 설정부터 출발해서 고쳐가는게 정신건강에 좋겠다.

postgres@pg1:/usr/local/etc$ ls -al
total 196
drwxr-xr-x  2 root root  4096 Dec 13 14:02 .
drwxr-xr-x 10 root root  4096 Dec 12 15:44 ..
-rw-r--r--  1 root root   858 Dec 13 14:02 pcp.conf.sample
-rw-r--r--  1 root root 35832 Dec 13 14:02 pgpool.conf.sample
-rw-r--r--  1 root root 35305 Dec 13 14:02 pgpool.conf.sample-logical
-rw-r--r--  1 root root 35460 Dec 13 14:02 pgpool.conf.sample-master-slave
-rw-r--r--  1 root root 35420 Dec 13 14:02 pgpool.conf.sample-replication
-rw-r--r--  1 root root 35482 Dec 13 14:02 pgpool.conf.sample-stream
-rw-r--r--  1 root root  3260 Dec 13 14:02 pool_hba.conf.sample
postgres@pg1:/usr/local/etc$ pwd
/usr/local/etc
postgres@pg1:/usr/local/etc$

나는 PostgreSQL의 logical replication을 이용할 생각이기 때문에 pgpool.conf.sample-logical파일을 복사해서 사용하거나

아래 처럼 pgpool_setup을 도움받아 출발해도 좋다. 중간에 연결안된다는 오류는 무시해도 괜찮다.

postgres@pg1:/pgpool2$ PGBIN=/usr/bin pgpool_setup -m l -d
Satrting set up in logical replication mode
creating startall and shutdownall
creating failover script
creating database cluster /pgpool2/data0...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
creating database cluster /pgpool2/data1...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
temporarily start data0 cluster to create extensions
temporarily start data1 cluster to create extensions
start all
waiting for pgpool-II coming up...done.
psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.11000"?
shutdown all

pgpool-II setting for logical replication mode is done.
To start the whole system, use /pgpool2/startall.
To shutdown the whole system, use /pgpool2/shutdownall.
pcp command user name is "postgres", password is "postgres".
Each PostgreSQL, pgpool-II and pcp port is as follows:
#1 port is 11002
#2 port is 11003
pgpool port is 11000
pcp port is 11001
The info above is in README.port.
postgres@pg1:/pgpool2$ tree
.
├── archivedir
├── bashrc.ports
├── etc
│   ├── failover.sh
│   ├── pcp.conf
│   ├── pgpool.conf
│   └── pool_passwd
├── log
│   ├── pgpool
│   │   └── oiddir
│   ├── pgpool.log
│   └── pgpool_status
├── pcppass
├── pgpool_reload
├── pgpool_setup.log
├── README.port
├── run
├── shutdownall
└── startall

6 directories, 13 files
postgres@pg1:/pgpool2$

pgpool.conf

delay_threshold = 10000000

delay_threshold의 10MB만큼 슬레이브 서버가 복제를 쫓아오지 못하면 슬레이브 서버에 SELECT 쿼리를 실행하지 않는다.

슬레이브 서버가 마스터만큼 하드웨어가 좋지 않거나 다른 사정이 있다면 차이 날텐데 꼭 올바른 데이타가 필요하다면 꼭 설정 해야겠다.

비슷한 성능이 아니더라도 쓰기가 작업이 많다면 염려해둬야 하는 옵션

log_standby_delay = 'if_over_threshold'

‘none’이 기본값, ‘if_over_threshold’일 경우 복제 delay_threshold에 도달하면 로깅해준다.

insert_lock = off

복제는 PostgreSQL 본연의 그것을 쓸것이기에 insert_lock을 할 필요는 없겠다.

load_balance_mode = on

host_standby모드를 이용해 서버에게 모두 일을 시킬것이기 때문에 로드밸런스도 켜준다.

black_function_list = 'currval,lastval,nextval,setval'

원래는 ‘nextval,setval,nextval,setval’ 였다.

여기 걸린 함수를 사용하면 마스터y 서버로만 쿼리를 보낸다.

‘update_.’, ‘insert_.’, ‘delete_.*‘를 사용할 수도 있다.

master_slave_mode = on
master_slave_sub_mode = 'logical'

PostgreSQL의 logical 복제를 이용하려면 master_slave_mode를 ‘on’, master_slave_sub_mode를 ‘logical’로 설정해야 한다.

health_check_user = 'postgres'
health_check_password = ''
health_check_database = 'postgres'

PgPool-II의 노드들이 살아있는지 체크할때 사용할 사용자명, 비밀번호, 데이타베이스 이름이다.

PgPool-II 데몬이 실행되고 있는 서버에서 해당 노드들이 위 정보로 접속이 되어야 한다.

failover_command = '/var/lib/postgresql/bin/failover.sh %d %M %m'

‘health_check’이 실패하게되면 failover가 일어나는데 그때 실행될 프로그램이다.

보통 스크립트를 만들어 사용하게된다.

‘%d’는 망가진 노드, ‘%M’은 망가지기 전의 마스터 노드, ‘%m’은 망가져서 새로 마스터로 만들 노드다.

설정 파일에서 더 다양한 정보를 받아서 인자로 넘길 수 있다.

아래 스크립트 처럼 마음대로 만들어 사용하면 되겠다.

스크립트에 대해 약간 설명을 하면 인자로 망가진노드, 마스터였던 노드, 마스터로 만들 노드를 받는다.

그리고 망가진 노드가 0이 아닐때경우(보통 0이 마스터다, 0부터 0, 1, 2, …) 마스터가 아닌 노드에 ssh를 통해 접속해 /tmp/postgresql.tirgger.5432 파일을 만들어 recovery모드를 빠져나온다.

그리고 recovery.done파일이 생길때까지 기다린다.

만약 복제되고 있던 slave가 더 있다면 해당 슬레이브가 새 마스터를 바라 보도록 해줘야 한다.

#!/usr/bin/env sh

FALLING_NODE=$1
OLD_MASTER=$2
NEW_MASTER=$3

if [ test $FALLING_NODE -eq 0 ]; then
    ssh -T pg2 touch /tmp/postgresql.trigger.5432
    ssh -T pg2 "while [ ! -f /var/lib/postgresql/10/main/recovery.done ]; do sleep 1; done"
fi
exit 0;

pcp.conf

postgres:e8a48653851e28c69d0506508fb27fc5

아래는 pcp.conf.sample파일

커맨드라인이나 Pgpool Admin이 PCP 커맨드를 실행할때 필요한 인증을 설정한다.

pg_md5를 이용해 문자열을 만들어 넣어주면 되겠다.

설정 파일에 포함된 e8a48653851e28c69d0506508fb27fc5은 실제론 postgres다.

아래 처럼 체크할 수 있다.

$ pg_md5 -p
password:
e8a48653851e28c69d0506508fb27fc5

pool_hba.conf

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
# "local" is for Unix domain socket connections only
local   all         all                               trust
host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust
host    all         all         10.0.3.30/32          trust
host    all         all         10.0.3.31/32          trust

아래는 pool_hba.conf.sample은 PostgreSQL의 pg_hba.conf와 거의(?)같다.

어플리케이션이 PgPool-II에 연결할때 인증을 설정해준다.

마무리

처음엔 용어때문에 헷갈렸던 부분도 있지만 어쨌든 원하는 시나리오대로의 동작은 확인했다.

하지만 설정엔 백업 정책과도 관련있고 설정할 파일도 종류가 많다.

꼼꼼하게 테스트도 하려면 시간이 많이 걸리는 작업이다.

아~ 그래서 DBA DBA하는건가? 란 생각이 들었다.

개인적으론 확실하게 MySQL을 선호하게되는 계기가 되었다.

반응형
Comments