os: ubuntu 16.04
postgresql: 9.6.8
repmgr: 4.1.1
192.168.56.101 node1
192.168.56.102 node2
Provided that the production network is stable, automatic database failover is the ideal state. For synchronous replication it is recommended to configure two standbys, and the same recommendation applies to asynchronous replication.
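For synchronous replication with two standbys, PostgreSQL 9.6 lets the primary wait on one member of a named group. A minimal sketch (the standby names node2 and node3 are placeholders and must match the application_name each standby uses in its primary_conninfo; they are not part of this two-node lab):

# postgresql.conf on the primary (illustrative values only)
synchronous_commit = on
synchronous_standby_names = '1 (node2, node3)'    # 9.6 syntax: wait for 1 of the listed standbys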
To implement automatic failover you need to run repmgrd, and repmgrd requires shared_preload_libraries = 'repmgr' to be set in postgresql.conf.
$ vi /etc/postgresql/9.6/main/postgresql.conf
shared_preload_libraries = 'repmgr'
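shared_preload_libraries is only read at server start, so PostgreSQL must be restarted after this change. A quick way to apply and verify it, using the same cluster commands as the rest of this post, might be:

$ sudo pg_ctlcluster 9.6 main restart
$ psql -c "SHOW shared_preload_libraries;"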
Contents of /etc/repmgr.conf
Shown here from node1; the configuration on node2 is essentially the same.
# more /etc/repmgr.conf
##############################################
# node information
node_id=1
node_name=node1
conninfo='host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/9.6/main'
use_replication_slots=true
pg_bindir='/usr/lib/postgresql/9.6/bin'
pg_ctl_options='-l /var/log/postgres/startup.log'
rsync_options=--exclude=postgresql.local.conf --archive
#use_primary_conninfo_password=true
location='location01'
##############################################
# service commands
service_start_command = 'sudo pg_ctlcluster 9.6 main start'
service_stop_command = 'sudo pg_ctlcluster 9.6 main stop'
service_restart_command = 'sudo pg_ctlcluster 9.6 main restart'
service_reload_command = 'sudo pg_ctlcluster 9.6 main reload'
service_promote_command = 'sudo pg_ctlcluster 9.6 main promote'
##############################################
# log info
log_level=INFO
log_status_interval=10
log_file=/var/log/postgresql/repmgrd.log
##############################################
# repmgrd options
monitoring_history=yes
monitor_interval_secs=5
reconnect_attempts=10
reconnect_interval=5
##############################################
# automatic failover
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
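With failover=automatic plus promote_command and follow_command defined, it is worth sanity-checking each node before trusting repmgrd with failover. A hedged example (the node check subcommand is assumed to be available in this 4.x release):

$ repmgr -f /etc/repmgr.conf node check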
Start repmgrd
repmgrd needs to be started on each node.
Manual start
# repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
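When started by hand like this, repmgrd writes its PID to the file given with --pid-file, so the daemon can later be stopped with an ordinary TERM signal (plain shell, nothing repmgr-specific):

# kill $(cat /tmp/repmgrd.pid)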
Starting at system boot
# vi /etc/default/repmgrd
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=yes

# configuration file (required)
REPMGRD_CONF="/etc/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
REPMGRD_USER=postgres

# repmgrd binary
REPMGRD_BIN=/usr/bin/repmgrd

# pid file
REPMGRD_PIDFILE=/var/run/repmgrd.pid
REPMGRD_CONF points to the repmgr configuration file, and REPMGRD_ENABLED must be set to yes.
The service also needs to be enabled at the system level:
# systemctl status repmgrd.service
# systemctl start repmgrd.service
# systemctl enable repmgrd.service
repmgrd process on node1
# ps -ef|grep -i repmgrd |grep -v grep
postgres  4091     1  0 14:30 ?        00:00:00 /usr/lib/postgresql/9.6/bin/repmgrd --config-file /etc/repmgr.conf
# cat /var/log/postgresql/repmgrd.log
[2018-09-25 14:30:28] [NOTICE] repmgrd (repmgrd 4.1.1) starting up
[2018-09-25 14:30:28] [INFO] connecting to database "host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2"
[2018-09-25 14:30:28] [NOTICE] starting monitoring of node "node1" (ID: 1)
[2018-09-25 14:30:28] [NOTICE] monitoring cluster primary "node1" (node ID: 1)
[2018-09-25 14:30:38] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 14:30:48] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
repmgrd process on node2
# ps -ef|grep -i repmgrd |grep -v grep
postgres  4349     1  0 14:18 ?        00:00:00 /usr/lib/postgresql/9.6/bin/repmgrd --config-file /etc/repmgr.conf
# cat /var/log/postgresql/repmgrd.log
[2018-09-25 14:33:44] [NOTICE] repmgrd (repmgrd 4.1.1) starting up
[2018-09-25 14:33:44] [INFO] connecting to database "host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2"
[2018-09-25 14:33:44] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2018-09-25 14:33:44] [INFO] monitoring connection to upstream node "node1" (node ID: 1)
[2018-09-25 14:33:55] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 14:33:55] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 14:34:05] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 14:34:05] [DETAIL] last monitoring statistics update was 5 seconds ago
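Before injecting any failures it can help to confirm that every node can reach every other node over the repmgr connections. A hedged example (the cluster crosscheck subcommand is assumed to be available in this release); run it from either node:

$ repmgr -f /etc/repmgr.conf cluster crosscheck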
Simulating a failure by manually stopping the primary database
Operations on node1
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

$ pg_ctl -D /var/lib/postgresql/9.6/main -m fast stop
// or
$ sudo pg_ctlcluster 9.6 main stop

$ tail -f /var/log/postgresql/repmgrd.log
[2018-09-25 15:08:23] [NOTICE] repmgrd (repmgrd 4.1.1) starting up
[2018-09-25 15:08:23] [INFO] connecting to database "host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2"
[2018-09-25 15:08:23] [NOTICE] starting monitoring of node "node1" (ID: 1)
[2018-09-25 15:08:23] [NOTICE] monitoring cluster primary "node1" (node ID: 1)
[2018-09-25 15:08:33] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:43] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:53] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:03] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:13] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:23] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:33] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:43] [INFO] monitoring primary node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:53] [WARNING] unable to connect to local node
[2018-09-25 15:09:53] [INFO] checking state of node 1, 1 of 10 attempts
[2018-09-25 15:09:53] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:09:58] [INFO] checking state of node 1, 2 of 10 attempts
[2018-09-25 15:09:58] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:03] [INFO] checking state of node 1, 3 of 10 attempts
[2018-09-25 15:10:03] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:08] [INFO] checking state of node 1, 4 of 10 attempts
[2018-09-25 15:10:08] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:13] [INFO] checking state of node 1, 5 of 10 attempts
[2018-09-25 15:10:13] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:18] [INFO] checking state of node 1, 6 of 10 attempts
[2018-09-25 15:10:18] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:23] [INFO] checking state of node 1, 7 of 10 attempts
[2018-09-25 15:10:23] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:28] [INFO] checking state of node 1, 8 of 10 attempts
[2018-09-25 15:10:28] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:33] [INFO] checking state of node 1, 9 of 10 attempts
[2018-09-25 15:10:33] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:38] [INFO] checking state of node 1, 10 of 10 attempts
[2018-09-25 15:10:38] [WARNING] unable to reconnect to node 1 after 10 attempts
[2018-09-25 15:10:38] [NOTICE] unable to connect to local node, falling back to degraded monitoring
[2018-09-25 15:10:38] [INFO] monitoring primary node "node1" (node ID: 1) in degraded state
[2018-09-25 15:10:38] [DETAIL] waiting for the node to become available
[2018-09-25 15:10:48] [INFO] monitoring primary node "node1" (node ID: 1) in degraded state
[2018-09-25 15:10:48] [DETAIL] waiting for the node to become available
View on node2
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Connection string
----+-------+---------+---------------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | ? unreachable |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running     | node1    | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
    "could not connect to server: Connection refused
        Is the server running on host "192.168.56.101" and accepting
        TCP/IP connections on port 5432?"
  - node "node1" (ID: 1) is registered as an active primary but is unreachable
$ tail -f /var/log/postgresql/repmgrd.log
[2018-09-25 15:07:58] [NOTICE] repmgrd (repmgrd 4.1.1) starting up
[2018-09-25 15:07:58] [INFO] connecting to database "host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2"
[2018-09-25 15:07:58] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2018-09-25 15:07:58] [INFO] monitoring connection to upstream node "node1" (node ID: 1)
[2018-09-25 15:08:08] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:08] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:08:18] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:18] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:08:28] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:28] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:08:38] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:38] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:08:48] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:48] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:08:58] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:08:58] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:09:08] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:08] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:09:18] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:18] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:09:28] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:28] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:09:38] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:38] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:09:48] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2018-09-25 15:09:48] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-25 15:09:53] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-09-25 15:09:53] [INFO] checking state of node 1, 1 of 10 attempts
[2018-09-25 15:09:53] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:09:58] [INFO] checking state of node 1, 2 of 10 attempts
[2018-09-25 15:09:58] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:03] [INFO] checking state of node 1, 3 of 10 attempts
[2018-09-25 15:10:03] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:08] [INFO] checking state of node 1, 4 of 10 attempts
[2018-09-25 15:10:08] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:13] [INFO] checking state of node 1, 5 of 10 attempts
[2018-09-25 15:10:13] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:18] [INFO] checking state of node 1, 6 of 10 attempts
[2018-09-25 15:10:18] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:23] [INFO] checking state of node 1, 7 of 10 attempts
[2018-09-25 15:10:23] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:28] [INFO] checking state of node 1, 8 of 10 attempts
[2018-09-25 15:10:28] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:33] [INFO] checking state of node 1, 9 of 10 attempts
[2018-09-25 15:10:33] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-25 15:10:38] [INFO] checking state of node 1, 10 of 10 attempts
[2018-09-25 15:10:38] [WARNING] unable to reconnect to node 1 after 10 attempts
[2018-09-25 15:10:38] [NOTICE] this node is the only available candidate and will now promote itself
[2018-09-25 15:10:38] [INFO] promote_command is: "/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2018-09-25 15:10:38] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgrd.log"
[2018-09-25 15:10:38] [NOTICE] promoting standby to primary
[2018-09-25 15:10:38] [DETAIL] promoting server "node2" (ID: 2) using "sudo pg_ctlcluster 9.6 main promote"
[2018-09-25 15:10:38] [DETAIL] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2018-09-25 15:10:39] [NOTICE] STANDBY PROMOTE successful
[2018-09-25 15:10:39] [DETAIL] server "node2" (ID: 2) was successfully promoted to primary
[2018-09-25 15:10:39] [INFO] switching to primary monitoring mode
[2018-09-25 15:10:39] [NOTICE] monitoring cluster primary "node2" (node ID: 2)
[2018-09-25 15:10:49] [INFO] monitoring primary node "node2" (node ID: 2) in normal state
[2018-09-25 15:10:59] [INFO] monitoring primary node "node2" (node ID: 2) in normal state
[2018-09-25 15:11:09] [INFO] monitoring primary node "node2" (node ID: 2) in normal state
[2018-09-25 15:11:19] [INFO] monitoring primary node "node2" (node ID: 2) in normal state
After reconnect_attempts × reconnect_interval = 10 × 5 s = 50 seconds, automatic failover kicks in and the standby is promoted to primary, as shown below:
[2018-09-25 15:10:38] [INFO] checking state of node 1, 10 of 10 attempts
[2018-09-25 15:10:38] [WARNING] unable to reconnect to node 1 after 10 attempts
[2018-09-25 15:10:38] [NOTICE] this node is the only available candidate and will now promote itself
[2018-09-25 15:10:38] [INFO] promote_command is: "/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2018-09-25 15:10:38] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgrd.log"
[2018-09-25 15:10:38] [NOTICE] promoting standby to primary
[2018-09-25 15:10:38] [DETAIL] promoting server "node2" (ID: 2) using "sudo pg_ctlcluster 9.6 main promote"
[2018-09-25 15:10:38] [DETAIL] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2018-09-25 15:10:39] [NOTICE] STANDBY PROMOTE successful
[2018-09-25 15:10:39] [DETAIL] server "node2" (ID: 2) was successfully promoted to primary
[2018-09-25 15:10:39] [INFO] switching to primary monitoring mode
[2018-09-25 15:10:39] [NOTICE] monitoring cluster primary "node2" (node ID: 2)
[2018-09-25 15:10:49] [INFO] monitoring primary node "node2" (node ID: 2) in normal state
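How long repmgrd waits before promoting is controlled by the reconnect_attempts and reconnect_interval settings shown in /etc/repmgr.conf above (10 × 5 s = 50 seconds with the values used here). An illustrative sketch with assumed values for a faster failover, to be tuned to your own environment:

# /etc/repmgr.conf (illustrative values only)
reconnect_attempts=6
reconnect_interval=5      # promotion after roughly 6 x 5 = 30 seconds of unreachability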
View cluster show on node2
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
    "could not connect to server: Connection refused
        Is the server running on host "192.168.56.101" and accepting
        TCP/IP connections on port 5432?"
Simulating a failure by powering off the virtual machine
After the previous test node2 is now the primary, and node1 was reattached as a standby by rejoining the cluster; see the previous blog post for the details, and the sketch below for the general idea.
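A hedged sketch of that rejoin step, run on the failed node1 while its PostgreSQL instance is stopped (the --force-rewind option is an assumption here and relies on pg_rewind prerequisites such as wal_log_hints or data checksums being enabled):

$ repmgr -f /etc/repmgr.conf node rejoin \
    -d 'host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2' \
    --force-rewind --verbose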
View the cluster status on node2 before the power failure
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
Power off node2.
View on node1
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Connection string
----+-------+---------+---------------+----------+----------+------------------------------------------------------------------
 1  | node1 | standby |   running     | node2    | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | ? unreachable |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node2" (ID: 2), following error encountered :
    "could not connect to server: No route to host
        Is the server running on host "192.168.56.102" and accepting
        TCP/IP connections on port 5432?"
  - node "node2" (ID: 2) is registered as an active primary but is unreachable
After the same sequence of reconnection attempts, node1 is likewise promoted to primary; the log output is essentially the same as in the previous test, so it is not repeated here.
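To review what repmgrd actually did during each test, the events it records can be listed afterwards. A hedged example (the cluster event subcommand and its --limit option are assumed to be available in this release):

$ repmgr -f /etc/repmgr.conf cluster event --limit 10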
Reference resources:
https://www.2ndquadrant.com/en/resources/repmgr/
https://github.com/2ndQuadrant/repmgr
https://repmgr.org/docs/4.1/using-repmgrd.html