1. Introduction to Pacemaker:

Pacemaker is the most widely used open-source cluster resource manager in Linux environments. It relies on the messaging and membership services provided by a cluster infrastructure layer (Corosync or Heartbeat) to detect failures at the node and resource level and to recover resources, maximizing the availability of cluster services. Functionally, Pacemaker manages the full life cycle of the software services in the cluster, driven by resource rules defined by the cluster administrator; this can even cover an entire software stack and the interactions between its components. In practice Pacemaker can manage clusters of any size, and its powerful resource-dependency model lets administrators precisely describe the relationships between cluster resources (including ordering and location relationships).

Network environment

Nodes
node1 (192.168.10.222)
node2 (192.168.10.223)

Note: the following steps must be performed on both nodes.

2. Give the virtual machines Internet access first by adding DNS1=114.114.114.114 to the local network configuration file (a sketch follows).
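A minimal sketch of that change, assuming the NIC configuration file is /etc/sysconfig/network-scripts/ifcfg-ens33 (the interface name is an assumption; adjust it to match your system):

# append to /etc/sysconfig/network-scripts/ifcfg-ens33
DNS1=114.114.114.114

# apply the change (CentOS 8 uses NetworkManager)
nmcli connection reload
nmcli connection up ens33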

The stock CentOS 8 repositories do not provide everything needed to install Pacemaker, so network repositories are used to fill the gap.
The Aliyun mirror is used here: https://developer.aliyun.com/mirror/centos?spm=a2c6h.13651102.0.0.3e221b11RpzwRI
 

3. Set up the Aliyun base repository and add a HighAvailability repository for Pacemaker


[root@localhost ~]# curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-8.repo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2595  100  2595    0     0  32848      0 --:--:-- --:--:-- --:--:-- 32848
[root@localhost ~]# cd /etc/yum.repos.d/
[root@localhost yum.repos.d]# vim ./pacemaker.repo
[pacemaker]
name=pacemaker
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos/8.5.2111/HighAvailability/x86_64/os/
gpgcheck=1
enabled=1
[root@localhost yum.repos.d]# yum clean all 
37 files removed
[root@localhost yum.repos.d]# yum list
If the yum package list loads normally and includes pacemaker, the repositories are configured correctly.
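A quick way to confirm, for example:

yum repolist enabled | grep -i pacemaker   # the extra HighAvailability repo (id: pacemaker) should be listed
yum info pacemaker                         # the package should now be resolvable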


4. Install the packages

yum install -y pacemaker corosync pcs psmisc policycoreutils-python-utils httpd bash-comp*

[root@localhost yum.repos.d]# source /etc/profile.d/bash_completion.sh  //enable pcs tab completion
[root@localhost yum.repos.d]# 

5. Stop the firewall, add hosts entries, and set a password for the hacluster user (the password must be identical on both machines).

[root@localhost yum.repos.d]# vim /etc/hosts 
192.168.10.222 node1
192.168.10.223 node2
[root@localhost yum.repos.d]# systemctl stop firewalld 
[root@localhost yum.repos.d]# setenforce 0
[root@localhost yum.repos.d]# passwd hacluster 
Changing password for user hacluster.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@localhost yum.repos.d]# systemctl restart pcsd
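The same step can be scripted so that both nodes end up identical; a sketch, assuming the placeholder password hacluster123 (choose your own). It also keeps firewalld off and pcsd on across reboots:

systemctl disable --now firewalld                # stop the firewall and keep it off after reboot
setenforce 0                                     # put SELinux into permissive mode for this session
echo "hacluster123" | passwd --stdin hacluster   # placeholder password; use the same value on both nodes
systemctl enable --now pcsd                      # start pcsd and enable it at boot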

The remaining steps only need to be run on one node, because the cluster synchronizes the configuration automatically.

6. Create the cluster and perform basic cluster operations

[root@localhost yum.repos.d]# pcs host auth node1 node2  //authenticate the cluster nodes
Username: hacluster
Password: 
node2: Authorized
node1: Authorized
[root@localhost yum.repos.d]# pcs cluster setup mycluster node1 node2 //create a cluster named mycluster
No addresses specified for host 'node1', using 'node1'
No addresses specified for host 'node2', using 'node2'
Destroying cluster on hosts: 'node1', 'node2'...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'node1', 'node2'
node1: successful removal of the file 'pcsd settings'
node2: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'node1', 'node2'
node1: successful distribution of the file 'corosync authkey'
node1: successful distribution of the file 'pacemaker authkey'
node2: successful distribution of the file 'corosync authkey'
node2: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'node1', 'node2'
node1: successful distribution of the file 'corosync.conf'
node2: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
[root@localhost yum.repos.d]# pcs cluster start --all  //start the cluster on all nodes
node1: Starting Cluster...
node2: Starting Cluster...
[root@localhost yum.repos.d]# corosync-cfgtool -s   //check cluster communication
Local node ID 1, transport knet  
LINK ID 0 udp
        addr    = 192.168.10.222
        status:
                nodeid:          1:     localhost
                nodeid:          2:     connected
[root@localhost yum.repos.d]# crm_verify -L -V  //validate the configuration and fix any errors:
(unpack_resources)      error: Resource start-up disabled since no STONITH resources have been defined
(unpack_resources)      error: Either configure some or disable STONITH with the stonith-enabled option
(unpack_resources)      error: NOTE: Clusters with shared data need STONITH to ensure data integrity
crm_verify: Errors found during check: config not valid
[root@localhost yum.repos.d]# pcs property set stonith-enabled=false
[root@localhost yum.repos.d]#  crm_verify -L -V
By default no fencing (STONITH) device is configured, so the errors above appear. Fencing is what isolates a failed node to protect shared data. After STONITH is disabled with the command above, the errors disappear; this is acceptable for a test setup, but clusters with shared data should configure fencing.
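To confirm the property change, for example:

pcs property list | grep stonith   # should show stonith-enabled: false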

Once the cluster has been created, you can browse to https://192.168.10.222:2224/manage, the web UI for the cluster; if the page loads, the cluster was set up correctly. (The IP address of any node in the cluster can be used.)

Log in with the hacluster user.

7. Configure Pacemaker resources (httpd, VIP)

Enable the Apache server-status monitoring URL


[root@localhost yum.repos.d]# vim /etc/httpd/conf.d/status.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from all
</Location>
Copy the file to the second node with scp:
[root@localhost yum.repos.d]# scp /etc/httpd/conf.d/status.conf root@192.168.10.223:/etc/httpd/conf.d/
The authenticity of host '192.168.10.223 (192.168.10.223)' can't be established.
ECDSA key fingerprint is SHA256:yxMbS70NHbXhHD0kCuBb456Q5dvup6t5Yyh75gFjskQ.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.10.223' (ECDSA) to the list of known hosts.
root@192.168.10.223's password: 
status.conf                                                                       100%  110   143.8KB/s   00:00    
[root@localhost yum.repos.d]# 

Before creating the resources, start the web server manually and check for errors, then verify that http://192.168.10.222/server-status (and the same URL on node2's IP) loads correctly. Once both nodes pass the check, stop the httpd service on both of them; the cluster will start httpd itself when the resource is created (a sketch of the check follows).
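A sketch of the pre-check to run on each node (curl is assumed to be installed):

systemctl start httpd
curl -s http://localhost/server-status | head   # should return the Apache status page
systemctl stop httpd                            # leave httpd stopped; the cluster manages it from here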

Create the web and VIP resources


[root@localhost yum.repos.d]# pcs resource create WebSite ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://localhost/server-status" op monitor interval=30s
\\Add the Apache resource. Apache must be installed on every node, but do not start it yourself; the cluster starts it automatically based on the resource state (this resource is used for testing).
[root@localhost yum.repos.d]# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.10.225 cidr_netmask=32 op monitor interval=30s
\\This floating IP address is named VIP, and the cluster checks it every 30 seconds. Note: the chosen IP address must not already be in use by any node.
Constrain the cluster so that both resources stay on the same node

Use pcs status to check node and resource status.

At this point the web resource and the VIP resource may be running on different nodes, which defeats the purpose of building this cluster, so the two resources must be tied together with a colocation constraint.

[root@localhost yum.repos.d]# pcs constraint colocation add WebSite with VIP INFINITY
[root@localhost conf.d]#  pcs status 
Cluster name: mycluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Mon Dec 13 21:36:44 2021
  * Last change:  Mon Dec 13 21:36:40 2021 by root via cibadmin on node1
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * VIP (ocf::heartbeat:IPaddr2):        Started node1
  * WebSite     (ocf::heartbeat:apache):         Starting node1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@localhost conf.d]# 

Accessing http://192.168.10.225/server-status through the VIP succeeds.
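For example, from any machine on the same network:

curl -s -o /dev/null -w '%{http_code}\n' http://192.168.10.225/server-status   # expect 200
ip addr show | grep 192.168.10.225   # on a cluster node: shows the VIP only if that node currently holds it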

Stop node1 in the cluster and check whether the resources automatically fail over to node2 and the page stays accessible.

[root@localhost conf.d]# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
[root@localhost conf.d]# pcs status 
Error: error running crm_mon, is pacemaker running?
  crm_mon: Error: cluster is not available on this node
//node1 has been stopped, so pcs commands no longer work on it; run pcs status on node2 to confirm the failover
[root@localhost yum.repos.d]# pcs status 
Cluster name: mycluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Mon Dec 13 21:50:51 2021
  * Last change:  Mon Dec 13 21:36:40 2021 by root via cibadmin on node1
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node2 ]
  * OFFLINE: [ node1 ]

Full List of Resources:
  * VIP (ocf::heartbeat:IPaddr2):        Started node2
  * WebSite     (ocf::heartbeat:apache):         Started node2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@localhost yum.repos.d]# 

Start node1 again; after it rejoins, the resources do not automatically move back to node1.

[root@localhost yum.repos.d]# pcs cluster start node1
node1: Starting Cluster...
[root@localhost yum.repos.d]# pcs status 
Cluster name: mycluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Mon Dec 13 21:52:33 2021
  * Last change:  Mon Dec 13 21:36:40 2021 by root via cibadmin on node1
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * VIP (ocf::heartbeat:IPaddr2):        Started node2
  * WebSite     (ocf::heartbeat:apache):         Started node2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
[root@localhost yum.repos.d]# 
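If the resources should go back to node1, they can be moved by hand; a sketch using standard pcs commands:

pcs resource move VIP node1   # WebSite follows because it is colocated with VIP
pcs resource clear VIP        # remove the temporary location constraint created by the move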

At this point the desired behavior has basically been achieved.

Additional reference:

Add a colocation constraint to keep resources on the same node: pcs constraint colocation add WebSite with VIP INFINITY
Set the start order: pcs constraint order VIP then WebSite
Set location preferences (a combined example follows this list):
    pcs constraint location WebSite prefers node1=50
    pcs constraint location WebSite prefers node2=100
Delete a resource: pcs resource delete WebSite
Check cluster status: pcs status
Show the current cluster configuration: pcs config
Enable the cluster at boot: pcs cluster enable --all
Start the cluster: pcs cluster start --all
Show resource status: pcs resource show
Validate the cluster configuration: crm_verify -L -V
Test a resource configuration: pcs resource debug-start resource
Put a node into standby: pcs cluster standby node1
Check that cluster communication is healthy: corosync-cfgtool -s
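Applied to the resources in this article, for example (a sketch; the colocation constraint was already added earlier):

pcs constraint order VIP then WebSite               # bring up the IP before starting Apache
pcs constraint location WebSite prefers node2=100   # prefer node2 while both nodes are online
pcs constraint                                      # list all constraints to verify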

More resource configuration examples:

Configure a Filesystem resource
 [shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
  device="/dev/sdb1" directory="/var/www/html" fstype="ext4"
 
 [shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
  device="-U 32937d65eb" directory="/var/www/html" fstype="ext4"

Configure an iSCSI resource
 [shell]# pcs resource create WebData ocf:heartbeat:iscsi \
  portal="192.168.10.18" target="iqn.2008-08.com.starwindsoftware:" \
  op monitor depth="0" timeout="30" interval="120"
 
 [shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
  device="-U 32937d65eb" directory="/var/www/html" fstype="ext4" options="_netdev"
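If a WebFS resource like the ones above is added, it usually needs to stay on the same node as the web server and be mounted before Apache starts. A sketch, assuming the resource is named WebFS as shown, that keeps Apache where the document root is mounted and orders the mount before the web server:

 [shell]# pcs constraint colocation add WebSite with WebFS INFINITY
 [shell]# pcs constraint order WebFS then WebSite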
