MySQL MHA Configuration and a Walkthrough of Three Switchover Methods

Original post | MySQL | Author: 水逸冰 | Date: 2018-09-17 16:45:12

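Master node / MHA manager node: 172.31.217.183
Slave node / MHA member node: 172.31.217.182
Semi-synchronous replication is already enabled.

The database version is MySQL 5.7.

Configure passwordless SSH login
On the master node: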
root@bd-dev-mingshuo-183:/opt/soft#ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
36:39:6b:1e:40:f2:85:31:db:d0:3e:ab:05:0e:fd:37 root@bd-dev-mingshuo-183
The key's randomart image is:
+--[ RSA 2048]----+
|      +.         |
|       B.        |
|    ..+.o        |
|    .+o.o.       |
|     oooSo       |
|      .o++E      |
|       o+. .     |
|      .o .       |
|        .        |
+-----------------+
root@bd-dev-mingshuo-183:/opt/soft#ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.182
root@172.31.217.182's password:
Now try logging into the machine, with "ssh 'root@172.31.217.182'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

root@bd-dev-mingshuo-183:/u01#ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.183
root@172.31.217.183's password:
Now try logging into the machine, with "ssh 'root@172.31.217.183'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

On the slave node:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.183
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.182

On the slave node, set read_only:
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'read_only'\G
*************************** 1. row ***************************
Variable_name: read_only
        Value: ON
1 row in set (0.00 sec)

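read_only=1 means read-only; 0 means read-write. Making the slave read-only does not affect how it applies replicated logs. Do not put this parameter in the MySQL configuration file, though: if this slave is later promoted to master, ordinary users may be unable to write to it. This setting is optional when configuring MHA.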

Deploy the packages
Install the manager package on the manager node.
Install the node package on all nodes.
Install the node package first:
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

Create the MHA management account on the master
grant all privileges on *.* to mha@'172.31.217.%' identified by 'oracle';
flush privileges;

Create a directory to hold the MHA configuration file and MHA logs
mkdir -p /u01/mha/log
chown -R mysql:mysql /u01/mha

Edit the configuration file
vi /u01/mha/mha.cnf

[server default]
manager_log=/u01/mha/log/manager.log
manager_workdir=/u01/mha/log

master_binlog_dir=/u01/mysql/3306/data
user=mha
password=oracle
ping_interval=2  
repl_user=repl_user
repl_password=oracle
ssh_user=root

[server1]
hostname=172.31.217.183
port=3306

[server2]
hostname=172.31.217.182
port=3306

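Optional configuration parameters:
[server default] section:
ping_interval=1         // interval between pings sent to monitor the master; the default is 3 seconds, and automatic failover starts after three attempts go unanswered
remote_workdir=/tmp     // where the remote MySQL host saves binlogs during a switchover
report_script=/usr/local/send_report    // script that sends an alert after a switchover
shutdown_script=""      // script that shuts down the failed host after a failure (its main purpose is to power off the host to prevent split brain; not used here)


Slave sections:
candidate_master=1   // marks this host as a candidate master; with this set, the slave is promoted to master on failover even if it does not hold the latest events in the cluster
check_repl_delay=0   // by default, MHA will not choose a slave as the new master if it is more than 100MB of relay logs behind the master, because recovering that slave would take a long time; with check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful on a host with candidate_master=1, because it ensures that candidate becomes the new master during the switch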

Check SSH login and replication
masterha_check_ssh --conf=/u01/mha/mha.cnf
masterha_check_repl --conf=/u01/mha/mha.cnf
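
The two checks can be chained so that the replication check only runs once SSH connectivity has passed. A minimal sketch, assuming the MHA manager tools are on PATH; run_prechecks is a hypothetical helper name, not part of MHA:

```shell
#!/bin/sh
# Hypothetical helper: run both MHA pre-flight checks against one config file.
# Stops at the first failing check and returns that check's exit status.
run_prechecks() {
    conf="$1"
    masterha_check_ssh --conf="$conf" || return $?
    masterha_check_repl --conf="$conf"
}

# Usage:
#   run_prechecks /u01/mha/mha.cnf && echo "checks passed"
```

Wrapping the calls in a function keeps the exit status meaningful, so the same helper can gate the manager start in a script.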

These checks failed many times along the way. Some of the fixes:
ln -s /opt/mysql-5.7.23/bin/mysql /usr/bin/mysql
ln -s /opt/mysql-5.7.23/bin/mysqlbinlog /usr/bin/mysqlbinlog
Uninstall mha4mysql-manager-0.58-0.el7.centos.noarch.rpm and install mha4mysql-manager-0.56-0.el6.noarch.rpm instead.

Start MHA
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

Check MHA status
root@bd-dev-mingshuo-183:/opt/soft#masterha_check_status --conf=/u01/mha/mha.cnf
mha (pid:24910) is running(0:PING_OK), master:172.31.217.183
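
For scripting, the state label and the current master can be pulled out of that status line. A minimal sketch that only assumes the line format shown above; mha_state and mha_master are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: extract the state label (e.g. PING_OK) from a
# masterha_check_status output line passed as the first argument.
mha_state() {
    printf '%s\n' "$1" | sed -n 's/.*([0-9][0-9]*:\([A-Z_]*\)).*/\1/p'
}

# Hypothetical helper: extract the address that follows "master:".
mha_master() {
    printf '%s\n' "$1" | sed -n 's/.*master:\([^ ,]*\).*/\1/p'
}

# Usage:
#   line=$(masterha_check_status --conf=/u01/mha/mha.cnf)
#   [ "$(mha_state "$line")" = "PING_OK" ] || echo "MHA is not healthy: $line"
```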


Configure the VIP
Add the following under the [server default] section:
master_ip_failover_script=/usr/local/bin/master_ip_failover

Copy master_ip_failover from the source package to /usr/local/bin/
cd /opt/soft/MHAsoft/mha4mysql-manager-0.56/samples/scripts
cp -ra master_ip_failover /usr/local/bin/master_ip_failover

Edit /usr/local/bin/master_ip_failover
my $vip = '172.31.217.203/24';  # the virtual IP you want to set
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth2:$key $vip"; # change eth2 to your NIC name
my $ssh_stop_vip = "/sbin/ifconfig eth2:$key down";


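Note: the lines above go between the variable declaration and the GetOptions call in the script:
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

(add the lines above here)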

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);


Configure the VIP on the NIC
ifconfig eth2:1 172.31.217.203/24

ifconfig
eth2      Link encap:Ethernet  HWaddr 54:0F:5D:2C:4D:77  
          inet addr:172.31.217.202  Bcast:172.31.217.255  Mask:255.255.255.0
          inet6 addr: fe80::560f:5dff:fe2c:4d77/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:74742667 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:52680755472 (49.0 GiB)  TX bytes:740 (740.0 b)

eth2:1    Link encap:Ethernet  HWaddr 54:0F:5D:2C:4D:77  
          inet addr:172.31.217.203  Bcast:172.31.217.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Stop MHA
masterha_stop --conf=/u01/mha/mha.cnf

Start MHA again
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

Error:
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 98.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln226]  Failed to get master_ip_failover_script status with return code 255:0.
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/bin/masterha_manager line 50
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Sep 17 10:56:04 2018 - [info] Got exit code 1 (Not master dead).

I simply commented out the lines containing FIXME_xxx.
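
Before restarting the manager, it can help to confirm that no uncommented placeholder is left in the script. A minimal sketch, assuming placeholders always contain the string FIXME_ as in the error above; check_no_fixme is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report uncommented FIXME_ placeholders in a script.
# Prints the offending lines with line numbers.
# Returns 0 when the file is clean, 1 when placeholders remain,
# and 2 when the file cannot be read.
check_no_fixme() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # The first grep numbers every FIXME_ line; the second drops lines
    # whose code starts with '#', i.e. lines already commented out.
    if grep -n 'FIXME_' "$1" | grep -v '^[0-9]*:[[:space:]]*#'; then
        return 1
    fi
    return 0
}

# Usage:
#   check_no_fixme /usr/local/bin/master_ip_failover \
#       && perl -c /usr/local/bin/master_ip_failover
```

Here perl -c only compiles the script without running it, which is enough to catch the "Bareword not allowed" class of error.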

Start MHA again
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &
ok!

Shut down the master
mysqladmin -uroot -poracle shutdown

Check the slave
mysql> show slave status;
Empty set (0.00 sec)

mysql> show master status\G
*************************** 1. row ***************************
             File: slave-relay-bin.000002
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
The slave has automatically been promoted to master. The MHA manager on the stopped master's host also exited automatically.


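Restore the previous master/slave relationship:
Now bring the stopped master back up. You will find that it does not rejoin the cluster on its own.
Query the binlog position on the master: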
mysql> show master status\G
*************************** 1. row ***************************
             File: master-bin.000005
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
On the slave:
change master to
master_host='bd-dev-mingshuo-183',
master_port=3306,
master_user='repl_user',
master_password='oracle',
master_log_file='master-bin.000005',
master_log_pos=154;

start slave;
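
The File and Position values used in CHANGE MASTER TO come straight from the show master status\G output, so they can be extracted rather than copied by hand. A minimal sketch, assuming the vertical \G output format shown above; binlog_coords is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `show master status\G` output on stdin and
# print the binlog coordinates as "file:position".
# Exits non-zero if either field is missing.
binlog_coords() {
    awk '
        $1 == "File:"     { f = $2 }
        $1 == "Position:" { p = $2 }
        END { if (f != "" && p != "") print f ":" p; else exit 1 }
    '
}

# Usage:
#   mysql -uroot -p -e 'show master status\G' | binlog_coords
```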



Start the MHA manager on the master. Note that the -ignore_last_failover parameter is needed here; otherwise it fails with:
Mon Sep 17 14:45:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 17 14:45:56 2018 - [info] Reading application default configuration from /u01/mha/mha.cnf..
Mon Sep 17 14:45:56 2018 - [info] Reading server configuration from /u01/mha/mha.cnf..
Mon Sep 17 14:45:56 2018 - [info] MHA::MasterMonitor version 0.56.
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Sep 17 14:45:56 2018 - [info] Got exit code 1 (Not master dead).

Start the MHA manager:
nohup masterha_manager -ignore_last_failover --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &
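
According to the MHA documentation, the manager records the outcome of a failover in marker files under manager_workdir, named after the application (with this setup that would presumably be mha.failover.complete or mha.failover.error), and the ignore_last_failover option tells it to proceed regardless of them. A minimal sketch for seeing which markers exist, assuming that naming; list_failover_markers is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list MHA failover marker files in a work directory.
# Assumes markers are named <app>.failover.complete / <app>.failover.error.
# Prints each marker found; returns 0 if any exist, 1 otherwise.
list_failover_markers() {
    found=1
    for f in "$1"/*.failover.complete "$1"/*.failover.error; do
        # An unmatched glob stays literal, so test for existence.
        if [ -e "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}

# Usage:
#   list_failover_markers /u01/mha/log
```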

That was automatic failover. Next, test manual failover.
Stop the MHA manager:
masterha_stop --conf=/u01/mha/mha.cnf

Stop the master database
mysqladmin -uroot -poracle shutdown

Manual failover
masterha_master_switch --master_state=dead --conf=/u01/mha/mha.cnf --dead_master_host=172.31.217.183 --dead_master_port=3306 --new_master_host=172.31.217.182  --new_master_port=3306 --ignore_last_failover



That was manual failover. Next, test an online switchover:
On the manager node:
Stop the MHA manager:
masterha_stop --conf=/u01/mha/mha.cnf
masterha_master_switch --conf=/u01/mha/mha.cnf  --master_state=alive --new_master_host=172.31.217.182 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=100
Mon Sep 17 15:47:29 2018 - [info] MHA::MasterRotate version 0.56.
Mon Sep 17 15:47:29 2018 - [info] Starting online master switch..
Mon Sep 17 15:47:29 2018 - [info]
Mon Sep 17 15:47:29 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Sep 17 15:47:29 2018 - [info]
Mon Sep 17 15:47:29 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 17 15:47:29 2018 - [info] Reading application default configuration from /u01/mha/mha.cnf..
Mon Sep 17 15:47:29 2018 - [info] Reading server configuration from /u01/mha/mha.cnf..
Mon Sep 17 15:47:29 2018 - [info] GTID failover mode = 0
Mon Sep 17 15:47:29 2018 - [info] Current Alive Master: 172.31.217.183(172.31.217.183:3306)
Mon Sep 17 15:47:29 2018 - [info] Alive Slaves:
Mon Sep 17 15:47:29 2018 - [info]   172.31.217.182(172.31.217.182:3306)  Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Mon Sep 17 15:47:29 2018 - [info]     Replicating from bd-dev-mingshuo-183(172.31.217.183:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.31.217.183(172.31.217.183:3306)? (YES/no): YES
Mon Sep 17 15:47:33 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Sep 17 15:47:33 2018 - [info]  ok.
Mon Sep 17 15:47:33 2018 - [info] Checking MHA is not monitoring or doing failover..
Mon Sep 17 15:47:33 2018 - [info] Checking replication health on 172.31.217.182..
Mon Sep 17 15:47:33 2018 - [info]  ok.
Mon Sep 17 15:47:33 2018 - [info] 172.31.217.182 can be new master.
Mon Sep 17 15:47:33 2018 - [info]
From:
172.31.217.183(172.31.217.183:3306) (current master)
 +--172.31.217.182(172.31.217.182:3306)

To:
172.31.217.182(172.31.217.182:3306) (new master)
 +--172.31.217.183(172.31.217.183:3306)

Starting master switch from 172.31.217.183(172.31.217.183:3306) to 172.31.217.182(172.31.217.182:3306)? (yes/NO): yes
Mon Sep 17 15:47:55 2018 - [info] Checking whether 172.31.217.182(172.31.217.182:3306) is ok for the new master..
Mon Sep 17 15:47:55 2018 - [info]  ok.
Mon Sep 17 15:47:55 2018 - [info] 172.31.217.183(172.31.217.183:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Sep 17 15:47:55 2018 - [info] 172.31.217.183(172.31.217.183:3306): Resetting slave pointing to the dummy host.
Mon Sep 17 15:47:55 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Sep 17 15:47:55 2018 - [info]
Mon Sep 17 15:47:55 2018 - [info] * Phase 2: Rejecting updates Phase..
Mon Sep 17 15:47:55 2018 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Mon Sep 17 15:48:32 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Sep 17 15:48:32 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Sep 17 15:48:32 2018 - [info]  ok.
Mon Sep 17 15:48:32 2018 - [info] Orig master binlog:pos is master-bin.000007:154.
Mon Sep 17 15:48:32 2018 - [info]  Waiting to execute all relay logs on 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info]  master_pos_wait(master-bin.000007:154) completed on 172.31.217.182(172.31.217.182:3306). Executed 0 events.
Mon Sep 17 15:48:32 2018 - [info]   done.
Mon Sep 17 15:48:32 2018 - [info] Getting new master's binlog name and position..
Mon Sep 17 15:48:32 2018 - [info]  slave-relay-bin.000002:154
Mon Sep 17 15:48:32 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.217.182', MASTER_PORT=3306, MASTER_LOG_FILE='slave-relay-bin.000002', MASTER_LOG_POS=154, MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
Mon Sep 17 15:48:32 2018 - [info] Setting read_only=0 on 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info]  ok.
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] * Switching slaves in parallel..
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] Unlocking all tables on the orig master:
Mon Sep 17 15:48:32 2018 - [info] Executing UNLOCK TABLES..
Mon Sep 17 15:48:32 2018 - [info]  ok.
Mon Sep 17 15:48:32 2018 - [info] Starting orig master as a new slave..
Mon Sep 17 15:48:32 2018 - [info]  Resetting slave 172.31.217.183(172.31.217.183:3306) and starting replication from the new master 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info]  Executed CHANGE MASTER.
Mon Sep 17 15:48:32 2018 - [info]  Slave started.
Mon Sep 17 15:48:32 2018 - [info] All new slave servers switched successfully.
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] * Phase 5: New master cleanup phase..
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info]  172.31.217.182: Resetting slave info succeeded.
Mon Sep 17 15:48:32 2018 - [info] Switching master to 172.31.217.182(172.31.217.182:3306) completed successfully.

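Note that at one point during the switchover you are asked:
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
In other words: writes on the current master have not been disabled, so applications connected to it will keep writing to it after the switch. Is that OK?
I was only testing whether the online switchover works, so I entered yes.
After the switchover completed, the MHA manager was left stopped.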

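Appendix:
The Manager package mainly includes the following tools:
masterha_check_ssh              checks the MHA SSH configuration
masterha_check_repl             checks MySQL replication status
masterha_manager                starts MHA
masterha_check_status           checks the current MHA running state
masterha_master_monitor         detects whether the master is down
masterha_master_switch          controls failover (automatic or manual)
masterha_conf_host              adds or removes configured server entries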


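Node package:
save_binary_logs                saves and copies the master's binary logs
apply_diff_relay_logs           identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog              removes unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs                purges relay logs (without blocking the SQL thread)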

Source: ITPUB blog, link: http://blog.itpub.net/31480688/viewspace-2214323/. If you repost this article, please credit the source; otherwise legal action may be pursued.
