ITPub博客

首页 > 数据库 > MySQL > reset master导致主从GTID不一致的处理方法

reset master导致主从GTID不一致的处理方法

原创 MySQL 作者:nwpu053871 时间:2016-04-01 16:12:19 0 删除 编辑
1、 主库报同步异常(主从是双向复制的),报错信息显示insert语句的目标表不存在
2、 检查主从库的GTID信息发现,主库的GTID比从库的高(应该是从库做过reset master的操作,重置了GTID),导致有些事务被跳过,从而引发报错。这里被跳过的是建表语句。
3、 处理方法是推高从库的GTID到主库的水平,从而消除冲突,保证从库过来的事务都可以在主库执行。
4、 第二天检查发现还是有些事务被自动跳过,被跳过的是创建临时表的语句。查询文档得知:临时表的创建操作不会同步到备库执行,而临时表的相关操作事务,也会代之以空事务。

下面是详细处理过程:
有个主库最近同步从库信息经常报错,跳过对应的事务之后第二天还是有同样的问题,报错如下:
Last_Error: Error 'Table 'mysql.ibbackup_binlog_marker' doesn't exist' on query. Default database: 'mysql'. Query: 'INSERT INTO ibbackup_binlog_marker VALUES(1)'

其中一次详细信息如下:
mysql> show slave status \G
*************************** 1. row ***************************
<省略部分信息>
        Relay_Master_Log_File: mysql-bin.000134
             Slave_IO_Running: Yes
                   Last_Errno: 1146
                   Last_Error: Error 'Table 'mysql.ibbackup_binlog_marker' doesn't exist' on query. Default database: 'mysql'. Query: 'INSERT INTO ibbackup_binlog_marker VALUES(1)'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1534712
              Relay_Log_Space: 8079
               Last_SQL_Errno: 1146
               Last_SQL_Error: Error 'Table 'mysql.ibbackup_binlog_marker' doesn't exist' on query. Default database: 'mysql'. Query: 'INSERT INTO ibbackup_binlog_marker VALUES(1)'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3809805896
                 Master_UUID: 44d3c576-ca26-11e3-a90a-a0369f38458a
             Master_Info_File: mysql.slave_master_info
           Retrieved_Gtid_Set: 44d3c576-ca26-11e3-a90a-a0369f38458a:28-1420
           Executed_Gtid_Set: 258cacac-ca16-11e3-a8a1-a0369f35d966:1-160657756,
44d3c576-ca26-11e3-a90a-a0369f38458a:1-5:7-19:21-1294:1296-1307:1309-1320:1322-1333:1335-1346:1348-1359:1361-1372:1374-1412:1414-1425:1427-1438:1440-1451:1453-1464:1466-1478:1480-1491:1493-1504:1506-1517:1519-1530:1532-1543:1545-1556:1558-1570:1572-1583:1585-1596:1598-1609:1611-1622:1624-1635:1637-1648:1650-1662:1664-1675:1677-1688:1690-1701:1703-1714:1716-1727:1729-1740:1742-1754:1756-1767:1769-1780:1782-1793:1795-1806:1808-1819:1821-1832:1834-1846:1848-1859:1861-1872:1874-1885:1887-1898:1900-1911:1913-1924:1926-1938:1940-1951:1953-1964:1966-1977:1979-1990:1992-2003:2005-2016:2018-2030:2032-2043:2045-2056:2058-2069:2071-2082:2084-2095:2097-2108:2110-2122:2124-2135:2137-2148:2150-2161:2163-2174:2176-2187:2189-2200:2202-2214:2216-2227:2229-2240:2242-2253:2255-2266:2268-2279:2281-2292:2294-2306:2308-2319:2321-2332:2334-2345:2347-2358:2360-2371:2373-2384:2386-2398:2400-2411:2413-2424:2426-2437:2439-2450:2452-2463:2465-2476:2478-2490:2492-2503:2505-2516:2518-2529:2531-2542:2544-2555:2557-2568:2570-2582:2584-2595:2597-2608:2610-2621:2623-2634

报错的是一个insert语句,但目标表找不到。这个语句在从库的mysql-bin.000134 日志上,Exec_Master_Log_Pos是1534712。

于是到从库检查对应binlog,在对应位置可以看到,insert语句的GTID是xxx:1413,而在insert语句之前就是建表语句,对应的GTID是1412:
# at 1534324
#160301  0:21:00 server id 3809805896  end_log_pos 1534372 CRC32 0x9cf47996     GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:1411'/*!*/;
# at 1534372
#160301  0:21:00 server id 3809805896  end_log_pos 1534519 CRC32 0x0e9481da     Query   thread_id=947514        exec_time=0     error_code=0
SET TIMESTAMP=1456762860/*!*/;
DROP TABLE IF EXISTS `ibbackup_binlog_marker` /* generated by server */
/*!*/;
# at 1534519
#160301  0:21:00 server id 3809805896  end_log_pos 1534567 CRC32 0xd09025a2     GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:1412'/*!*/;
# at 1534567
#160301  0:21:00 server id 3809805896  end_log_pos 1534712 CRC32 0x04118941     Query   thread_id=947514        exec_time=0     error_code=0
SET TIMESTAMP=1456762860/*!*/;
SET @@session.pseudo_thread_id=947514/*!*/;
CREATE TEMPORARY TABLE ibbackup_binlog_marker (a INT) ENGINE = INNODB
/*!*/;
# at 1534712
#160301  0:21:00 server id 3809805896  end_log_pos 1534760 CRC32 0x70ca2249     GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:1413'/*!*/;
# at 1534760
#160301  0:21:00 server id 3809805896  end_log_pos 1534841 CRC32 0x5dad491c     Query   thread_id=947514        exec_time=0     error_code=0
SET TIMESTAMP=1456762860/*!*/;
BEGIN
/*!*/;
# at 1534841
#160301  0:21:00 server id 3809805896  end_log_pos 1534961 CRC32 0xb5f7ab48     Query   thread_id=947514        exec_time=0     error_code=0
SET TIMESTAMP=1456762860/*!*/;
INSERT INTO ibbackup_binlog_marker VALUES(1)
/*!*/;
# at 1534961
#160301  0:21:00 server id 3809805896  end_log_pos 1535035 CRC32 0x9baea0da     Query   thread_id=947514        exec_time=0     error_code=0
SET TIMESTAMP=1456762860/*!*/;
COMMIT
/*!*/;

而检查主库的relaylog时可以发下,这些事务都是有成功同步过来的,并没有丢失。那为什么会提示表不存在的错误呢?是有什么原因导致建表语句没有执行?

我们先看一下主库的GTID,从从库同步过来的已执行的最大事务号是7383:
*************************** 1. row ***************************
Variable_name: enforce_gtid_consistency
        Value: ON
*************************** 2. row ***************************
Variable_name: gtid_executed
        Value: 258cacac-ca16-11e3-a8a1-a0369f35d966:1-160665110,
44d3c576-ca26-11e3-a90a-a0369f38458a:1-5:7-19:21-1294:1296-1307:1309-1320:1322-1333:1335-1346:1348-1359:1361-1372:1374-1412:1414-1425:1427-1438:1440-1451:1453-1464:1466-1478:1480-1491:1493-1504:1506-1517:1519-1530:1532-1543:1545-1556:1558-1570:1572-1583:1585-1596:1598-1609:1611-1622:1624-1635:1637-1648:1650-1662:1664-1675:1677-1688:1690-1701:1703-1714:1716-1727:1729-1740:1742-1754:1756-1767:1769-1780:1782-1793:1795-1806:1808-1819:1821-1832:1834-1846:1848-1859:1861-1872:1874-1885:1887-1898:1900-1911:1913-1924:1926-1938:1940-1951:1953-1964:1966-1977:1979-1990:1992-2003:2005-2016:2018-2030:2032-2043:2045-2056:2058-2069:2071-2082:2084-2095:2097-2108:2110-2122:2124-2135:2137-2148:2150-2161:2163-2174:2176-2187:2189-2200:2202-2214:2216-2227:2229-2240:2242-2253:2255-2266:2268-2279:2281-2292:2294-2306:2308-2319:2321-2332:2334-2345:2347-2358:2360-2371:2373-2384:2386-2398:2400-2411:2413-2424:2426-2437:2439-2450:2452-2463:2465-2476:2478-2490:2492-2503:2505-2516:2518-7383:


再看一下从库,发现最大只去到1420!
*************************** 1. row ***************************
Variable_name: enforce_gtid_consistency
        Value: ON
*************************** 2. row ***************************
Variable_name: gtid_executed
        Value: 258cacac-ca16-11e3-a8a1-a0369f35d966:109528488-160665725,
44d3c576-ca26-11e3-a90a-a0369f38458a:1-1420

问题来了:为什么从库的GTID会比主库的小呢?应该是有同学在从库做了reset master的操作,重置了GTID导致的!

我们知道,备库在应用主库的日志时,会通过检查GTID来确定对应的事务是否已经执行过。如果对应的GTID已经在备库中有记录,说明对应事务已经执行过,不必再重复执行。反之则需要执行。
从上面主库的GTID记录可以看到,GTID:1412已经执行过了。但我们可以大胆推测,此已经执行的1412非上面从库的binlog对应的1412,也就是说,这是很久以前用过的GTID。当从库的这个新的1412号事务(建表语句)同步到主库时,主库发现这个GTID已经有执行记录,所以就跳过直接就去执行1413号insert语句,从而引发了 Table doesn't exist的错误。

找到了原因就可以确定解决方案:
由于这是从从库同步到主库的信息,而从库是只读的,只会产生备份的临时信息。因此这些事务都可以跳过。但由于从库的GTID比主库上的小,跳过了1413号事务,后面还会有更多的类似的问题。因此还需要在从库上将事务号推高到跟主库一致的水平,这样后面从库产生的事务号都不会再跟主库上的有冲突。推高事务号可以通过执行空事务达到:
set gtid_next='44d3c576-ca26-11e3-a90a-a0369f38458a:1421;
begin;commit;
set gtid_next='44d3c576-ca26-11e3-a90a-a0369f38458a:1422;
begin;commit;
…………………………………
set gtid_next='44d3c576-ca26-11e3-a90a-a0369f38458a:7383;
begin;commit;

set gtid_next='AUTOMATIC';

同时也在主库上做同样的处理,目的是补全上面跳过的事务产生的“空隙”,处理完后就可以auto_position设置为1了:
mysql> show slave status\G
*************************** 1. row ***************************
<省略部分信息>
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Retrieved_Gtid_Set: 44d3c576-ca26-11e3-a90a-a0369f38458a:7384-7396
            Executed_Gtid_Set: 258cacac-ca16-11e3-a8a1-a0369f35d966:1-161808236,
44d3c576-ca26-11e3-a90a-a0369f38458a:1-7383
                Auto_Position: 1
1 row in set (0.00 sec)


至此,问题处理完毕。

第二天复核,发现同步正常,但是主库应用的GTID又有跳过的现象,7388号事务在主库没有执行:
mysql> show slave status\G
*************************** 1. row ***************************
<省略部分信息>
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Retrieved_Gtid_Set: 44d3c576-ca26-11e3-a90a-a0369f38458a:7384-7396
            Executed_Gtid_Set: 258cacac-ca16-11e3-a8a1-a0369f35d966:1-161808236,
44d3c576-ca26-11e3-a90a-a0369f38458a:1-7387:7389-7396
                Auto_Position: 1
1 row in set (0.00 sec)

检查主库relaylog,发现7388、7389这两个事务主库是有获取到的,但7388看起来跳过没执行,为什么7389这次执行却没报错呢?
/*!*/;
# at 2300
#160303  3:06:55 server id 3809805896  end_log_pos 7759399 CRC32 0x844b31b3     GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:7388'/*!*/;
# at 2348
#160303  3:06:55 server id 3809805896  end_log_pos 7759544 CRC32 0x19df06b2     Query   thread_id=966014        exec_time=0     error_code=0
SET TIMESTAMP=1456945615/*!*/;
CREATE TEMPORARY TABLE ibbackup_binlog_marker (a INT) ENGINE = INNODB
/*!*/;
# at 2493
#160303  3:06:55 server id 3809805896  end_log_pos 7759592 CRC32 0x0641a4d4     GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:7389'/*!*/;
# at 2541
#160303  3:06:55 server id 3809805896  end_log_pos 7759673 CRC32 0x76269211     Query   thread_id=966014        exec_time=0     error_code=0
SET TIMESTAMP=1456945615/*!*/;
BEGIN
/*!*/;
# at 2622
#160303  3:06:55 server id 3809805896  end_log_pos 7759793 CRC32 0x7627ea96     Query   thread_id=966014        exec_time=0     error_code=0
SET TIMESTAMP=1456945615/*!*/;
INSERT INTO ibbackup_binlog_marker VALUES(1)
/*!*/;
# at 2742
#160303  3:06:55 server id 3809805896  end_log_pos 7759867 CRC32 0xb23b3a62     Query   thread_id=966014        exec_time=0     error_code=0
SET TIMESTAMP=1456945615/*!*/;
COMMIT

检查主库的binlog。
可以发现:7388的确是被跳过了,而7389这个本该是insert的却变成了空事务!
# at 192527598
#160303  3:06:55 server id 3809805896  end_log_pos 192527646 CRC32 0x68163992   GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:7387'/*!*/;
# at 192527646
#160303  3:06:55 server id 3809805896  end_log_pos 192527793 CRC32 0xf6c14291   Query   thread_id=966014        exec_time=0     error_code=0
use `mysql`/*!*/;
SET TIMESTAMP=1456945615/*!*/;
DROP TABLE IF EXISTS `ibbackup_binlog_marker` /* generated by server */
/*!*/;
# at 192527793
#160303  3:06:55 server id 3809805896  end_log_pos 192527841 CRC32 0x092b349d   GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:7389'/*!*/;
# at 192527841
#160303  3:06:55 server id 3809805896  end_log_pos 192527914 CRC32 0x2df7ca6b   Query   thread_id=966014        exec_time=0     error_code=0
SET TIMESTAMP=1456945615/*!*/;
BEGIN
/*!*/;
# at 192527914
#160303  3:06:55 server id 3809805896  end_log_pos 192527988 CRC32 0x1f722f7c   Query   thread_id=966014        exec_time=0     error_code=0
SET TIMESTAMP=1456945615/*!*/;
COMMIT
/*!*/;
# at 192527988
#160303  3:06:55 server id 3809805896  end_log_pos 192528036 CRC32 0xf9656531   GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '44d3c576-ca26-11e3-a90a-a0369f38458a:7390'/*!*/;

这个问题比较坑,看了下文档,mysql 在row format的情况下,对于temporary tab的处理情况如下:
1、不进行tempory table 复制(这个应该是指创建)
2、从上面的binlog中也可以看到,对于temporary tab 的dml操作,在复制的时候,用空事务代替。

相应的文档:
RBL, RBR, and temporary tables. As noted in Section 16.4.1.22, “Replication and Temporary
Tables”, temporary tables are not replicated when using row-based format. When mixed format is in
effect, “safe” statements involving temporary tables are logged using statement-based format. For more
information, see Section 16.1.2.1, “Advantages and Disadvantages of Statement-Based and Row-Based
Replication”.

MySQL文档:17.1.3.4 Restrictions on Replication with GTIDs
Temporary tables.  CREATE TEMPORARY TABLE and DROP TEMPORARY TABLE statements are not supported inside transactions when using GTIDs (that is, when the server was started with the --enforce-gtid-consistency option). It is possible to use these statements with GTIDs enabled, but only outside of any transaction, and only with autocommit=1.

注意:reset slave操作没有对gtid有任何影响,因此不能通过这种办法来让主从GTID保持一致:
mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
               < 省略部分信息 >
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 73d1e54a-c3a5-11e3-bea2-005056b4006e:3-5
            Executed_Gtid_Set: 73d1e54a-c3a5-11e3-bea2-005056b4006e:1-5,a95348e2-e7aa-11e2-a42f-001b785aa468:1-5
                Auto_Position: 1
1 row in set (0.00 sec)


mysql> 
mysql> 
mysql> 
mysql> 
mysql> show global variables like '%gtid%';
+---------------------------------+------------------------------------------------------------------------------------+
| Variable_name                   | Value                                                                              |
+---------------------------------+------------------------------------------------------------------------------------+
| enforce_gtid_consistency        | ON                                                                                 |
| gtid_executed                   | 73d1e54a-c3a5-11e3-bea2-005056b4006e:1-5, a95348e2-e7aa-11e2-a42f-001b785aa468:1-5 |
| gtid_mode                       | ON                                                                                 |
| gtid_owned                      |                                                                                    |
| gtid_purged                     |                                                                                    |
| simplified_binlog_gtid_recovery | OFF                                                                                |
+---------------------------------+------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

mysql> select * from test.aa;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | a    |
|    1 | NULL |
|    2 | NULL |
|    3 | NULL |
+------+------+
5 rows in set (0.00 sec)

mysql> stop slave;
Query OK, 0 rows affected (0.06 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.11 sec)

mysql> show global variables like '%gtid%';
+---------------------------------+------------------------------------------------------------------------------------+
| Variable_name                   | Value                                                                              |
+---------------------------------+------------------------------------------------------------------------------------+
| enforce_gtid_consistency        | ON                                                                                 |
| gtid_executed                   | 73d1e54a-c3a5-11e3-bea2-005056b4006e:1-5, a95348e2-e7aa-11e2-a42f-001b785aa468:1-5 |
| gtid_mode                       | ON                                                                                 |
| gtid_owned                      |                                                                                    |
| gtid_purged                     |                                                                                    |
| simplified_binlog_gtid_recovery | OFF                                                                                |
+---------------------------------+------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)








来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22818880/viewspace-2073787/,如需转载,请注明出处,否则将追究法律责任。

上一篇: pt-query-digest工具
请登录后发表评论 登录
全部评论

注册时间:2009-11-03

  • 博文量
    39
  • 访问量
    62695