ITPub博客

首页 > 数据库 > PostgreSQL > PostgreSQL DBA(30) - Backup&Recovery#3(数据文件损坏恢复)

PostgreSQL DBA(30) - Backup&Recovery#3(数据文件损坏恢复)

原创 PostgreSQL 作者:husthxd 时间:2019-03-12 17:14:44 0 删除 编辑

在数据文件损坏,数据库无法正常启动的时候,Oracle可通过数据库备份+归档日志+在线日志实现数据库的完整恢复,与之类似,PostgreSQL也可以通过数据库备份+归档WAL日志+在线WAL日志进行完整恢复.

一、场景

1.执行备份
2.构造测试数据
3.删除数据文件(保留WAL日志文件)
4.使用备份+在线WAL日志文件进行完整恢复

二、模拟完整恢复

参数配置


archive_mode = on 
archive_command = '/home/xdb/archive.sh'
wal_level = replica
max_wal_size = 4GB
min_wal_size = 1024MB

详细请参考 Backup&Recovery#1(基本操作)

执行备份
查看当前的LSN


testdb=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/39A63C78
(1 row)

使用工具pg_basebackup对数据库进行备份


testdb=# \q
[xdb@localhost testdb]$ pg_basebackup -D /data/backup/0312-1/ -l 0312-1 -v -F tar -z
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/3A000108 on timeline 15
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_2978"
pg_basebackup: write-ahead log end point: 0/3A0001D8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
[xdb@localhost testdb]$ 
[xdb@localhost testdb]$ ll /data/backup/0312-1/
total 44384
-rw-------. 1 xdb xdb 45427619 Mar 12 17:30 base.tar.gz
-rw-------. 1 xdb xdb    18927 Mar 12 17:30 pg_wal.tar.gz
[xdb@localhost testdb]$ 
####
[xdb@localhost ~]$ psql -d testdb
psql (11.2)
Type "help" for help.
testdb=# checkpoint;
CHECKPOINT
testdb=#

构造测试数据
创建数据表,插入数据


testdb=# create table tbl01(id int,c1 char(200),c2 char(200));
CREATE TABLE
testdb=# insert into tbl01 select f,f||'c1',f||'c2' from generate_series(1,100000) f;
INSERT 0 100000
testdb=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/3DD39618
(1 row)
testdb=# create table tbl02(id int,c1 char(200),c2 char(200));
CREATE TABLE
testdb=# insert into tbl02 select f,f||'c1',f||'c2' from generate_series(1,100000) f;
INSERT 0 100000
testdb=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/40A62F20
(1 row)
testdb=#

当前在线日志文件为0000000F0000000000000040


[xdb@localhost testdb]$ ll $PGDATA/pg_wal
total 196632
-rw-------. 1 xdb xdb       42 Mar 12 17:10 00000008.history
-rw-------. 1 xdb xdb       85 Mar 12 17:10 0000000C.history
-rw-------. 1 xdb xdb 16777216 Mar 12 17:10 0000000E0000000000000039.partial
-rw-------. 1 xdb xdb      129 Mar 12 17:10 0000000E.history
-rw-------. 1 xdb xdb 16777216 Mar 12 17:30 0000000F000000000000003A
-rw-------. 1 xdb xdb      323 Mar 12 17:30 0000000F000000000000003A.00000108.backup
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003B
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003C
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003D
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003E
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003F
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F0000000000000040
-rw-------. 1 xdb xdb 16777216 Mar 12 17:10 0000000F0000000000000041
-rw-------. 1 xdb xdb 16777216 Mar 12 17:10 0000000F0000000000000042
-rw-------. 1 xdb xdb 16777216 Mar 12 17:10 0000000F0000000000000043
-rw-------. 1 xdb xdb 16777216 Mar 12 17:30 0000000F0000000000000044
-rw-------. 1 xdb xdb      173 Mar 12 17:11 0000000F.history
drwx------. 2 xdb xdb     4096 Mar 12 17:32 archive_status
[xdb@localhost testdb]$

归档日志文件信息


[xdb@localhost testdb]$ ll /data/archivelog/20190312/
total 245772
-rw-------. 1 xdb xdb 16777216 Mar 12 17:06 0000000E0000000000000032
-rw-------. 1 xdb xdb 16777216 Mar 12 17:06 0000000E0000000000000033
-rw-------. 1 xdb xdb      323 Mar 12 17:06 0000000E0000000000000033.00000028.backup
-rw-------. 1 xdb xdb 16777216 Mar 12 17:07 0000000E0000000000000034
-rw-------. 1 xdb xdb 16777216 Mar 12 17:07 0000000E0000000000000035
-rw-------. 1 xdb xdb 16777216 Mar 12 17:07 0000000E0000000000000036
-rw-------. 1 xdb xdb 16777216 Mar 12 17:07 0000000E0000000000000037
-rw-------. 1 xdb xdb 16777216 Mar 12 17:07 0000000E0000000000000038
-rw-------. 1 xdb xdb 16777216 Mar 12 17:11 0000000E0000000000000039.partial
-rw-------. 1 xdb xdb 16777216 Mar 12 17:30 0000000F0000000000000039
-rw-------. 1 xdb xdb 16777216 Mar 12 17:30 0000000F000000000000003A
-rw-------. 1 xdb xdb      323 Mar 12 17:30 0000000F000000000000003A.00000108.backup
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003B
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003C
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003D
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003E
-rw-------. 1 xdb xdb 16777216 Mar 12 17:32 0000000F000000000000003F
-rw-------. 1 xdb xdb      173 Mar 12 17:11 0000000F.history
[xdb@localhost testdb]$

模拟数据文件损坏
删除数据文件目录,kill postgres进程


[xdb@localhost testdb]$ rm -rf ./base
[xdb@localhost testdb]$ psql -d testdb
psql: FATAL:  database "testdb" does not exist
DETAIL:  The database subdirectory "base/16384" is missing.
[xdb@localhost testdb]$ ps -ef|grep postgres
xdb       2914     1  0 17:11 pts/2    00:00:00 /appdb/xdb/pg11.2/bin/postgres
xdb       2915  2914  0 17:11 ?        00:00:00 postgres: logger   
xdb       2918  2914  0 17:11 ?        00:00:00 postgres: checkpointer   
xdb       2919  2914  0 17:11 ?        00:00:00 postgres: background writer   
xdb       2921  2914  0 17:11 ?        00:00:00 postgres: stats collector   
xdb       2925  2914  0 17:11 ?        00:00:00 postgres: walwriter   
xdb       2926  2914  0 17:11 ?        00:00:00 postgres: autovacuum launcher  
xdb       2927  2914  0 17:11 ?        00:00:00 postgres: archiver   last was 0000000F000000000000003F
xdb       2928  2914  0 17:11 ?        00:00:00 postgres: logical replication launcher  
xdb       2977  2914  0 17:30 ?        00:00:00 postgres: xdb testdb [local] idle
xdb       3014  2519  0 17:33 pts/2    00:00:00 grep --color=auto postgres
[xdb@localhost testdb]$ kill -9 2914
[xdb@localhost testdb]$ ps -ef|grep postgres
xdb       3016  2519  0 17:34 pts/2    00:00:00 grep --color=auto postgres

执行恢复
备份在线日志


[xdb@localhost ~]$ mkdir /data/backup/wal
[xdb@localhost testdb]$ cp -R ./pg_wal/* /data/backup/wal/
[xdb@localhost testdb]$ 
[xdb@localhost testdb]$  ll /data/backup/wal/
total 196632
-rw-------. 1 xdb xdb       42 Mar 12 17:34 00000008.history
-rw-------. 1 xdb xdb       85 Mar 12 17:34 0000000C.history
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000E0000000000000039.partial
-rw-------. 1 xdb xdb      129 Mar 12 17:34 0000000E.history
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F000000000000003A
-rw-------. 1 xdb xdb      323 Mar 12 17:34 0000000F000000000000003A.00000108.backup
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F000000000000003B
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F000000000000003C
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F000000000000003D
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F000000000000003E
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F000000000000003F
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F0000000000000040
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F0000000000000041
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F0000000000000042
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F0000000000000043
-rw-------. 1 xdb xdb 16777216 Mar 12 17:34 0000000F0000000000000044
-rw-------. 1 xdb xdb      173 Mar 12 17:34 0000000F.history
drwx------. 2 xdb xdb     4096 Mar 12 17:34 archive_status
[xdb@localhost testdb]$

从数据库备份中恢复


[xdb@localhost testdb]$ rm -rf *
[xdb@localhost testdb]$ cp /data/backup/0312-1/base.tar.gz ./
[xdb@localhost testdb]$ tar zxf base.tar.gz

恢复在线日志


cp -R /data/backup/wal/0000000F0000000000000040 ./pg_wal
[xdb@localhost testdb]$ ll ./pg_wal
total 16384
-rw-------. 1 xdb xdb 16777216 Mar 12 17:35 0000000F0000000000000040
drwx------. 2 xdb xdb        6 Mar 12 17:30 archive_status

创建recovery.conf文件


[xdb@localhost testdb]$ vim recovery.conf
[xdb@localhost testdb]$ cat recovery.conf 
#Recovery
restore_command='cp /data/archivelog/20190312/%f %p'
#restore_target=XX

执行恢复,启动数据库


[xdb@localhost testdb]$ pg_ctl start
waiting for server to start....2019-03-12 17:36:21.310 CST [3030] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-03-12 17:36:21.310 CST [3030] LOG:  listening on IPv6 address "::", port 5432
2019-03-12 17:36:21.328 CST [3030] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-03-12 17:36:21.341 CST [3030] LOG:  redirecting log output to logging collector process
2019-03-12 17:36:21.341 CST [3030] HINT:  Future log output will appear in directory "pg_log".
 done
server started

日志输出


2019-03-12 17:36:21.341 CST,,,3030,,5c877d95.bd6,1,,2019-03-12 17:36:21 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2019-03-12 17:36:21.350 CST,,,3032,,5c877d95.bd8,1,,2019-03-12 17:36:21 CST,,0,LOG,00000,"database system was interrupted; last known up at 2019-03-12 17:30:44 CST",,,,,,,,,""
2019-03-12 17:36:21.358 CST,,,3032,,5c877d95.bd8,2,,2019-03-12 17:36:21 CST,,0,LOG,00000,"starting archive recovery",,,,,,,,,""
2019-03-12 17:36:21.372 CST,,,3032,,5c877d95.bd8,3,,2019-03-12 17:36:21 CST,,0,LOG,00000,"restored log file ""0000000F.history"" from archive",,,,,,,,,""
2019-03-12 17:36:21.486 CST,,,3032,,5c877d95.bd8,4,,2019-03-12 17:36:21 CST,,0,LOG,00000,"restored log file ""0000000F000000000000003A"" from archive",,,,,,,,,""
2019-03-12 17:36:21.693 CST,,,3032,,5c877d95.bd8,5,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"redo starts at 0/3A000108",,,,,,,,,""
2019-03-12 17:36:21.696 CST,,,3032,,5c877d95.bd8,6,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/3A0001D8",,,,,,,,,""
2019-03-12 17:36:21.696 CST,,,3030,,5c877d95.bd6,2,,2019-03-12 17:36:21 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2019-03-12 17:36:21.826 CST,,,3032,,5c877d95.bd8,7,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"restored log file ""0000000F000000000000003B"" from archive",,,,,,,,,""
2019-03-12 17:36:22.245 CST,,,3032,,5c877d95.bd8,8,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"restored log file ""0000000F000000000000003C"" from archive",,,,,,,,,""
2019-03-12 17:36:22.614 CST,,,3032,,5c877d95.bd8,9,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"restored log file ""0000000F000000000000003D"" from archive",,,,,,,,,""
2019-03-12 17:36:23.039 CST,,,3032,,5c877d95.bd8,10,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"restored log file ""0000000F000000000000003E"" from archive",,,,,,,,,""
2019-03-12 17:36:23.342 CST,,,3032,,5c877d95.bd8,11,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"restored log file ""0000000F000000000000003F"" from archive",,,,,,,,,""
2019-03-12 17:36:23.874 CST,,,3032,,5c877d95.bd8,12,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"invalid record length at 0/40A63B08: wanted 24, got 0",,,,,,,,,""
2019-03-12 17:36:23.874 CST,,,3032,,5c877d95.bd8,13,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"redo done at 0/40A63AD0",,,,,,,,,""
2019-03-12 17:36:23.874 CST,,,3032,,5c877d95.bd8,14,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"last completed transaction was at log time 2019-03-12 17:33:32.760492+08",,,,,,,,,""
2019-03-12 17:36:23.879 CST,,,3032,,5c877d95.bd8,15,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"selected new timeline ID: 16",,,,,,,,,""
2019-03-12 17:36:24.773 CST,,,3032,,5c877d95.bd8,16,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
2019-03-12 17:36:24.777 CST,,,3032,,5c877d95.bd8,17,,2019-03-12 17:36:21 CST,1/0,0,LOG,00000,"restored log file ""0000000F.history"" from archive",,,,,,,,,""
2019-03-12 17:36:25.589 CST,,,3030,,5c877d95.bd6,3,,2019-03-12 17:36:21 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""

验证数据


[xdb@localhost testdb]$ psql -d testdb
psql (11.2)
Type "help" for help.
testdb=# 
testdb=# select count(*) from tbl01;
 count  
--------
 100000
(1 row)
testdb=# select count(*) from t02;
 count  
--------
 100000
(1 row)
testdb=#

时间线历史文件,在归档目录和pg_wal目录下均存在以当前时间线命名的history,该文件说明了该Cluster的历史.


[xdb@localhost ~]$ cat /data/archivelog/20190312/00000010.history
7    0/27000000    no recovery target specified
8    0/2A0012E8    no recovery target specified
12    0/32000000    no recovery target specified
14    0/39A63BD0    no recovery target specified
15    0/40A63B08    no recovery target specified
[xdb@localhost ~]$ cat $PGDATA/pg_wal/00000010.history 
7    0/27000000    no recovery target specified
8    0/2A0012E8    no recovery target specified
12    0/32000000    no recovery target specified
14    0/39A63BD0    no recovery target specified
15    0/40A63B08    no recovery target specified
[xdb@localhost ~]$

DONE!

三、参考资料

Base Backup & Point-in-Time Recovery
Write Ahead Log
pgbasebackup
PostgreSQL DBA(28) - Backup&Recovery#1(基本操作)

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/6906/viewspace-2638204/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论
长期从事政务、金融等行业产品研发和架构设计工作,对Oracle、PostgreSQL以及大数据等相关技术有深入研究。现就职于广州云图数据技术有限公司,系统架构师。

注册时间:2007-12-28

  • 博文量
    1196
  • 访问量
    3665049