db2部分表空间变为rollforward pending原因分析
问题描述:DB2数据库,有部分表空间状态是rollforward pending,其他表空间状态都正常,希望找到原因。原因分析:所谓rollforward pending,意思是这个表空间需要做rollforward操作才能恢复正常。原因是之前这个表空间做rollforward的时候出错了,出错的原因很可能是容器不可访问(也不排除其他原因)。在以下情况下,表空间会做rollforward操...
问题描述:
DB2数据库,有部分表空间状态是rollforward pending,其他表空间状态都正常,希望找到原因。
原因分析:
所谓rollforward pending,意思是这个表空间需要做rollforward操作才能恢复正常。原因是之前这个表空间做rollforward的时候出错了,出错的原因很可能是容器不可访问(也不排除其他原因)。在以下情况下,表空间会做rollforward操作:
1. HADR的备机,因为需要重做主机的日志,所以一直在做rollforward操作
2. Crash Recovery 时,需要重做日志,也会有rollforward操作
3. 恢复数据库后,需要前滚,显示地发出rollforward命令。
因此,对应的原因主要有以下3个:
1. HADR备机上,某个表空间的容器出现了问题,不可访问,则切换之后,该表空间可能被置于rollforward pending。
2. Crash Recovery 时,某个表空间的容器出现了问题,不可访问,crash recovery完成之后,该表空间会被置于rollforward pending状态。
3. 显式地发出rollforward命令时,某个表空间的容器出现了问题,不可访问,则rollforward完成后,该表空间会被置于rollforward pending状态。
解决方法:
rollforward pending状态不会因为容器可以访问了就能解除的,需要显式地发出rollforward命令
db2 "rollforward db xx to end of logs and complete tablespace(xxx,yyy) online"
问题模拟:
针对以上原因分析,设计了两个实验。 第一个实验是HADR场景,第二个实验是Crash Recovery场景,两种场景下,数据库归采用归档日志模式。
场景1 HADR备机容器不可访问导致切换后表空间进入rollforward pending状态。
主机,创建表空间和表
$ db2 "create tablespace tbs1 managed by database using(file 'file1' 1000)"
$ db2 "create table file1Table1(id int) in tbs1"
备机,将对应容器权限修改为000,并重启实例(释放打开的文件描述符,做到真正的不可访问):
$ ls -l /db2data/sample/db2tst/NODE0000/SQL00001/file1
-rw------- 1 db2tst db2adm 32768000 Jan 10 17:50 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ chmod 000 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ ls -l /db2data/sample/db2tst/NODE0000/SQL00001/file1
---------- 1 db2tst db2adm 32768000 Jan 10 17:52 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ db2stop force
$ db2start
$ db2 "activate db sample"
主机,进行一个事务:
$ db2 "insert into file1Table1 values(1)"
DB20000I The SQL command completed successfully.
备机,发出takeover命令,变成主机,这时候发现表空间状态为Roll forward pending+Offline,将容器恢复正常,且将表空间switch online之后,表空间Roll forward pending的状态不会改变,除非发了rollforward database命令。
$ db2 "takeover hadr on db sample"
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
$ db2 "connect to sample"
$ db2 "list tablespaces"
..
Tablespace ID = 7
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x4080
Detailed explanation:
Roll forward pending
Offline
$ chmod 600 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ db2 "alter tablespace TBS1 switch online"
$ db2 "list tablespaces"
..
Tablespace ID = 7
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0080
Detailed explanation:
Roll forward pending
$ db2 "rollforward db sample to end of logs and complete tablespace(tbs1) online"
SQL1774N Table space restore or rollforward cannot be issued on an HADR
primary or HADR standby database.
$ db2 "stop hadr on db sample"
$ db2 "rollforward db sample to end of logs and complete tablespace(tbs1) online"
$ db2 "start hadr on db sample as primary"
$ db2 "connect to sample"
$ db2 list tablespaces | tail -n 8
Tablespace ID = 7
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0000
Detailed explanation:
Normal
场景2 Crash Recovery 时,容器不可访问导致重启后表空间进入rollforward pending状态。
模拟一次crash
db2tst2@DB2SERVER:~> db2 "create tablespace tablespace1 managed by database using (file 'file1' 10000)"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2 "create table t1oftbs1(id int) in tablespace1"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2pd -db sample -tab | tail -n 1
0x00007FA1302ACD00 12 0 File 10000 9952 - 0 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> ls -l /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
-rw------- 1 db2tst2 db2adm 81920000 Jan 10 17:22 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> db2 +c "insert into t1oftbs1 select id from t1"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2_kill
Application ipclean: Removing DB2 engine and client IPC resources for db2tst2
db2tst2@DB2SERVER:~> ipclean
Application ipclean: Removing DB2 engine and client IPC resources for db2tst2
模拟crash recovery时,该表空间容器不可访问,连库之后发现表空间状态为Roll forward pending+Offline:
db2tst2@DB2SERVER:~> chmod 000 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> db2start
01/10/2020 17:13:43 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
db2tst2@DB2SERVER:~> db2 connect to sample
Database Connection Information
Database server = DB2/LINUXX8664 10.5.9
SQL authorization ID = DB2TST2
Local database alias = SAMPLE
db2tst2@DB2SERVER:~> db2 "list tablespaces" | tail -n 8
Name = TABLESPACE1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x4080
Detailed explanation:
Roll forward pending
Offline
采用和实验1同样的方法,恢复正常
db2tst2@DB2SERVER:~> chmod 600 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> db2 "alter tablespace TABLESPACE1 switch online"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2 "list tablespaces" | tail -n 8
Tablespace ID = 12
Name = TABLESPACE1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0080
Detailed explanation:
Roll forward pending
db2tst2@DB2SERVER:~> db2 "rollforward db sample to end of logs and complete tablespace(TABLESPACE1) online"
Rollforward Status
Input database alias = sample
Number of members have returned status = 1
Member ID = 0
Rollforward status = not pending
Next log file to be read =
Log files processed = -
Last committed transaction = 2020-01-10-09.22.50.000000 UTC
DB20000I The ROLLFORWARD command completed successfully.
db2tst2@DB2SERVER:~> db2 "connect to sample"
Database Connection Information
Database server = DB2/LINUXX8664 10.5.9
SQL authorization ID = DB2TST2
Local database alias = SAMPLE
db2tst2@DB2SERVER:~> db2 "list tablespaces" | tail -n 8
Tablespace ID = 12
Name = TABLESPACE1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0000
Detailed explanation:
Normal
更多推荐
所有评论(0)