db2部分表空间变为rollforward pending原因分析

问题描述：DB2数据库，有部分表空间状态是rollforward pending，其他表空间状态都正常，希望找到原因。原因分析：所谓rollforward pending，意思是这个表空间需要做rollforward操作才能恢复正常。原因是之前这个表空间做rollforward的时候出错了，出错的原因很可能是容器不可访问(也不排除其他原因)。在以下情况下，表空间会做rollforward操...

匿_名_用_户

4041人浏览 · 2018-12-26 20:56:10

匿_名_用_户 · 2018-12-26 20:56:10 发布

问题描述：
DB2数据库，有部分表空间状态是rollforward pending，其他表空间状态都正常，希望找到原因。

原因分析：
所谓rollforward pending，意思是这个表空间需要做rollforward操作才能恢复正常。原因是之前这个表空间做rollforward的时候出错了，出错的原因很可能是容器不可访问(也不排除其他原因)。在以下情况下，表空间会做rollforward操作：
1. HADR的备机，因为需要重做主机的日志，所以一直在做rollforward操作
2. Crash Recovery 时，需要重做日志，也会有rollforward操作
3. 恢复数据库后，需要前滚，显示地发出rollforward命令。

因此，对应的原因主要有以下3个：
1. HADR备机上，某个表空间的容器出现了问题，不可访问，则切换之后，该表空间可能被置于rollforward pending。
2. Crash Recovery 时，某个表空间的容器出现了问题，不可访问，crash recovery完成之后，该表空间会被置于rollforward pending状态。
3. 显式地发出rollforward命令时，某个表空间的容器出现了问题，不可访问，则rollforward完成后，该表空间会被置于rollforward pending状态。

解决方法：
rollforward pending状态不会因为容器可以访问了就能解除的，需要显式地发出rollforward命令
db2 "rollforward db xx to end of logs and complete tablespace(xxx,yyy) online"

问题模拟：
针对以上原因分析，设计了两个实验。第一个实验是HADR场景，第二个实验是Crash Recovery场景，两种场景下，数据库归采用归档日志模式。

场景1 HADR备机容器不可访问导致切换后表空间进入rollforward pending状态。

主机，创建表空间和表
$ db2 "create tablespace tbs1 managed by database using(file 'file1' 1000)"
$ db2 "create table file1Table1(id int) in tbs1"

备机，将对应容器权限修改为000，并重启实例(释放打开的文件描述符，做到真正的不可访问)：
$ ls -l /db2data/sample/db2tst/NODE0000/SQL00001/file1
-rw------- 1 db2tst db2adm 32768000 Jan 10 17:50 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ chmod 000 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ ls -l /db2data/sample/db2tst/NODE0000/SQL00001/file1
---------- 1 db2tst db2adm 32768000 Jan 10 17:52 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ db2stop force
$ db2start
$ db2 "activate db sample"

主机，进行一个事务：
$ db2 "insert into file1Table1 values(1)"
DB20000I The SQL command completed successfully.

备机，发出takeover命令，变成主机，这时候发现表空间状态为Roll forward pending+Offline，将容器恢复正常，且将表空间switch online之后，表空间Roll forward pending的状态不会改变，除非发了rollforward database命令。
$ db2 "takeover hadr on db sample"
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.
$ db2 "connect to sample"
$ db2 "list tablespaces"
..

Tablespace ID = 7
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x4080
Detailed explanation:
Roll forward pending
Offline

$ chmod 600 /db2data/sample/db2tst/NODE0000/SQL00001/file1
$ db2 "alter tablespace TBS1 switch online"
$ db2 "list tablespaces"
..
Tablespace ID = 7
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0080
Detailed explanation:
Roll forward pending

$ db2 "rollforward db sample to end of logs and complete tablespace(tbs1) online"
SQL1774N Table space restore or rollforward cannot be issued on an HADR
primary or HADR standby database.
$ db2 "stop hadr on db sample"
$ db2 "rollforward db sample to end of logs and complete tablespace(tbs1) online"
$ db2 "start hadr on db sample as primary"
$ db2 "connect to sample"
$ db2 list tablespaces | tail -n 8
Tablespace ID = 7
Name = TBS1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0000
Detailed explanation:
Normal

场景2 Crash Recovery 时，容器不可访问导致重启后表空间进入rollforward pending状态。

模拟一次crash
db2tst2@DB2SERVER:~> db2 "create tablespace tablespace1 managed by database using (file 'file1' 10000)"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2 "create table t1oftbs1(id int) in tablespace1"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2pd -db sample -tab | tail -n 1
0x00007FA1302ACD00 12 0 File 10000 9952 - 0 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> ls -l /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
-rw------- 1 db2tst2 db2adm 81920000 Jan 10 17:22 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> db2 +c "insert into t1oftbs1 select id from t1"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2_kill
Application ipclean: Removing DB2 engine and client IPC resources for db2tst2
db2tst2@DB2SERVER:~> ipclean
Application ipclean: Removing DB2 engine and client IPC resources for db2tst2

模拟crash recovery时，该表空间容器不可访问，连库之后发现表空间状态为Roll forward pending+Offline：
db2tst2@DB2SERVER:~> chmod 000 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> db2start
01/10/2020 17:13:43 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
db2tst2@DB2SERVER:~> db2 connect to sample

Database Connection Information

Database server = DB2/LINUXX8664 10.5.9
SQL authorization ID = DB2TST2
Local database alias = SAMPLE

db2tst2@DB2SERVER:~> db2 "list tablespaces" | tail -n 8
Name = TABLESPACE1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x4080
Detailed explanation:
Roll forward pending
Offline

采用和实验1同样的方法，恢复正常
db2tst2@DB2SERVER:~> chmod 600 /db2/restore/sample/db2tst2/NODE0000/SQL00001/file1
db2tst2@DB2SERVER:~> db2 "alter tablespace TABLESPACE1 switch online"
DB20000I The SQL command completed successfully.
db2tst2@DB2SERVER:~> db2 "list tablespaces" | tail -n 8
Tablespace ID = 12
Name = TABLESPACE1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0080
Detailed explanation:
Roll forward pending
db2tst2@DB2SERVER:~> db2 "rollforward db sample to end of logs and complete tablespace(TABLESPACE1) online"

Rollforward Status

Input database alias = sample
Number of members have returned status = 1

Member ID = 0
Rollforward status = not pending
Next log file to be read =
Log files processed = -
Last committed transaction = 2020-01-10-09.22.50.000000 UTC

DB20000I The ROLLFORWARD command completed successfully.
db2tst2@DB2SERVER:~> db2 "connect to sample"

Database Connection Information

Database server = DB2/LINUXX8664 10.5.9
SQL authorization ID = DB2TST2
Local database alias = SAMPLE

db2tst2@DB2SERVER:~> db2 "list tablespaces" | tail -n 8
Tablespace ID = 12
Name = TABLESPACE1
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0000
Detailed explanation:
Normal

向您推荐>>Eolink开发者社区

权威｜前沿｜技术｜干货｜国内首个API全生命周期开发者社区

更多推荐

深入理解 Mocha 测试框架：从零实现一个 Mocha

前言什么是自动化测试自动化测试在很多团队中都是Devops环节中很难执行起来的一个环节，主要原因在于测试代码的编写工作很难抽象，99%的场景都需要和业务强绑定，而且写测试代码的编写工作量往往比编写实际业务代码的工作量更多。在一些很多业务场景中投入产出比很低，适合写自动化测试的应该是那些中长期业务以及一些诸如组件一样的基础库。自动化测试是个比较大的概念，其中分类也比较多，比如单元测试，端对端测试，集

云原生

ELK实现containerd的容器日志采集展示【基于logging的全栈监测】

企业级ELK Stack构建介绍

云原生

(20200916 Solved)docker-compose up创建容器自动退出

问题描述如题，创建容器后自动退出了。并且docker start container无效解决方案原因是缺失了控制终端的配置，需要在docker-compose.yml中增加tty:true ，有时候这样也不行，需要再增加一个command:/bin/bash，命令不一定是这个，需要是一个不会退出的命令，然后用-d后台启动容器。Referencesdocker-compose启动容器后自动退出...