ITPub博客

首页 > 应用开发 > IT综合 > DB意外宕机一则

DB意外宕机一则

原创 IT综合 作者:aspen1982 时间:2011-07-19 15:45:38 0 删除 编辑

今天用户上报DB不能登录的问题,顺藤摸瓜解决DB意外宕机问题。

关键字:Instance terminate ora-07445 IO orapw

[@more@]

今天开发的同事上报一个开发测试库登录不了,报没有监听的错误。

处理过程

吸取上次教训(《library cache lock解决一则》中有述)先登录主机验证监听是否开启

$>lsnrctl status sid

果然是没开,上报信息无误。

开启监听

$>lsnrctl start sid

顺便用sys登录DB看了一下DB状态,结果是ORACLE NOT AVAILABLE。原来DB都停了,startup起来。从本机远程登录验证远程登录服务是否恢复,服务恢复。

用户的问题到这里就算是解决了。不过问题如果就这些内容那我自己都觉得写这篇blog没啥意思了。

这时一位同事说了一句:这个库最近怎么老是自己就停了?这一句话让我觉得其中有问题值得研究,我决定查查是什么原因导致这个库“自己就停了”。虽然说只是开发测试环境,但我觉得也不应该这么不稳定。

查看alert_sid.log

发现以下错误导致DB在临晨terminate

**************************************************************

Tue Jul 19 02:51:57 BEIST 2011

Errors in file /oracle/admin/ccicvsmk/bdump/ccicvsmk_ckpt_393436.trc:

ORA-00206: Message 206 not found; product=RDBMS; facility=ORA

; arguments: [3] [1]

ORA-00202: Message 202 not found; product=RDBMS; facility=ORA

; arguments: [/oracle/oradata/ccicvsmk/control03.ctl]

ORA-27072: Message 27072 not found; product=RDBMS; facility=ORA

IBM AIX RISC System/6000 Error: 5: I/O error

Additional information: 10

Additional information: 3

……

Tue Jul 19 02:52:55 BEIST 2011

Errors in file /oracle/admin/ccicvsmk/bdump/ccicvsmk_q002_1167456.trc:

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [ksihsmratt+0080] [SIGBUS] [unknown code] [0x700000000000050]

Tue Jul 19 02:55:25 BEIST 2011

USER: terminating instance due to error 472

Instance terminated by USER, pid = 1200234

**************************************************************

ORA-00206ORA-00202ORA-27072ORA-07445错误都直接指向文件读写和IO问题。

trace文件ccicvsmk_ckpt_393436.trc中的内容:

**************************************************************

*** 2011-07-19 02:51:57.281

ksedmp: internal or fatal error

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [ksihinster+0080] [SIGBUS] [unknown code] [0x7000000000005A2]

ORA-00221: Message 221 not found; product=RDBMS; facility=ORA

ORA-00206: Message 206 not found; product=RDBMS; facility=ORA

; arguments: [3] [1]

ORA-00202: Message 202 not found; product=RDBMS; facility=ORA

; arguments: [/oracle/oradata/ccicvsmk/control03.ctl]

ORA-27072: Message 27072 not found; product=RDBMS; facility=ORA

IBM AIX RISC System/6000 Error: 5: I/O error

Additional information: 10

Additional information: 3

ORA-00206: Message 206 not found; product=RDBMS; facility=ORA

; arguments: [3] [1]

ORA-00202: Message 202 not found; product=RDBMS; facility=ORA

; arguments: [/oracle/oradata/ccicvsmk/control02.ctl]

ORA

ksedmp: Obtaining call stack failed twice. not retrying

Exception signal: 10 (SIGBUS), code: 8 (unknown code), addr: 0x103014080, PC: [0x103014080, 03014080]

Registers:

iar: 0000000103014080, msr: a00000000000d0b2

lr: 000000010270b274, cr: 0000000042248228

r00: 000000010000b4e0, r01: 00000001103b11d0, r02: 0000000110231d88,

r03: 0000000000000001, r04: 070000004f755aa0, r05: 000000004e2480cd,

r06: 015776e5b94ba5a3, r07: 1231a94d98df266d, r08: 0950b8fcb8c42c40,

r09: 12e0be826d694b2f, r10: 0000000000000000, r11: 0000000000003600,

r12: 000000010000b4d8, r13: 000000011026ec98, r14: 00000001103b18f0,

r15: 000000000000009a, r16: 0000000000000002, r17: 00000001101b92a0,

r18: 0ffffffffffffed0, r19: 09fffffff000a798, r20: 000000000000000a,

r21: 000000000000000b, r22: 0000000000000006, r23: 0000000000000001,

r24: 0000000104c1f988, r25: 0000000000000001, r26: 0000000000000000,

r27: 0000000000000001, r28: 0000000000000000, r29: 0000000110008c48,

r30: 0000000000000000, r31: 00000001103b1240, ssexhd: crashing the process...

Background_Core_Dump = partial

**************************************************************

trace文件ccicvsmk_q002_1167456.trc中的内容。

**************************************************************

*** 2011-07-19 02:52:55.762

ksedmp: internal or fatal error

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]

ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA

; arguments: [ksihsmratt+0080] [SIGBUS] [unknown code] [0x700000000000050]

ksedmp: Obtaining call stack failed twice. not retrying

Exception signal: 10 (SIGBUS), code: 8 (unknown code), addr: 0x103014080, PC: [0x103014080, 03014080]

Registers:

iar: 0000000103014080, msr: a00000000000d0b2

lr: 000000010270b274, cr: 0000000042448228

r00: 000000010000b4e0, r01: 00000001103b11d0, r02: 0000000110231d88,

r03: 0000000000000001, r04: 070000004f75b1f0, r05: 000000004e248107,

r06: 015776e6ba54bdd4, r07: 1231a95b367c4dd5, r08: 0950b903b1668e40,

r09: 12e0be826d694b2f, r10: 0000000000000000, r11: 0000000000003600,

r12: 000000010000b4d8, r13: 000000011026ec98, r14: 00000001103b18f0,

r15: 000000000000009a, r16: 0000000000000002, r17: 00000001101b92a0,

r18: 070000004f75b4a0, r19: 070000004f75b2e0, r20: 000000000000000a,

r21: 000000000000000b, r22: 0000000000000006, r23: 0000000000000001,

r24: 0000000104c1f988, r25: 0000000000000001, r26: 0000000000000000,

r27: 0000000000000001, r28: 0000000000000000, r29: 0000000110008c48,

r30: 0000000000000000, r31: 00000001103b1240, ssexhd: crashing the process...

Shadow_Core_Dump = PARTIAL

**************************************************************

查询主机OS错误日志。

**************************************************************

$ errpt |more

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION

E86653C3 0719025111 P H LVDD I/O ERROR DETECTED BY LVM

AE3E3FAD 0620151911 I O SYSJ2 FSCK FOUND ERRORS

D1E21BA3 0620151611 I S errdemon LOG FILE EXPANDED TO REQUESTED SIZE

A6DF45AA 0620151611 I O RMCdaemon The daemon is started.

AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS

AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS

AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS

AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS

AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS

AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS

E86653C3 0620151211 P H LVDD I/O ERROR DETECTED BY LVM

9DBCFDEE 0620151511 T O errdemon ERROR LOGGING TURNED ON

A39F8A49 0620150411 T S syserrlg ERROR LOGGING BUFFER OVERFLOW

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM

**************************************************************

果然在临晨025111发生IO错误,然后不到一分钟DB开始报文件读写错误,几分钟后数据库terminate

将此IO问题上报主机组同事解决。

问题至此告一段落。待主机组同事解决后,再酌情补充。

sys用户无法远程登录

在问题处理过程中还遇到sys用户无法远程登录,顺带处理了,如下

Applies to:

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi

AIX dztest2 3 5 000B06FAD400

Symptoms:

使用SYS用户远程登录,提示insufficient privilege。其他用户无此问题。

Cause:

根据问题现象基本可以确定是orapw密码文件有问题

查询参数remote_login_passwordfile确定DB身份认证方式。

SQL> show parameter remote_login_passwordfile

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

remote_login_passwordfile string EXCLUSIVE

不是OS认证(OS认证该参数为NONE)则需要orapw密码文件。进入$ORACLE_HOME/dbs没有发现对应的密码文件。问题原因可以确定,没有orapw密码文件。

Solution:

创建密码文件

$>orapwd file=orapwsid password=password entries=num

再次尝试使用用户SYS远程登录,成功。

查看当前具有sysdba权限用户

SQL> select * from v$pwfile_users;

USERNAME SYSDB SYSOP

------------------------------ ----- -----

SYS TRUE TRUE

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/21129591/viewspace-1052718/,如需转载,请注明出处,否则将追究法律责任。

请登录后发表评论 登录
全部评论

注册时间:2011-07-06

  • 博文量
    24
  • 访问量
    60665