Analyzing and Tracing CRS-4124 / CRS-4000 Errors When Applying a PSU to Oracle 11g on Red Hat 7

Original  Oracle  Author: xueshancheng  Date: 2021-10-22 16:53:31

1 Starting the cluster fails with the following errors

[root@testdb2 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.


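2 Following advice found online for this error, I deleted /var/tmp/.oracle/npohasd (in practice, the whole /var/tmp/.oracle directory), but this did not solve the problem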

[root@testdb2 ~]# cd /var/tmp

[root@testdb2 tmp]# rm -rf .oracle


[root@testdb2 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.
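Since `rm -rf` leaves nothing to restore, a more cautious variant of this step is to move the directory aside. The following is only a sketch: `set_aside_dir` is a hypothetical helper name, and it assumes the clusterware stack is already stopped on this node before the socket directory is touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle Grid Infrastructure):
# move a directory aside instead of deleting it, so it can be restored.
# Assumption: the clusterware stack on this node is stopped first.
# $1 = directory to set aside, e.g. /var/tmp/.oracle
set_aside_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Timestamped backup name next to the original directory.
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "$backup"
}
```

Calling `set_aside_dir /var/tmp/.oracle` as root prints the name of the backup directory, which can be moved back if the change makes no difference.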


3 Next, I traced the relevant process with strace

[root@testdb2 ~]# ps -ef|grep  crsctl

root      32511  31666  0 17:34 pts/0    00:00:00 /u01/app/11.2.0/grid/bin/crsctl.bin start crs

root      36355  36109  0 17:36 pts/1    00:00:00 grep --color=auto crsctl

[root@testdb2 ~]#  strace -p 32511

strace: Process 32511 attached

restart_syscall(<... resuming interrupted nanosleep ...>) = 0

open("/proc/self/status", O_RDONLY)     = 3

read(3, "Name:\tcrsctl.bin\nUmask:\t0022\nSta"..., 4096) = 1334

close(3)                                = 0

access("/usr/lib64/qt-3.3/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)

access("/usr/local/sbin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)

access("/usr/local/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)

access("/sbin/crsctl.bin", F_OK)        = -1 ENOENT (No such file or directory)

access("/bin/crsctl.bin", F_OK)         = -1 ENOENT (No such file or directory)

access("/usr/sbin/crsctl.bin", F_OK)    = -1 ENOENT (No such file or directory)

access("/usr/bin/crsctl.bin", F_OK)     = -1 ENOENT (No such file or directory)

access("/root/bin/crsctl.bin", F_OK)    = -1 ENOENT (No such file or directory)

brk(NULL)                               = 0xefe000

brk(0xf3e000)                           = 0xf3e000

brk(NULL)                               = 0xf3e000

brk(0xfbe000)                           = 0xfbe000

brk(NULL)                               = 0xfbe000

brk(0xff6000)                           = 0xff6000

.........

41999      0.000019 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000007>

41999      0.000020 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000008>

41999      0.000019 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000007>

41999      0.000020 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000008>
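The second excerpt carries a PID column, relative timestamps, and per-call durations, which is what strace writes when run with `-f`, `-r`, `-T`, and `-o`. The exact command is not shown in the original, so the invocation in the comment below is an assumption. The hypothetical helper `count_enoent` summarizes which paths a traced process failed to find:

```shell
#!/bin/sh
# Assumed way to capture such a trace (follow children, relative
# timestamps, time spent in each call, output to a file):
#   strace -f -r -T -o /tmp/crsctl.trace /u01/app/11.2.0/grid/bin/crsctl start crs

# Hypothetical helper: count failed lookups (ENOENT) per path in a trace file.
# $1 = trace file written by strace -o
count_enoent() {
    grep 'ENOENT' "$1" \
        | sed -n 's/.*("\([^"]*\)".*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Running `count_enoent /tmp/crsctl.trace` lists the most frequently missing paths first, which makes a repeated lookup such as /var/tmp/.oracle/npohasd easy to spot.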


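Deleting the /var/tmp/.oracle directory repeatedly did not solve the problem, and the trace shows that the file /var/tmp/.oracle/npohasd still cannot be accessed (ENOENT).

I therefore searched the Oracle support site and found the note "Linux: OS "init" process does not start init.ohasd in inittab" (Doc ID 1591775.1), which explains that crsctl cannot start the cluster because the ohasd process cannot be started.

The note reads as follows: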


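4 Based on that note, I concluded that the failure was caused by the ohasd service not starting. Because Oracle 11g is not well supported on Red Hat 7, I suspected that the ohasd service I had created myself was faulty and was keeping the database cluster from starting.

Checking the status of ohas.service shows that the service is reported as running, but the log says that ohasd.bin died: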

[root@testdb2 tmp]# systemctl status ohas.service

● ohas.service - Oracle High Availability Services

   Loaded: loaded (/usr/lib/systemd/system/ohas.service; enabled; vendor preset: disabled)

   Active: active (running) since Fri 2021-10-22 08:41:05 CST; 50min ago

 Main PID: 26360 (init.ohasd)

    Tasks: 1

   CGroup: /system.slice/ohas.service

           └─26360 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple


Oct 22 08:41:25 testdb2 clsecho[29462]: /etc/init.d/init.ohasd: ohasd.bin process 9443 died while waiting to move.

Oct 22 08:41:25 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: ohasd.bin process 9443 died while waiting to move.

Oct 22 08:59:27 testdb2 clsecho[42022]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/cpu,cpuacct/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/cpu,cpuacct/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /bin/echo: write error: No such process

Oct 22 08:59:27 testdb2 clsecho[42025]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/systemd/system.slice/oracle-ohasd.service/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/systemd/system.slice/oracle-ohasd.service/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /bin/echo: write error: No such process

Oct 22 08:59:27 testdb2 clsecho[42032]: /etc/init.d/init.ohasd: ohasd.bin process 4999 died while waiting to move.

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: ohasd.bin process 4999 died while waiting to move.
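To see how often ohasd.bin died, the log lines can be reduced to the PIDs they mention. This is a minimal sketch: `dead_ohasd_pids` is a hypothetical helper, and the unit name ohas.service is the one from the status output above.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct ohasd.bin PIDs that the
# init.ohasd log lines report as dead. Reads log text on stdin.
dead_ohasd_pids() {
    sed -n 's/.*ohasd\.bin process \([0-9][0-9]*\) died.*/\1/p' | sort -un
}

# Usage on a systemd host:
#   journalctl -u ohas.service --no-pager | dead_ohasd_pids
```

Each PID in the output corresponds to one ohasd.bin process that exited before init.ohasd could move it into its cgroup.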


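5 I changed the host configuration as described in the note "Linux: OS "init" process does not start init.ohasd in inittab" (Doc ID 1591775.1) and started the database cluster again; this time the cluster started normally.

The change is as follows: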

[root@testdb2 tmp]# cat /etc/inittab |grep -v "#"


htfa:35:respawn:/etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null


h1:35:respawn:/etc/init.d/init.ohasd run
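The same check can be scripted so it is easy to repeat on each node. This is a sketch only; `has_ohasd_respawn` is a hypothetical helper name, and the pattern it looks for is the `h1` line shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an inittab-style file contains an
# active (uncommented) respawn entry that runs init.ohasd.
# $1 = file to check, e.g. /etc/inittab
has_ohasd_respawn() {
    grep -v '^[[:space:]]*#' "$1" | grep -q ':respawn:.*/init\.ohasd run'
}

# Usage:
#   has_ohasd_respawn /etc/inittab && echo "init.ohasd respawn entry present"
# Whether the script is actually running can be confirmed separately:
#   pgrep -f 'init.ohasd run'
```

The `pgrep` check matters more than the file check: what crsctl needs is a live init.ohasd process, however it was started.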


After more than ten test runs, the following command started the cluster every time, and the CRS-4124 and CRS-4000 errors did not recur.

[root@testdb2 tmp]#  /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.
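CRS-4123 only says that the start was initiated, so when repeating the test it can help to wait until the stack actually reports itself online. The following is a sketch under assumptions: `wait_until` is a hypothetical helper, and the usage line assumes that `crsctl check crs` prints message CRS-4638 once Oracle High Availability Services is online.

```shell
#!/bin/sh
# Hypothetical helper: poll a command until it succeeds or a time limit
# (in seconds) is reached.
# $1 = time limit, $2 = polling interval, remaining args = command to run
wait_until() {
    limit=$1
    interval=$2
    shift 2
    waited=0
    until "$@" >/dev/null 2>&1; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Usage (assumption: CRS-4638 appears in the output once OHAS is online):
#   wait_until 300 5 sh -c '/u01/app/11.2.0/grid/bin/crsctl check crs | grep -q CRS-4638'
```

Matching on the message text rather than on the exit status of crsctl is a deliberate choice here, because the original does not show how crsctl reports failure through its exit code.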




Source: ITPUB Blog, http://blog.itpub.net/69996316/viewspace-2838836/. If you repost this article, please credit the source; otherwise legal action may be pursued.
