CFS Crucial Package Failed: Unable to Join Cluster

We are going to look at how the errors above can be resolved in as little time as possible, because troubleshooting can drag on and a rebuild is sometimes the most efficient way out of this kind of situation. I ran into these issues after a patching activity, when the OS could no longer be booted at all, not even to the backup image or into single-user mode. The error above appears after the OS is rebuilt and the cluster packages are freshly installed: the node looks hung while it is trying to bring up the CFS package. The package log gives no clues other than being stuck at starting GAB, the Group Membership Services/Atomic Broadcast layer that Serviceguard uses to communicate between the nodes in the same cluster.
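
While the package hangs at this stage, the state of the Veritas stack can be checked directly from the node. This is only a sketch using the standard GAB and LLT utilities; exact paths and output differ between Veritas versions:

# gabconfig -a
# lltstat -nvv

gabconfig -a lists the GAB port memberships on the node (a healthy CFS member registers several ports, starting with port a for GAB itself), while lltstat -nvv shows the state of the LLT links towards the other cluster nodes. In the situation described here, start-up never gets past GAB.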

Below is a sample of the CFS package log:

06/29/19 02:32:56 Monitoring vxconfigd (pid= 532) every 20 secs
06/29/19 02:32:56 Stopping GAB
06/29/19 02:32:56 Stopping GAB.. Done
06/29/19 02:32:56 Stopping LLT
06/29/19 02:32:56 Stopping LLT.. Done
06/29/19 02:32:56 rm -f /etc/llttab /etc/llthosts /etc/gabtab
06/29/19 02:32:56 Starting service SG-CFS-cmvxpingd
06/29/19 02:32:56 cmrunserv SG-CFS-cmvxpingd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /usr/lbin/cmvxpingd -t 132
06/29/19 02:32:56 rm -f /var/adm/cmcluster/cmvxd.socket
06/29/19 02:32:56 Starting service SG-CFS-cmvxd
06/29/19 02:32:56 cmrunserv SG-CFS-cmvxd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /usr/lbin/cmvxd run -s /var/adm/cmcluster/cmvxd.socket -t 132
06/29/19 02:32:56 Creating LLT configuration
06/29/19 02:32:56 mktemp -d /etc
06/29/19 02:32:56 touch /etc/006771
06/29/19 02:32:56 chmod 644 /etc/006771
06/29/19 02:32:56 chmod 444 /etc/006771
06/29/19 02:32:56 mv /etc/006771 /etc/llttab
06/29/19 02:32:56 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/llttab
06/29/19 02:32:56 Creating GAB configuration
06/29/19 02:32:56 mktemp -d /etc
06/29/19 02:32:56 touch /etc/006788
06/29/19 02:32:56 chmod 644 /etc/006788
06/29/19 02:32:56 chmod 444 /etc/006788
06/29/19 02:32:56 mv /etc/006788 /etc/gabtab
06/29/19 02:32:56 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/gabtab
06/29/19 02:32:56 chmod 544 /etc/gabtab
06/29/19 02:32:56 Creating initial LLT hosts file
06/29/19 02:32:56 mktemp -d /etc
06/29/19 02:32:56 touch /etc/006808
06/29/19 02:32:56 chmod 644 /etc/006808
06/29/19 02:32:56 chmod 444 /etc/006808
06/29/19 02:32:56 mv /etc/006808 /etc/llthosts
06/29/19 02:32:56 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/llthosts
06/29/19 02:32:56 Starting Veritas stack
06/29/19 02:32:56 /etc/cmcluster/cfs/vx-modules.1 start
06/29/19 02:32:56 /sbin/init.d/llt start
06/29/19 02:32:56 Starting LLT
06/29/19 02:33:04 /sbin/init.d/gab start
06/29/19 02:33:04 Starting GAB

After that, the node gets a kernel panic with a "crucial package failed" message just before it reboots. As an emergency remediation, the cluster can be brought up on one node by running the command below:

#cmruncl -n <nodename>
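
Once the cluster is running on that node, its state can be confirmed before moving on; a quick check with the standard Serviceguard status command:

# cmviewcl -v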

Cloning Partner Node OS

I found that cloning the partner node's OS is the fastest and most efficient way to solve the "crucial package failed" issue on CFS. Hunting for the incompatibility between the Veritas filesystem and the cluster components (GAB, VxFEN and LLT) across the nodes in the cluster turned out to be a waste of time, since I could not find an answer even in the Veritas manuals or on the web. The hang during CFS start-up was caused by an incompatible version of the cluster components: 6.10 on the rebuilt node versus 5.0.1 on the existing running node. Below are the Veritas bundle versions on the existing running node:

Nodename:home/userid$ swlist|grep -i vx
  B9116DB                                       B.05.01.01     Full VxVM License for Veritas Volume Manager 5.0.1
  Base-VXFS                                     B.11.31        Base VxFS File System 4.1 Bundle for HP-UX
  Base-VxFS-501                                 B.05.01.03     Veritas File System Bundle 5.0.1 for HP-UX
  Base-VxTools-501                              B.05.01.04     VERITAS Infrastructure Bundle 5.0.1 for HP-UX
  Base-VxVM-501                                 B.05.01.04     Base VERITAS Volume Manager Bundle 5.0.1 for HP-UX
Nodename:home/userid$
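
To confirm the mismatch, the same query can be run on the rebuilt node and the bundle revisions compared side by side; a sketch using the standard SD-UX listing (adjust the grep pattern as needed):

# swlist -l bundle | grep -i vx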

Below are the steps for cloning the partner node's OS:

1) Back up an image of the partner node.

2) Restore the image on this node.

3) Set up the network configuration.

4) Bring the node up into the cluster.

 

Backing up an Image on the Existing Partner Node

The most common way to back up an image is make_net_recovery, which can be run as below:

/opt/ignite/bin/make_net_recovery -s Ignite-UX_server
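
In practice the root volume group is usually archived in full. A sketch, assuming the OS lives in vg00 and Ignite-UX_server is the name of your Ignite-UX server:

/opt/ignite/bin/make_net_recovery -s Ignite-UX_server -x inc_entire=vg00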

Restoring the Image on This Node

On the node being restored, define a network boot profile for the Ignite-UX server at the EFI shell:

dbprofile -dn igniteboot -sip <server_ip_address> -cip <node_ip_address> -gip <node_gateway> -m <node_netmask> -b "/opt/ignite/boot/nbp.efi"

Verify the details of the profile by running "dbprofile" at the EFI shell prompt. After that, we can boot over the network using lanboot:

lanboot select -dn igniteboot

Setting up the Network Configuration

Bringing the Node up into the Cluster
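
Once the restored node can reach its partner again, the usual final step is to join it back into the running cluster with the standard Serviceguard command, then re-check membership with cmviewcl; a sketch of that last step:

# cmrunnode <nodename>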

to be continued…

 

References

http://unixmemoires.blogspot.com/2012/01/man-page-makenetrecovery.html

https://community.hpe.com/t5/Ignite-UX/how-to-do-make-net-recovery-from-my-server-to-a-remote-server/td-p/4782498#.XS3khntS82w

http://wiki-ux.info/wiki/How_does_a_make_net_recovery_looks_like

https://docstore.mik.ua/manuals/hp-ux/en/5992-5309/ch09s06.html