CFS Crucial Package Failed: Unable to Join Cluster

We are going to see how the error above can be resolved in as little time as possible, because troubleshooting can take a long time and a rebuild is sometimes the most efficient way out of this kind of situation. I have faced this issue after patching activities, when the OS could not be booted even from the backup image or into single-user mode. The error occurs after an OS rebuild and a fresh install of the cluster packages: the node appears to hang while trying to bring up the CFS package. The package log gives no clues; it simply gets stuck while starting GAB (Group Membership Services/Atomic Broadcast), the service Serviceguard uses to communicate between nodes in the same cluster.

Below is a sample of the CFS package log:

06/29/19 02:32:56 Monitoring vxconfigd (pid= 532) every 20 secs
06/29/19 02:32:56 Stopping GAB
06/29/19 02:32:56 Stopping GAB.. Done
06/29/19 02:32:56 Stopping LLT
06/29/19 02:32:56 Stopping LLT.. Done
06/29/19 02:32:56 rm -f /etc/llttab /etc/llthosts /etc/gabtab
06/29/19 02:32:56 Starting service SG-CFS-cmvxpingd
06/29/19 02:32:56 cmrunserv SG-CFS-cmvxpingd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /usr/lbin/cmvxpingd -t 132
06/29/19 02:32:56 rm -f /var/adm/cmcluster/cmvxd.socket
06/29/19 02:32:56 Starting service SG-CFS-cmvxd
06/29/19 02:32:56 cmrunserv SG-CFS-cmvxd >> /etc/cmcluster/cfs/SG-CFS-pkg.log 2>&1 /usr/lbin/cmvxd run -s /var/adm/cmcluster/cmvxd.socket -t 132
06/29/19 02:32:56 Creating LLT configuration
06/29/19 02:32:56 mktemp -d /etc
06/29/19 02:32:56 touch /etc/006771
06/29/19 02:32:56 chmod 644 /etc/006771
06/29/19 02:32:56 chmod 444 /etc/006771
06/29/19 02:32:56 mv /etc/006771 /etc/llttab
06/29/19 02:32:56 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/llttab
06/29/19 02:32:56 Creating GAB configuration
06/29/19 02:32:56 mktemp -d /etc
06/29/19 02:32:56 touch /etc/006788
06/29/19 02:32:56 chmod 644 /etc/006788
06/29/19 02:32:56 chmod 444 /etc/006788
06/29/19 02:32:56 mv /etc/006788 /etc/gabtab
06/29/19 02:32:56 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/gabtab
06/29/19 02:32:56 chmod 544 /etc/gabtab
06/29/19 02:32:56 Creating initial LLT hosts file
06/29/19 02:32:56 mktemp -d /etc
06/29/19 02:32:56 touch /etc/006808
06/29/19 02:32:56 chmod 644 /etc/006808
06/29/19 02:32:56 chmod 444 /etc/006808
06/29/19 02:32:56 mv /etc/006808 /etc/llthosts
06/29/19 02:32:56 touch -r /etc/cmcluster/cfs/.SG-CFS-pkg.ref /etc/llthosts
06/29/19 02:32:56 Starting Veritas stack
06/29/19 02:32:56 /etc/cmcluster/cfs/vx-modules.1 start
06/29/19 02:32:56 /sbin/init.d/llt start
06/29/19 02:32:56 Starting LLT
06/29/19 02:33:04 /sbin/init.d/gab start
06/29/19 02:33:04 Starting GAB
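
While the startup sits at “Starting GAB”, GAB port membership can be checked from another shell with gabconfig, the standard Veritas utility for this (assuming its usual /sbin location; the behavior described is what one would expect, not captured output):

# /sbin/gabconfig -a

On a healthy node this lists port memberships, starting with port a for GAB itself; a hang at this stage typically shows no membership at all.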

Eventually, the node kernel panics with a “crucial package failed” message just before it reboots. As an emergency remediation, the cluster can be brought up on the surviving node by running the command below:

# cmruncl -n <nodename>
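
Once the cluster is up on that node, its state can be confirmed with the standard Serviceguard status command:

# cmviewcl -v

The node should show up as running, with the CFS multi-node packages up on it alone.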

Cloning Partner Node OS

I found out that cloning the partner node's OS is the fastest and most efficient way to solve the “crucial package failed” issue on CFS. Searching for a fix for the incompatibility of the Veritas filesystem and cluster components (GAB, VxFEN and LLT) between the nodes in the cluster was a waste of time, as I could not find any answer even in the Veritas manuals or on the web. The hang during CFS startup was caused by incompatible versions of the cluster components: 6.10 on the rebuilt node versus 5.0.1 on the existing running node. Below are the versions on the existing running node:

Nodename:home/userid$ swlist|grep -i vx
  B9116DB                                       B.05.01.01     Full VxVM License for Veritas Volume Manager 5.0.1
  Base-VXFS                                     B.11.31        Base VxFS File System 4.1 Bundle for HP-UX
  Base-VxFS-501                                 B.05.01.03     Veritas File System Bundle 5.0.1 for HP-UX
  Base-VxTools-501                              B.05.01.04     VERITAS Infrastructure Bundle 5.0.1 for HP-UX
  Base-VxVM-501                                 B.05.01.04     Base VERITAS Volume Manager Bundle 5.0.1 for HP-UX
Nodename:home/userid$
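
To confirm the mismatch, run the same query on the rebuilt node and compare the bundle revisions side by side (remsh access between the nodes is an assumption here; otherwise run it locally on each node):

# swlist | grep -i vx
# remsh <partner_node> swlist | grep -i vx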

Below are the steps for cloning the partner node's OS:

1) Back up an image of the partner node.

2) Restore the image on the failed node.

3) Set up the network configuration.

4) Bring the node up into the cluster.

 

Backing up an Image on the Existing Partner Node

The most common way of backing up an image is make_net_recovery, which can be run as below:

/opt/ignite/bin/make_net_recovery -s Ignite-UX_server
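
In practice a few more options are usually supplied; a sketch (the archive path is a placeholder, and -x inc_entire=vg00 pulls the whole root volume group into the archive):

/opt/ignite/bin/make_net_recovery -s Ignite-UX_server -a Ignite-UX_server:/var/opt/ignite/recovery/archives/<hostname> -x inc_entire=vg00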

Restore the image on this node.

On the failed node, create a direct-boot profile at the EFI shell so it can network-boot from the Ignite-UX server:

dbprofile -dn igniteboot -sip <server_ip_address> -cip <node_ip_address> -gip <node_gateway> -m <node_netmask> -b "/opt/ignite/boot/nbp.efi"

Verify the details of the profile by running "dbprofile" at the EFI shell prompt. After that, we may boot over the network using lanboot:

lanboot select -dn igniteboot

Setting up the network configurations.
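
Because the restored image carries the partner node's identity, the hostname and IP addressing must be changed to this node's own before it goes back on the network. On HP-UX this lives in /etc/rc.config.d/netconf; a minimal sketch, with every value a placeholder for this node's real settings:

HOSTNAME="thisnode"
INTERFACE_NAME[0]="lan0"
IP_ADDRESS[0]="<node_ip_address>"
SUBNET_MASK[0]="<node_netmask>"
ROUTE_DESTINATION[0]="default"
ROUTE_GATEWAY[0]="<node_gateway>"
ROUTE_COUNT[0]="1"

Update /etc/hosts to match, and make sure the heartbeat interfaces agree with what the cluster configuration expects.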

Bring the node up into the cluster.
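
With the cluster already running on the partner node, the rebuilt node joins it with cmrunnode (standard Serviceguard command; the node name is a placeholder), and membership can then be verified:

# cmrunnode <nodename>
# cmviewcl -v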

to be continued…

 

References

http://unixmemoires.blogspot.com/2012/01/man-page-makenetrecovery.html

https://community.hpe.com/t5/Ignite-UX/how-to-do-make-net-recovery-from-my-server-to-a-remote-server/td-p/4782498#.XS3khntS82w

http://wiki-ux.info/wiki/How_does_a_make_net_recovery_looks_like

https://docstore.mik.ua/manuals/hp-ux/en/5992-5309/ch09s06.html

 

Mirroring Disk to Migrate Data from an Old Storage Array to a New Array using VxVM in HP-UX

Basic Steps for LUN Migration

There are several reasons why such a migration may need to be done. Among them are storage-related issues flagged by a monitoring tool or by the OS itself. Below is an example of the errors captured that point to storage issues:

node :home/userid$ grep -i sync /var/adm/syslog/syslog.log|tail
Jul  5 00:56:22 node vmunix: Asynchronous write failed on LUN (dev=0x1000030)
Jul  5 09:05:51 node vmunix: Asynchronous write failed on LUN (dev=0x100000f)
Jul  6 16:43:58 node vmunix: Asynchronous write failed on LUN (dev=0x1000030)
Jul  6 20:26:57 node vmunix: Asynchronous write failed on LUN (dev=0x1000030)
Jul  7 04:04:35 node vmunix: Asynchronous write failed on LUN (dev=0x1000030)
Jul  7 11:47:17 node vmunix: Asynchronous write failed on LUN (dev=0x1000030)
node :home/userid$

 

Below are the basic migration steps, regardless of which software or utility manages the volumes (LVM or VxVM):

1. Create LUNs on the new disk array
2. Present them to the HP-UX server
3. Add the LUNs into the appropriate volume/disk groups
4. Mirror the data from the current LUNs to the new LUNs
5. Verify that the data has been successfully mirrored
6. Reduce the mirrors from the old LUNs
7. Reduce the old LUNs out of the VG
8. Repeat as needed for each VG

Step 1 is normally handled by the storage team, and the LUNs are allocated at the storage level.

 

2. Present them to the HP-UX server

Scan the I/O system for new LUNs using the command below:

# ioscan -funC disk

A sample of the output for the above is shown below:

Node:home/userid$ ioscan -funC disk
Class I H/W Path Driver S/W State H/W Type Description
==================================================================

disk 311 0/0/0/5/0/0/2.1.54.0.0.3.1 sdisk CLAIMED DEVICE 3PARdataVV
/dev/dsk/c11t3d1 /dev/rdsk/c11t3d1

Install the special device files and re-enable the VxVM configuration daemon:

# insf -vC disk

# vxdctl enable

Initialize the newly added disk so it can be added to a disk group:

/opt/VRTS/bin/vxdisksetup -i c11t3d1
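
The disk should now show up to VxVM as initialized but not yet in any disk group; a quick check (the output line is illustrative):

# vxdisk list | grep c11t3d1
c11t3d1      auto:cdsdisk    -            -            online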

 

3. Add the LUNs into the appropriate disk groups

Associate the new disk with the disk group that is going to be mirrored:

vxdg -g dg01 adddisk dg01_disk02=c11t3d1

Then bring it into the Volume Manager “world” using:

vxdctl enable
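
At this point the new disk should appear inside the disk group; a quick check using the names from above:

# vxdisk -g dg01 list

dg01_disk02 should be listed against device c11t3d1 with an online state.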

 

4. Mirror the data from the current LUNs to the new LUNs

vxassist -g dg01 mirror lvol1 dg01_disk02
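
For large volumes it can be more convenient to start the attach in the background and watch it afterwards; -b is the vxassist background option:

vxassist -b -g dg01 mirror lvol1 dg01_disk02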

If the errors below appear, it means the in-core database of the Volume Manager has to be reset and refreshed:

VxVM vxassist ERROR V-5-1-1080 Unexpected inconsistency in configuration
Disk Access and Disk Media records don't match

Reset it by restarting the configuration daemon in enabled mode:

/usr/sbin/vxconfigd -k -m enable

 

5. Verify that the data has been successfully mirrored

Node:home/userid$ vxtask list
TASKID PTID TYPE/STATE PCT PROGRESS
161 ATCOPY/R 22.91% 0/69632000/15955968 PLXATT lvol1 lvol1-02 dg01
Node:home/userid$

Once the mirroring has completed successfully, no task will remain in the vxtask list. You may want to check on the progress regularly during the mirroring process.
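
The volume layout can also be inspected to confirm that both plexes are attached, using the names from step 4:

vxprint -g dg01 -ht lvol1

Both lvol1-01 (the original plex) and lvol1-02 (the new mirror) should show state ACTIVE.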

 

6. Reduce the mirrors from the old LUNs

Disassociate and remove the plex that sits on the old disk. (The example below shows lvol2-01; for the lvol1 mirrored above, the old plex would be lvol1-01.)

vxplex -g dg01 -o rm dis lvol2-01

 

7. Reduce the old LUNs out of the VG

vxdg -g dg01 rmdisk dg01_disk01

/opt/VRTS/bin/vxdisksetup -i c3t0d3

Re-initializing the old disk is a proper handover to the storage support team, allowing them to reclaim the LUNs and re-use them for other purposes. If the disk is not initialized, old data such as the old disk group configuration will still be on it and will be detected when the disk is re-scanned. After initialization, the disk is considered unused. You can re-scan using vxdisk -o alldgs list; if nothing is associated or attached with the disk in the output, then you know it is one of the old ones. Thanks.
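
For reference, the re-scan mentioned above, with the kind of line a cleanly initialized disk shows (output is illustrative):

# vxdisk -o alldgs list
c3t0d3       auto:cdsdisk    -            -            online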

 

 

References:

http://etcfstab.com/hpux/hpux_san_add_vxvm.html
https://sort.veritas.com/ecls/umi/V-5-1-1080
https://vox.veritas.com/t5/Storage-Foundation/Unable-to-mirror-a-volume/td-p/503624
https://www.veritas.com/support/en_US/article.100023745
https://community.hpe.com/t5/LVM-and-VxVM/Mirror-data-with-Mirrordisk-UX-between-two-LUNs/td-p/5700095#.XSQ46Y9S82w
https://community.hpe.com/t5/LVM-and-VxVM/Moving-data-from-old-SAN-to-new-SAN/td-p/6718937#.XSQ4bY9S82w
https://vox.veritas.com/t5/Storage-Foundation/vxassist-multiple-volume-of-a-same-subdisk/td-p/644077
https://sort.veritas.com/public/documents/sf/5.0/hpux/manpages/vxvm/vxassist_1m.html
https://sort.veritas.com/public/documents/sf/5.1/aix/html/vxvm_admin/ch09s10s02.htm