Do all cluster members take equally long to stop, or only one or a few of them?
Where is the time spent, i.e. which services take a long time to stop?
If you're running a cluster, you should use the cluster commands to stop and start the system if you're not using the UI buttons.
Is there any improvement in performance when you use the cluster commands to stop services?
Have you upgraded this system from a different version of Discovery?
How old is your implementation? If it's old, has your datastore ever been compacted? See tw_ds_offline_compact in the BMC Discovery 11.3 documentation.
Where are your cluster nodes located? Are they in the same datacenter? If not, are they logically within a hop or two from each other? Is there latency between them?
All cluster members take the same long time to shut down. The time is spent in the last step seen on the screenshot above:
"[OK] Reached target shutdown". This step takes about 45 minutes; after that the appliance shuts down immediately without displaying further logs or commands.
I'm talking about shutting down the Virtual machines, not about stopping the cluster services (which works fine and takes only a couple of minutes).
To upgrade the appliances from CentOS 6 to CentOS 7, the whole cluster was restored onto clean appliances from a backup on 20.11.2018.
The datastore has been compacted once since the restore. The current size is about 1.150 GB.
All the Appliances are located in the same subnet and in the same datacenter:
[tideway@bmc-discovery-01 log]$ ping bmc-discovery-02
PING bmc-discovery-02.dc.rewe.local (10.62.63.135) 56(84) bytes of data.
64 bytes from bmc-discovery-02.dc.rewe.local (10.62.63.135): icmp_seq=1 ttl=64 time=0.221 ms
64 bytes from bmc-discovery-02.dc.rewe.local (10.62.63.135): icmp_seq=2 ttl=64 time=0.142 ms
64 bytes from bmc-discovery-02.dc.rewe.local (10.62.63.135): icmp_seq=3 ttl=64 time=0.205 ms

[tideway@bmc-discovery-01 log]$ ping bmc-discovery-03
PING bmc-discovery-03.dc.rewe.local (10.62.63.136) 56(84) bytes of data.
64 bytes from bmc-discovery-03.dc.rewe.local (10.62.63.136): icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from bmc-discovery-03.dc.rewe.local (10.62.63.136): icmp_seq=2 ttl=64 time=0.182 ms
64 bytes from bmc-discovery-03.dc.rewe.local (10.62.63.136): icmp_seq=3 ttl=64 time=0.182 ms
Then I would suspect something related to the VMware hardware or the configuration of the clustered VMs. Is there any obvious difference from the standalone VM configuration on VMware?
Compared to the standalone environment, the cluster uses a lot more resources (CPU, RAM, disk) and the datastore is considerably larger.
Further research suggests that swap could be the reason for the long shutdown.
The issue seems to be related to a bug in RHEL7, see https://bugzilla.redhat.com/show_bug.cgi?id=1577958.
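One way to sanity-check the swap theory is to look at how much swap a node is actually using shortly before it is shut down. The snippet below is only a rough sketch, assuming a Linux appliance that exposes /proc/meminfo; the function name swap_used_kb and its optional file argument are made up for illustration and are not part of BMC Discovery.

```shell
#!/bin/sh
# Hypothetical helper (not a BMC Discovery tool): print the amount of swap
# currently in use, in kB, computed as SwapTotal - SwapFree.
# Reads /proc/meminfo by default; another meminfo-style file can be passed.
swap_used_kb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "$meminfo"
}

# Only query the live system if the file is actually present.
if [ -r /proc/meminfo ]; then
    echo "Swap in use: $(swap_used_kb) kB"
fi
```

Running this on each cluster member just before shutting down the VM, and comparing the result with the standalone appliance, would show whether the slow nodes are also the ones with a lot of data in swap. That would only be a hint, not proof that swap is the cause.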
I will raise a case now to investigate the issue.