Handling Failures
Software Failures
● Monitoring provided by the resource agents detects failures and triggers restarts
● If a resource fails and is restarted successfully, no other action is taken
● If the resource fails to restart, RHCM stops the entire service and relocates it to another node
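The restart-then-relocate behaviour described above can be sketched as a small decision function. This is a minimal illustration, not RHCM's actual implementation: the names `Outcome`, `handle_status_check`, `is_healthy`, and `restart` are hypothetical and stand in for whatever the resource agent and the cluster manager really call.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    HEALTHY = "healthy"      # status check passed, nothing to do
    RESTARTED = "restarted"  # resource failed and was restarted in place
    RELOCATE = "relocate"    # restart failed, the whole service must move


def handle_status_check(is_healthy: Callable[[], bool],
                        restart: Callable[[], bool]) -> Outcome:
    """Decide what to do after one monitoring pass (hypothetical model).

    is_healthy: stands in for the resource agent's status check.
    restart:    stands in for an in-place restart; returns True on success.
    """
    if is_healthy():
        return Outcome.HEALTHY
    if restart():
        # Restarted correctly: no other action is taken.
        return Outcome.RESTARTED
    # Restart failed: stop the service and relocate it to another node.
    return Outcome.RELOCATE
```

For example, `handle_status_check(lambda: False, lambda: False)` returns `Outcome.RELOCATE`, which corresponds to the last bullet above.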
Hardware and Cluster Failures
● If the cluster infrastructure evicts one or more nodes from the cluster, RHCM selects new nodes for the services they were running, based on the failover domain if one exists
● If a NIC fails or a cable is pulled (but the node is still a member of the cluster), the service is relocated
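Node selection after an eviction can be sketched as follows. This is a simplified model under stated assumptions, not RHCM's real algorithm: `pick_target` is a hypothetical helper, the failover domain is modelled as a plain ordered list of node names, and the fallback to any live node when no domain member survives is an assumption made for the example.

```python
from typing import Optional, Sequence


def pick_target(live_nodes: Sequence[str],
                failover_domain: Optional[Sequence[str]] = None) -> Optional[str]:
    """Return the node that should take over a service, or None if no node is left."""
    if failover_domain:
        live = set(live_nodes)
        # Walk the domain in its listed order and take the first live member.
        for node in failover_domain:
            if node in live:
                return node
        # Assumption: no domain member is alive, so fall through to any live node.
    # No failover domain (or none of its members alive): take the first live node.
    return live_nodes[0] if live_nodes else None
```

With `live_nodes=["node2", "node3"]` and `failover_domain=["node1", "node3"]`, the function returns `"node3"`, because `node1` has been evicted and `node3` is the first surviving domain member.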
Double Faults: When one occurs, it is usually difficult or impossible to choose a universally correct course of action. Example: a node with iLO losing all power vs. having all of its network cables pulled.