Triggering a script

When the state of a node changes, DataMiner NodeRecovery can trigger the launch of an automation script to deal with the state change as you see fit. A script can be triggered based on a local state change or based on a global state change.

Local state change

The local state change script is triggered whenever the local node detects one or more state changes for nodes in the cluster.

The script is triggered on every node whose view of the cluster state is updated, and also when any node enters or leaves maintenance mode. For example, if one node goes down, the script is executed on every other Agent in the cluster. If you configure a local state change script, make sure it can deal with this. See "Pitfalls for local state changes" for details.

By default, if there is a script called NodeRecovery - Local State Change in the Automation module, this will be used as the local state change script. You can customize this script name in the Node Recovery settings.

The script requires a custom entry point of type OnNodeRecoveryLocalStateChange. This entry point method should have the IEngine object as its first argument, a LocalStateChangeInput object as its second argument, and an instance of LocalStateChangeOutput as output.

The input provided to the script contains information on the current node as well as the observed states for all nodes in the cluster. It also contains details on what has actually changed.

A script with this entry point can look like this:

using Skyline.DataMiner.Automation;
using Skyline.DataMiner.Net.NodeRecovery;

namespace NodeRecovery_Local_Action_Example
{
    public class Script
    {
        [AutomationEntryPoint(AutomationEntryPointType.Types.OnNodeRecoveryLocalStateChange)]
        public LocalStateChangeOutput OnNodeRecoveryLocalStateChange(IEngine engine, LocalStateChangeInput input)
        {
            // Log which Agent runs the script and the state it currently observes for itself.
            engine.GenerateInformation($"Hello From OnNodeRecoveryLocalStateChange, "
                                       + $"executing dma: {input.LocalNodeId}, "
                                       + $"local state: {input.ClusterState[input.LocalNodeId]}");

            return new LocalStateChangeOutput();
        }
    }
}

Pitfalls for local state changes

Make sure your script logic is aware of the following:

The script will be executed on each node that detects a state change for a node.
The script will be executed on every state change. This includes going from Unknown to Healthy on node startup.
The script will be executed when nodes enter or leave maintenance mode, even if no other state changes occur.
One script execution can correspond with multiple node state changes at once.
Even if a node is marked as being in maintenance mode, the state for this node will still be reported as Healthy, Outage, or Unknown. It is up to the script to check the maintenance state if needed and to decide how to handle it.
If your script requires information about elements hosted on an Agent that is unreachable, it needs to request this information from SLNet via SLNet messages (engine.GetUserConnection().HandleMessage()) and not via SLAutomation (engine.FindElement()), as the latter only has access to elements that are currently running and reachable.

We also highly recommend that you have your script check the incoming node IDs, states, and maintenance states before starting any actions.
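
The checks recommended above could be sketched roughly as follows. The exact properties of LocalStateChangeInput are not listed on this page, so the sketch uses hypothetical stand-in types: NodeState, NodeChange, and ChangeFilter are assumptions made for illustration only and are not part of the DataMiner API. In a real script, you would first map the input object onto a similar structure and then decide whether to act.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are NOT DataMiner types.
public enum NodeState { Unknown, Healthy, Outage }

public sealed class NodeChange
{
    public int NodeId { get; set; }
    public NodeState Previous { get; set; }
    public NodeState Current { get; set; }
    public bool InMaintenance { get; set; }
}

public static class ChangeFilter
{
    // Returns the IDs of the nodes for which a recovery action should be started.
    // One script execution can carry several changes, so a collection is processed.
    public static List<int> NodesNeedingAction(IEnumerable<NodeChange> changes)
    {
        return changes
            .Where(c => !c.InMaintenance)               // maintenance is not visible in the state itself, so check it explicitly
            .Where(c => c.Previous != c.Current)        // triggers caused only by maintenance mode carry no state change
            .Where(c => c.Current == NodeState.Outage)  // skips transitions such as Unknown to Healthy on startup
            .Select(c => c.NodeId)
            .ToList();
    }
}
```

With this filter, a batch that contains a startup transition, a node in maintenance, and one genuine outage yields only the ID of the node in outage. Because the script runs on every node that detects the change, any action started for the returned IDs should be safe to execute more than once.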

Global state change

The global state change script is triggered whenever the global cluster state changes. Unlike local state changes, where each node independently detects changes from its own perspective, global state changes represent a cluster-wide consensus.

The script is executed only on the leader node when the global cluster state changes. The leader is elected within the cluster, and a new leader is elected if the current leader no longer has a view on the majority of the cluster. This ensures that actions based on global state changes are executed only once across the cluster, avoiding the duplicate actions that could occur with local state changes. If a cluster has fewer than three nodes, there cannot be a leader, so no global state changes will be detected.

By default, if there is a script called NodeRecovery - Global State Change in the Automation module, this will be used as the global state change script. You can customize this script name in the Node Recovery settings.

The script requires a custom entry point of type OnNodeRecoveryGlobalStateChange. This entry point method should have the IEngine object as its first argument, a GlobalStateChangeInput object as its second argument, and an instance of GlobalStateChangeOutput as output.

The input provided to the script contains information about the global cluster state as calculated by the leader node. It includes the consensus view of all node states across the cluster.

A script with this entry point can look like this:

using Skyline.DataMiner.Automation;
using Skyline.DataMiner.Net.NodeRecovery;

namespace NodeRecovery_Global_Action_Example
{
    public class Script
    {
        [AutomationEntryPoint(AutomationEntryPointType.Types.OnNodeRecoveryGlobalStateChange)]
        public GlobalStateChangeOutput OnNodeRecoveryGlobalStateChange(IEngine engine, GlobalStateChangeInput input)
        {
            // Log the leader node ID; this script is only executed on that node.
            engine.GenerateInformation($"Hello From OnNodeRecoveryGlobalStateChange, "
                                       + $"executing on leader node: {input.LeaderNodeId}, "
                                       + $"global cluster state has changed");

            return new GlobalStateChangeOutput();
        }
    }
}

Tip

The following trigger script is available in the Catalog by way of example: Node Recovery - Rebalance across healthy Agents. Whenever the global cluster state changes, this example script will move elements hosted on nodes that are in outage to healthy nodes, while trying to keep the load balanced across the cluster as much as possible. You can use this script as is or use it as a starting point for your own scripts.

Pitfalls for global state changes