
Troubleshooting – NATS (BrokerGateway-managed NAS/NATS architecture)

Note

The information on this page applies to systems that use the BrokerGateway-managed NATS solution. This includes all systems using DataMiner 10.6.0/10.6.1 or higher as well as systems with lower DataMiner versions that have been migrated to the BrokerGateway‑managed NATS solution. For systems using the legacy SLNet-managed NATS architecture, refer to Legacy NAS/NATS troubleshooting.

To investigate NATS issues in a BrokerGateway-managed system, perform the actions below in the order listed:

  1. Confirm the system uses BrokerGateway-managed NATS
  2. Check if ClusterEndpointsManager is enabled
  3. Run the NATS cluster verification BPA test
  4. Inspect the logging
  5. Check the configuration files
  6. Test connectivity between nodes
  7. Reset or repair the NATS cluster

Note

These are advanced procedures that are only meant for administrators. If you do not feel confident applying any of these procedures, contact Skyline Communications.

Confirm the system uses BrokerGateway-managed NATS

If you are using a DataMiner version below DataMiner 10.6.0/10.6.1, you need to confirm whether the system has been migrated to the BrokerGateway‑managed NATS solution. From DataMiner 10.6.0/10.6.1 onwards, BrokerGateway is always used, so you can skip this step.

To verify this:

  1. Check C:\Skyline DataMiner\MaintenanceSettings.xml for the following configuration:

    <BrokerGateway>true</BrokerGateway>
    
  2. Open services.msc and verify the following:

    • The legacy NATS/NAS services are stopped or removed.
    • The nats-server service exists and is running.

If the system has not yet been migrated to BrokerGateway, refer to Legacy NAS/NATS troubleshooting.
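
The check on MaintenanceSettings.xml can also be scripted. The following is a minimal Python sketch, not an official tool. It assumes only that the file is well-formed XML with a BrokerGateway element somewhere in it; the function name broker_gateway_enabled is hypothetical, and nothing is assumed about the surrounding element structure.

```python
import xml.etree.ElementTree as ET


def broker_gateway_enabled(xml_data):
    """Return True if the first <BrokerGateway> element found has the text 'true'.

    xml_data may be str or bytes. The element is searched at any depth because
    the exact layout of MaintenanceSettings.xml is not assumed here.
    """
    root = ET.fromstring(xml_data)
    for element in root.iter():
        # Drop an optional "{namespace}" prefix so the tag matches either way.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "BrokerGateway":
            return (element.text or "").strip().lower() == "true"
    # No <BrokerGateway> element at all: treat as not migrated.
    return False
```

To use it, read C:\Skyline DataMiner\MaintenanceSettings.xml as bytes and pass the content to the function. A result of False only covers the configuration file; the service check in services.msc still has to be done separately.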

Check if ClusterEndpointsManager is enabled

The ClusterEndpointsManager soft-launch option must be enabled for BrokerGateway to manage NATS clustering properly.

To verify and resolve this:

  1. Open C:\Skyline DataMiner\SoftLaunchOptions.xml.

  2. Check that it does not contain the configuration <ClusterEndpointsManager>false</ClusterEndpointsManager>.

  3. If it does contain this configuration:

    1. Change it to <ClusterEndpointsManager>true</ClusterEndpointsManager>.

    2. Restart the DataMiner Agent.

    3. Do this on all DataMiner Agents in the system.

    4. When all DataMiner Agents have fully started up, run NATSRepair.exe on a single DataMiner Agent.

For more information about soft-launch options, see Activating Soft-Launch Options.
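
When several Agents need the same correction, the edit to SoftLaunchOptions.xml can be prepared with a small script. The sketch below is an illustration under assumptions: it treats the file as plain text, rewrites only an explicit false value for ClusterEndpointsManager, and leaves everything else untouched. The function name enable_cluster_endpoints_manager is hypothetical.

```python
import re

# Matches only an explicit "false" value inside the ClusterEndpointsManager tag.
_DISABLED = re.compile(
    r"(<ClusterEndpointsManager>\s*)false(\s*</ClusterEndpointsManager>)",
    re.IGNORECASE,
)


def enable_cluster_endpoints_manager(xml_text):
    """Return (new_text, changed).

    changed is True only if an explicit 'false' was found and replaced.
    If the option is absent or already 'true', the text is returned unchanged.
    """
    new_text, count = _DISABLED.subn(r"\g<1>true\g<2>", xml_text)
    return new_text, count > 0
```

Back up the original file before writing the returned text to disk. The script does not replace the remaining steps: each Agent still needs a restart, and NATSRepair.exe still has to be run once afterwards.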

Run the NATS cluster verification BPA test

The NATS cluster verification BPA test can quickly identify common configuration and connectivity issues.

To run this test:

  1. In DataMiner Cube, go to Apps > System Center > Agents > BPA.

  2. Select the NATS cluster verification test.

  3. Click Run.

  4. Review the results for any errors or warnings.

The BPA test will verify:

  • Whether all Agents in the cluster are reachable
  • Whether the NATS configuration is correct on each Agent
  • Whether the necessary credentials and certificates are present

Inspect the logging

Examine the relevant log files to identify error messages or patterns that indicate the root cause of the issue.

Log files to check

Check the following log files in the order listed:

  1. BrokerGateway logging: C:\ProgramData\Skyline Communications\DataMiner BrokerGateway\Logs

    This contains information about BrokerGateway's management of NATS, including credential distribution and cluster configuration changes.

  2. NATS server logging: C:\Program Files\Skyline Communications\DataMiner BrokerGateway\nats-server\nats-server.log

    This contains information about NATS server startup, connection attempts, and any errors during operation.

  3. General SLError logging: C:\Skyline DataMiner\Logging\SLErrors.txt

    This may contain errors from DataMiner processes attempting to connect to NATS.

Common error patterns

  • Authorization violations: Indicate credential mismatches or missing credential files.
  • Connection refused errors: Suggest firewall or antivirus issues. This can also mean that the NATS service is not running.
  • Cluster formation errors: Point to configuration mismatches between nodes.
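
To get a quick count of these patterns in a large log file, a keyword scan can help. The Python sketch below is only a starting point. The keyword lists are assumptions rather than an exhaustive or official list of messages, so adjust them to the wording that actually appears in your logs. The names PATTERNS and classify_log_lines are hypothetical.

```python
from collections import Counter

# Assumed keywords per category (lowercase); extend or replace as needed.
PATTERNS = {
    "authorization": ("authorization violation", "authentication error"),
    "connection": ("connection refused", "actively refused"),
    "cluster": ("error trying to connect to route",),
}


def classify_log_lines(lines):
    """Count log lines per error category using case-insensitive keyword matching."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for category, keywords in PATTERNS.items():
            if any(keyword in lowered for keyword in keywords):
                counts[category] += 1
                break  # count each line once, under the first matching category
    return counts
```

Pass it the lines of a log file, for example the opened nats-server.log, and compare the counts per category. A high count is a hint about where to look first, not a diagnosis.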

Check the configuration files

Verify that the key configuration files contain the correct information for your cluster setup.

ClusterEndpoints.json

Location: C:\Skyline DataMiner\Configurations\ClusterEndpoints.json

This file defines the endpoints for each Agent in the cluster.

Things to check:

  • For each Agent entry, an IP must be present with a non-null IgnitionValue.
  • AdditionalEndpoints must list any VIPs (Virtual IP addresses) if applicable.

Example:

{
  "Endpoints": [
    {
      "IP": "172.0.0.1",
      "IgnitionValue": "SomeHashString",
      "AdditionalEndpoints": ["172.0.1.10"]
    },
    {
      "IP": "172.0.0.2",
      "IgnitionValue": "SomeHashString2",
      "AdditionalEndpoints": ["172.0.1.10"]
    }
  ]
}
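
The two checks listed above can be automated for clusters with many Agents. The following Python sketch assumes the file has the structure shown in the example (an Endpoints array whose entries have IP and IgnitionValue keys). The function name check_cluster_endpoints is hypothetical.

```python
import json


def check_cluster_endpoints(json_text):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    endpoints = json.loads(json_text).get("Endpoints") or []
    if not endpoints:
        problems.append("No entries found under 'Endpoints'.")
    for index, entry in enumerate(endpoints):
        ip = entry.get("IP")
        if not ip:
            problems.append(f"Entry {index}: 'IP' is missing or empty.")
        # A missing key and an explicit null are both reported.
        if entry.get("IgnitionValue") is None:
            problems.append(f"Entry {index} ({ip}): 'IgnitionValue' is missing or null.")
    return problems
```

The sketch does not verify AdditionalEndpoints, because whether VIPs apply depends on the setup. That part still needs a manual review.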

MessageBrokerConfig.json

Location: C:\ProgramData\Skyline Communications\DataMiner\MessageBrokerConfig.json

This file configures how DataMiner processes connect to BrokerGateway to obtain NATS credentials.

Things to check:

  • CredentialsUrl typically points to the local Agent (using loopback or FQDN). This is the default setting unless it has been manually changed.
  • If the HTTPS certificate CN/SAN does not match the hostname used in the URL, clients may fail with TLS validation errors.
  • appsettings.runtime.json must be present at the path specified in APIKeyPath.

Example:

{
  "BrokerGatewayConfig": {
    "CredentialsUrl": "https://hostname.domainname/BrokerGateway/api/natsconnection/getnatsconnectiondetails",
    "APIKeyPath": "C:\\Program Files\\Skyline Communications\\DataMiner BrokerGateway\\appsettings.runtime.json"
  }
}
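
To see at a glance which hostname the clients will validate against the certificate, and whether the API key file is where the configuration says it is, the file can be inspected with a short script. This Python sketch assumes the structure shown in the example above. The function name inspect_message_broker_config and the keys of the returned dictionary are hypothetical.

```python
import json
import os
from urllib.parse import urlparse


def inspect_message_broker_config(json_text):
    """Extract the values that matter for the checks on MessageBrokerConfig.json."""
    config = json.loads(json_text).get("BrokerGatewayConfig", {})
    url = urlparse(config.get("CredentialsUrl", ""))
    api_key_path = config.get("APIKeyPath", "")
    return {
        "scheme": url.scheme,      # expected to be "https" in the example above
        "hostname": url.hostname,  # compare this with the certificate CN/SAN
        "api_key_path": api_key_path,
        "api_key_file_exists": bool(api_key_path) and os.path.isfile(api_key_path),
    }
```

The script only reports the hostname; comparing it with the CN/SAN of the HTTPS certificate remains a manual step.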

appsettings.runtime.json

Location: C:\Program Files\Skyline Communications\DataMiner BrokerGateway\appsettings.runtime.json

This file contains the API key used for authentication between DataMiner processes and BrokerGateway. It is automatically generated during NATS installation and cluster configuration.

Things to check:

  • The file must contain a valid APIKey value.
  • All IP addresses of the cluster nodes must be listed under ClusterInfo. No extraneous or missing entries should be present.

Example structure:

{
  "APIKeys": [
    ...
  ],
  "ClusterInfo": [
    {
      "Ip": "172.0.0.1",
      "ApiKey": "SomeHashString"
    },
    {
      "Ip": "172.0.0.2",
      "ApiKey": "SomeHashString"
    }
  ],
  "HasManualConfig": false
}

Common issues:

  • File missing: Indicates that the BrokerGateway installation or NATS configuration is incomplete.
  • ClusterInfo mismatch: If the ClusterInfo section does not match the actual cluster nodes, authentication failures can occur.

How to fix:

If this file is missing or corrupted, or it contains invalid data, run NATSRepair.exe to regenerate the NATS configuration and credentials. This will recreate the appsettings.runtime.json file.
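
Before running the repair, it can be useful to see exactly which entries differ. The Python sketch below compares the IP addresses under ClusterInfo with those in ClusterEndpoints.json. Using ClusterEndpoints.json as the reference for the actual cluster nodes is an assumption made for this example, and the function name compare_cluster_info is hypothetical.

```python
import json


def compare_cluster_info(appsettings_text, cluster_endpoints_text):
    """Report IPs that are missing from, or extraneous in, ClusterInfo.

    The "Ip" key (appsettings.runtime.json) and the "IP" key
    (ClusterEndpoints.json) follow the example structures shown on this page.
    """
    cluster_info = json.loads(appsettings_text).get("ClusterInfo") or []
    endpoints = json.loads(cluster_endpoints_text).get("Endpoints") or []
    listed = {entry["Ip"] for entry in cluster_info if entry.get("Ip")}
    expected = {entry["IP"] for entry in endpoints if entry.get("IP")}
    return {
        "missing": sorted(expected - listed),
        "extraneous": sorted(listed - expected),
    }
```

If both lists in the result are empty, the two files agree on the set of nodes. Any other result points to the ClusterInfo mismatch described above.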

Test connectivity between nodes

Network connectivity issues between DataMiner Agents can prevent NATS clustering from functioning correctly.

Connectivity check

To test connectivity, on each DataMiner Agent, execute the following PowerShell command to check if nats-server is reachable:

Test-NetConnection <peerIP> -Port 4222

Replace <peerIP> with:

  • The IP address of the local Agent (to test local connectivity)
  • The IP addresses of other Agents in the cluster (to test cluster connectivity)

Expected output for successful connection:

ComputerName     : 172.16.0.1
RemoteAddress    : 172.16.0.1
RemotePort       : 4222
InterfaceAlias   : Ethernet
SourceAddress    : 172.16.0.1
TcpTestSucceeded : True

If TcpTestSucceeded is False, this indicates that there is a firewall issue or that the NATS service is not running on the target Agent.

Required ports

Ensure the following ports are open between all DataMiner Agents:

  • Port 4222: NATS client connections
  • Port 6222: NATS cluster communication (not required in a standalone agent setup)
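
To check both ports against several Agents in one go, the TCP test can be scripted. The following Python sketch performs the same kind of check as Test-NetConnection: it only attempts a TCP connection and does not speak the NATS protocol. The function names port_is_open and check_nats_ports are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nats_ports(hosts, ports=(4222, 6222)):
    """Map each (host, port) pair to the result of a TCP connection attempt."""
    return {(host, port): port_is_open(host, port) for host in hosts for port in ports}
```

Call check_nats_ports with the list of Agent IP addresses and review every pair that maps to False. In a standalone setup, pass ports=(4222,) so that the cluster port is not reported as a failure.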

Resetting/repairing the BrokerGateway NATS cluster

To reset or repair a NATS cluster, use the tool C:\Skyline DataMiner\Tools\NATSRepair.exe, as detailed below.

Only do this if you are sure that the system uses the BrokerGateway‑managed NATS solution (see Confirm the system uses BrokerGateway-managed NATS for info on how to check this).

Note

NATSRepair.exe will not work if automatic NATS configuration is disabled.

  1. Run C:\Skyline DataMiner\Tools\NATSRepair.exe on one DMA in the system.

    When executed, the tool returns a list of known DataMiner endpoints that will be used to configure the NATS cluster. For example:

    The current known agents are the following:
    172.16.0.1
    172.16.0.2
    172.16.0.3
    
    This structure will be applied to the NATS cluster.
    If agents are missing or incorrect, do not continue and change C:\Skyline DataMiner\Configurations\ClusterEndpoints.json.
    Do you want to continue? (y/n):
    

    This list of endpoints is derived from C:\Skyline DataMiner\Configurations\ClusterEndpoints.json. All IP addresses listed in that file must accurately reflect the complete set of DMAs in the cluster.

  2. Before proceeding, validate the endpoint list:

    • If all displayed endpoints are correct, continue with the repair by entering y.

    • If any endpoints are missing or incorrect, enter n to stop, and manually update ClusterEndpoints.json by adding or removing entries as appropriate. Then rerun NATSRepair.exe.

    Only proceed when the list of IPs shown by NATSRepair matches the intended cluster composition.

  3. Wait while the tool reconfigures NATS on all Agents and restarts the necessary services.