Investigating OpenSearch issues

Search the OpenSearch log file for exceptions: /var/log/opensearch/[cluster.name].log

You can find the cluster name in /etc/opensearch/opensearch.yml.
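The lookup above can be sketched as a pair of shell functions: one reads cluster.name from the configuration file, the other lists the log lines that mention an exception. The function names and the search pattern are illustrative assumptions, not part of OpenSearch; the default paths are the ones mentioned above.

```shell
#!/bin/sh
# Illustrative helpers; the function names are hypothetical.

# Print the value of cluster.name from an opensearch.yml file.
# Commented-out lines (starting with #) are ignored.
cluster_name() {
    # $1: path to opensearch.yml (defaults to the location mentioned above)
    sed -n 's/^[[:space:]]*cluster\.name:[[:space:]]*//p' "${1:-/etc/opensearch/opensearch.yml}" \
        | head -n 1 | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Print the numbered log lines that mention an exception (case-insensitive).
find_exceptions() {
    # $1: path to opensearch.yml, $2: log directory
    conf="${1:-/etc/opensearch/opensearch.yml}"
    logdir="${2:-/var/log/opensearch}"
    grep -n -i 'exception' "$logdir/$(cluster_name "$conf").log"
}
```

Calling find_exceptions without arguments uses the default paths. Depending on file permissions, reading the log may require sudo.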

Tip

For more information, see Logs.

Remote certificate invalid

System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.

If you encounter the exception above in the SLDBConnection logging, add rootCA.crt to the Trusted Root Certification Authorities store.

Multiple DNS names for IP

If multiple DNS names point to a single IP address, set the option below to false in the opensearch.yml file:

plugins.security.ssl.transport.resolve_hostname: false

"Transport client authentication no longer supported" exception in OpenSearch logging

Caused by: org.opensearch.OpenSearchException: Transport client authentication no longer supported.

If you encounter the exception above in the OpenSearch logging, make sure the plugins.security.nodes_dn setting matches the certificate's subject.
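To compare the two values, it can help to print the certificate subject as an RFC 2253 string, which is the form normally used for nodes_dn entries. This is a minimal sketch that assumes the openssl command-line tool is installed; the function name is hypothetical.

```shell
# Print a certificate's subject DN in RFC 2253 form, without the "subject=" prefix.
cert_subject() {
    # $1: path to a PEM-encoded node certificate
    openssl x509 -in "$1" -noout -subject -nameopt RFC2253 \
        | sed 's/^subject=[[:space:]]*//'
}
```

Run cert_subject on the node certificate and compare the output character by character with the entries listed under plugins.security.nodes_dn in opensearch.yml.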

Cluster not formed

In some cases, the cluster is not formed. When this happens, the log file named after your cluster in the /var/log/opensearch/ folder of the cluster manager and data nodes will contain the following exceptions:

  • Cluster manager:

    [2023-06-14T06:26:40,436][WARN ][o.o.c.c.Coordinator      ] [doj-search2] failed to validate incoming join request from node [{DataNodeName}{8Wm1nzzBSuOxGFIPWvXgng}{h3V9SJq2R8e9tMA5pdk6zg}{166.206.186.147}{166.206.186.147:9300}{di}{shard_indexing_pressure_enabled=true}]
    org.opensearch.transport.RemoteTransportException: [DataNodeName][166.206.186.147:9300][internal:cluster/coordination/join/validate]
    Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid tNh7sXJeQjuf-RTTfFt7qg than local cluster uuid qVI2q9lMSy-Ot7O9v68d_A, rejecting
            at org.opensearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:219) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) ~[?:?]
            at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
            at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.8.0.jar:2.8.0]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
            at java.lang.Thread.run(Thread.java:833) [?:?]
    
  • Data nodes:

    [2023-06-14T06:27:59,485][INFO ][o.o.c.c.JoinHelper       ] [DataNodeName] failed to join {ClusterManagerName}{QZ-VFeWyTaavSk20IBx8xA}{HrSHPY-tQNaR7NzeA625UQ}{166.206.186.146}{166.206.186.146:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={doj-search3}{8Wm1nzzBSuOxGFIPWvXgng}{h3V9SJq2R8e9tMA5pdk6zg}{166.206.186.147}{166.206.186.47:9300}{di}{shard_indexing_pressure_enabled=true}, minimumTerm=6, optionalJoin=Optional[Join{term=6, lastAcceptedTerm=1, lastAcceptedVersion=19, sourceNode={DataNodeName}{8Wm1nzzBSuOxGFIPWvXgng}{h3V9SJq2R8e9tMA5pdk6zg}{166.206.186.147}{166.206.186.147:9300}{di}{shard_indexing_pressure_enabled=true}, targetNode={ClusterManagerName}{QZ-VFeWyTaavSk20IBx8xA}{HrSHPY-tQNaR7NzeA625UQ}{166.206.186.146}{166.206.186.146:9300}{dimr}{shard_indexing_pressure_enabled=true}}]}
    org.opensearch.transport.RemoteTransportException: [ClusterManagerName][166.206.186.146:9300][internal:cluster/coordination/join]
    Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
            at org.opensearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:635) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1482) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:420) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.8.0.jar:2.8.0]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
            at java.lang.Thread.run(Thread.java:833) [?:?]
    Caused by: org.opensearch.transport.RemoteTransportException: [DataNodeName][166.206.186.147:9300][internal:cluster/coordination/join/validate]
    Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid tNh7sXJeQjuf-RTTfFt7qg than local cluster uuid qVI2q9lMSy-Ot7O9v68d_A, rejecting
            at org.opensearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:219) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) ~[?:?]
            at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
            at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) ~[opensearch-2.8.0.jar:2.8.0]
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.8.0.jar:2.8.0]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
            at java.lang.Thread.run(Thread.java:833) ~[?:?]
    

To fix this issue:

  1. Stop OpenSearch on the cluster manager and data nodes.

  2. Compare the cluster.initial_master_nodes and discovery.seed_hosts settings on all nodes and resolve any differences.

  3. Go to the data folder specified in the opensearch.yml file and delete the folder called nodes along with every file in it by using the following command:

    sudo rm -rf nodes
    
  4. Restart OpenSearch on the cluster manager first and then on the data nodes.

Important

Deleting the nodes folder will result in loss of data. You should only do so with a new installation of OpenSearch.
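The comparison in step 2 can be sketched in shell as follows. The function names are hypothetical, and the sketch assumes that both settings are written on a single line (inline list syntax); multi-line YAML lists are not handled.

```shell
# Print the discovery-related settings of one opensearch.yml, sorted for comparison.
discovery_settings() {
    # $1: path to an opensearch.yml file
    grep -E '^[[:space:]]*(cluster\.initial_master_nodes|discovery\.seed_hosts):' "$1" \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | sort
}

# Succeed (exit status 0) only if two configuration files agree on those settings.
same_discovery_settings() {
    # $1, $2: paths to the two files, for example copies fetched from two nodes
    [ "$(discovery_settings "$1")" = "$(discovery_settings "$2")" ]
}
```

For example, after copying the opensearch.yml of each node to one machine, same_discovery_settings can be run on each pair of files, and discovery_settings shows the values whenever a pair differs.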

OpenSearch logging mentions unknown setting [node.voting_only]

If node.voting_only is configured for your tiebreaker in opensearch.yml, the node will not work, because OpenSearch does not support this setting. When OpenSearch was forked from Elasticsearch, this functionality was blocked.

If you use tiebreakers, configure them to use the master-eligible role in opensearch.yml.
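As a quick check, a configuration file can be searched for the unsupported setting. This sketch assumes the setting is spelled node.voting_only, as in Elasticsearch; the function name is hypothetical.

```shell
# Succeed (exit status 0) if an uncommented node.voting_only setting is present.
has_voting_only() {
    # $1: path to opensearch.yml (defaults to the usual location)
    grep -q -E '^[[:space:]]*node\.voting_only[[:space:]]*:' "${1:-/etc/opensearch/opensearch.yml}"
}
```

If has_voting_only succeeds on a tiebreaker node, remove the setting before restarting OpenSearch.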

SLSearch.txt logging mentions OpenSearch version is not officially supported

If the SLSearch.txt log file mentions that the OpenSearch version is not officially supported, you can resolve this by upgrading your DMS to DataMiner 10.3.6 or higher.

However, note that this has no functional impact, as the DMA will run fine even if you have not upgraded yet.

OpenSearch service going into timeout

The OpenSearch service may time out when you execute one of the following commands:

sudo systemctl start opensearch
sudo systemctl restart opensearch

For example:

opensearch.service: start operation timed out. Terminating.
opensearch.service: Failed with result 'timeout'.
Failed to start OpenSearch.
opensearch.service: Consumed 57.702s CPU time.

To resolve this, you may need to increase the start timeout for systemd (see systemd):

  1. Open the service configuration in the nano editor:

    sudo nano /usr/lib/systemd/system/opensearch.service
    
  2. Increase the value of TimeoutStartSec, for example to 300:

    TimeoutStartSec=300
    
  3. Enable the OpenSearch service again:

    sudo /bin/systemctl enable opensearch.service
    
  4. Start the OpenSearch service again:

    sudo systemctl start opensearch
    
  5. Execute the following command to verify that OpenSearch keeps running correctly:

    sudo systemctl status opensearch
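To confirm the edit from step 2, the value can be read back from the unit file. This is a minimal sketch with a hypothetical function name; it prints the last TimeoutStartSec assignment in the file, because systemd uses the last one when a setting is repeated.

```shell
# Print the TimeoutStartSec value set in a systemd unit file (empty if not set).
start_timeout() {
    # $1: path to the unit file (defaults to the path used in step 1)
    sed -n 's/^[[:space:]]*TimeoutStartSec=//p' "${1:-/usr/lib/systemd/system/opensearch.service}" \
        | tail -n 1
}
```

The value systemd actually applies can also be inspected with systemctl show opensearch -p TimeoutStartUSec, which should report the timeout as a time span.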