Troubleshooting – OpenSearch
Search the OpenSearch logs for exceptions: /var/log/opensearch/[cluster.name].log
You can find the cluster name in /etc/opensearch/opensearch.yml.
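As a quick sketch, the two steps above can be combined; the default package install paths and the cluster.name key are assumptions, so adjust them to your setup:

```shell
# Read the cluster name from opensearch.yml (assumes the default install paths).
CLUSTER_NAME=$(awk -F': *' '/^cluster\.name/ {print $2}' /etc/opensearch/opensearch.yml)

# Show the 20 most recent exception lines in that cluster's log.
grep -i "exception" "/var/log/opensearch/${CLUSTER_NAME}.log" | tail -n 20
```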
Tip
For more information, see Logs.
Remote certificate invalid
System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure.
If you encounter the exception above in the SLDBConnection logs, add the rootCA.crt to the Trusted Root Certification Authorities store.
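On the Windows machine that runs SLDBConnection, this can be done without the MMC snap-in. The sketch below assumes the certificate file is named rootCA.crt and located in the current folder, and uses PowerShell's Import-Certificate cmdlet:

```powershell
# Run in an elevated PowerShell session on the DataMiner Agent.
# Adds rootCA.crt to the local machine's Trusted Root Certification Authorities store.
Import-Certificate -FilePath .\rootCA.crt -CertStoreLocation Cert:\LocalMachine\Root
```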
Multiple DNS names for IP
If multiple DNS names point to a single IP address, set the following option to false in the opensearch.yml file:
plugins.security.ssl.transport.resolve_hostname: false
Transport client authentication no longer supported exception in OpenSearch logging
Caused by: org.opensearch.OpenSearchException: Transport client authentication no longer supported.
If you encounter the exception above in the OpenSearch logs, make sure the plugins.security.nodes_dn setting matches the subject of your certificates.
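To see the exact subject a certificate contains, you can print it with openssl. The sketch below first generates a throwaway self-signed certificate purely as a stand-in; run the final x509 command against your real node certificate instead:

```shell
# Generate a throwaway self-signed certificate (stand-in for a real node certificate).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/node-key.pem -out /tmp/node-cert.pem \
  -subj "/C=BE/O=Example/CN=node1.example.local" 2>/dev/null

# Print the subject; this is the value a plugins.security.nodes_dn entry must match.
openssl x509 -in /tmp/node-cert.pem -noout -subject
```

Note that nodes_dn entries are written in DN notation (for example "CN=node1.example.local,O=Example,C=BE"), so the order and spelling of the components must correspond to the printed subject.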
Cluster not formed
It is possible that a cluster is not formed. In that case, in the /var/log/opensearch/
folder of the cluster manager and data nodes, in the log file with the name of your cluster, you will see the following exceptions:
Cluster manager:
[2023-06-14T06:26:40,436][WARN ][o.o.c.c.Coordinator ] [doj-search2] failed to validate incoming join request from node [{DataNodeName}{8Wm1nzzBSuOxGFIPWvXgng}{h3V9SJq2R8e9tMA5pdk6zg}{166.206.186.147}{166.206.186.147:9300}{di}{shard_indexing_pressure_enabled=true}] org.opensearch.transport.RemoteTransportException: [DataNodeName][166.206.186.147:9300][internal:cluster/coordination/join/validate] Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid tNh7sXJeQjuf-RTTfFt7qg than local cluster uuid qVI2q9lMSy-Ot7O9v68d_A, rejecting at org.opensearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:219) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) ~[?:?] at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?] at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.8.0.jar:2.8.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] at java.lang.Thread.run(Thread.java:833) [?:?]
Data nodes:
[2023-06-14T06:27:59,485][INFO ][o.o.c.c.JoinHelper ] [DataNodeName] failed to join {ClusterManagerName}{QZ-VFeWyTaavSk20IBx8xA}{HrSHPY-tQNaR7NzeA625UQ}{166.206.186.146}{166.206.186.146:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={doj-search3}{8Wm1nzzBSuOxGFIPWvXgng}{h3V9SJq2R8e9tMA5pdk6zg}{166.206.186.147}{166.206.186.47:9300}{di}{shard_indexing_pressure_enabled=true}, minimumTerm=6, optionalJoin=Optional[Join{term=6, lastAcceptedTerm=1, lastAcceptedVersion=19, sourceNode={DataNodeName}{8Wm1nzzBSuOxGFIPWvXgng}{h3V9SJq2R8e9tMA5pdk6zg}{166.206.186.147}{166.206.186.147:9300}{di}{shard_indexing_pressure_enabled=true}, targetNode={ClusterManagerName}{QZ-VFeWyTaavSk20IBx8xA}{HrSHPY-tQNaR7NzeA625UQ}{166.206.186.146}{166.206.186.146:9300}{dimr}{shard_indexing_pressure_enabled=true}}]} org.opensearch.transport.RemoteTransportException: [ClusterManagerName][166.206.186.146:9300][internal:cluster/coordination/join] Caused by: java.lang.IllegalStateException: failure when sending a validation request to node at org.opensearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:635) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1482) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:420) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) ~[opensearch-2.8.0.jar:2.8.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) [?:?] 
Caused by: org.opensearch.transport.RemoteTransportException: [DataNodeName][166.206.186.147:9300][internal:cluster/coordination/join/validate] Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid tNh7sXJeQjuf-RTTfFt7qg than local cluster uuid qVI2q9lMSy-Ot7O9v68d_A, rejecting at org.opensearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:219) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:113) ~[?:?] at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?] at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:453) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) ~[opensearch-2.8.0.jar:2.8.0] at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.8.0.jar:2.8.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?]
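The exceptions above indicate that the nodes hold different cluster UUIDs. You can confirm this by asking each node which cluster UUID it reports; the addresses and credentials below are placeholders, so replace them with your own:

```shell
# Hypothetical node addresses and credentials -- replace with your own.
# Each node reports the cluster UUID it belongs to; the values must be identical.
curl -sk -u admin:admin https://166.206.186.146:9200/ | grep cluster_uuid
curl -sk -u admin:admin https://166.206.186.147:9200/ | grep cluster_uuid
```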
To fix this issue:

1. Stop OpenSearch on the cluster manager and the data nodes.

2. Check cluster.initial_master_nodes and discovery.seed_hosts on all nodes for any differences.

3. Go to the data folder specified in the opensearch.yml file and delete the folder called nodes along with every file in it, using the following command:

   sudo rm -rf nodes

4. Restart OpenSearch on the cluster manager first and then on the data nodes.
Important
Deleting the nodes folder will result in loss of data. You should only do so with a new installation of OpenSearch.
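Assuming path.data in opensearch.yml points to /var/lib/opensearch (a common default; verify yours before running anything), the procedure above can be sketched as:

```shell
# On every node: stop OpenSearch.
sudo systemctl stop opensearch

# Delete the nodes folder inside the configured data folder.
# WARNING: this deletes all index data -- only do this on a new installation.
sudo rm -rf /var/lib/opensearch/nodes

# Start OpenSearch on the cluster manager first, then on the data nodes.
sudo systemctl start opensearch
```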
OpenSearch logging mentions unknown setting [node.voting_only]
If node.voting_only was configured for your tiebreaker in opensearch.yml, this will not work, as this setting is not supported by OpenSearch. When OpenSearch was forked from Elasticsearch, this functionality was blocked.
If you use tiebreakers, configure them with the master-eligible role in opensearch.yml.
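A minimal sketch of the tiebreaker's opensearch.yml, assuming OpenSearch 2.x role names (cluster_manager is the master-eligible role; older releases use master instead), with a hypothetical node name:

```yaml
# opensearch.yml on the tiebreaker node (sketch).
node.name: tiebreaker-1
node.roles: [ cluster_manager ]
```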
SLSearch.txt logging mentions OpenSearch version is not officially supported
If the SLSearch.txt log file mentions that the OpenSearch version is not officially supported, you can resolve this by upgrading your DMS to DataMiner 10.3.6 or higher.
However, note that this has no functional impact: the DMA will keep running fine even if you have not upgraded yet.
OpenSearch service going into timeout
You may get an OpenSearch service timeout when executing one of the following commands:
sudo systemctl start opensearch
sudo systemctl restart opensearch
For example:
opensearch.service: start operation timed out. Terminating.
opensearch.service: Failed with result 'timeout'.
Failed to start OpenSearch.
opensearch.service: Consumed 57.702s CPU time.
To resolve this, you may need to increase the start timeout for systemd (see systemd):

1. Open the configuration in the Nano editor:

   sudo nano /usr/lib/systemd/system/opensearch.service

2. Increase the value of TimeoutStartSec, for example to 300:

   TimeoutStartSec=300

3. Reload the systemd configuration so the change takes effect, and enable the OpenSearch service again:

   sudo systemctl daemon-reload
   sudo /bin/systemctl enable opensearch.service

4. Start the OpenSearch service again:

   sudo systemctl start opensearch

5. Execute the following command to verify that OpenSearch keeps running correctly:

   sudo systemctl status opensearch
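For reference, the timeout value belongs in the [Service] section of the unit file; leave the other entries in that section untouched:

```ini
# /usr/lib/systemd/system/opensearch.service (fragment)
[Service]
TimeoutStartSec=300
```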
Error when executing securityadmin.sh
ERR: An unexpected SSLHandshakeException occurred: Received fatal alert: certificate_unknown
When OpenSearch shows this generic error, check the OpenSearch logs (refer to the top of this page for details) to see whether you can find a root cause.
If you used your own certificates, make sure that your admin certificate is signed by the same root CA as your node certificates. You can validate this with the following command:
openssl verify -verbose -CAfile [Path To Your RootCA] [Path To Your Admin Certificate]