Check System Health
This BPA test is executed locally on each DataMiner Agent. It analyzes the SLWatchdog reports generated during the past 7 days and checks whether the system's resource usage stayed within healthy boundaries. If suspicious behavior is detected (e.g., high CPU, memory, handles, threads, or disk latency), the test will report an error so that corrective actions can be taken.
This BPA test is available on demand. You can run it in System Center on the Agents > BPA tab.
Metadata
- Name: Check System Health
- Description: Verify the behavior of the system during the last week.
- Author: Skyline Communications
- Default schedule: Every 7 days
Results
Success
When the test is successful, this message is shown:
No suspicious problems detected in the system during the past week.
This BPA currently validates the following parameters against fixed thresholds, based on the SLWatchdog reports of the last 7 days:
| Parameter | Threshold |
|---|---|
| Total CPU per core | 75 % |
| Total CPU per SL process (SL* and prunsrv) | 12 % |
| Total Handles | 13 000 000 |
| Total Threads | 30 000 |
| Physical Memory usage (average over 7 days) | 70 % |
| Disk latency – avg sec/read | 20 ms |
| Disk latency – avg sec/write | 20 ms |
| Disk latency – avg sec/transfer | 20 ms |
Additional parameters may be taken into consideration in future versions.
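To make the table more concrete, the sketch below shows one way the two kinds of checks could be expressed: the share of samples above a fixed threshold, and a plain average for physical memory. It is an illustration only, not the BPA's implementation. The function name and the sample values are hypothetical, and the comparison is assumed to be strict (`>`).

```python
from statistics import mean

# Fixed thresholds taken from the table above (both in percent).
CPU_PER_CORE_PCT = 75
MEMORY_AVG_PCT = 70


def percent_of_samples_above(samples, threshold):
    """Return the percentage of samples strictly above the threshold."""
    if not samples:
        return 0.0
    above = sum(1 for value in samples if value > threshold)
    return 100.0 * above / len(samples)


# Hypothetical CPU readings for one core, in percent.
core_samples = [40, 80, 90, 60, 76, 30, 20, 95]
print(percent_of_samples_above(core_samples, CPU_PER_CORE_PCT))  # 50.0

# Hypothetical memory readings, in percent; compared as an average.
memory_samples = [65, 72, 71, 68]
print(mean(memory_samples) > MEMORY_AVG_PCT)  # False (average is 69)
```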
Error
({problems}) warning problems found in the system. Please check as soon as possible.
The output is marked as an error when at least one of the monitored parameters exceeds its threshold in a significant portion of the samples collected during the past 7 days. The detailed result lists every threshold breach using one or more of the following messages:
- Core ({name}) utilization has exceeded the 75% threshold in {x}% of the samples.
- Process ({name}) utilization has exceeded the 12% threshold in {x}% of the samples.
- The Total Handles utilization has exceeded the 13 000 000 threshold in {x} of the samples.
- The Total Threads utilization has exceeded the 30 000 threshold in {x} of the samples.
- The average of Physical Memory Usage is {x}% during the last 7 days.
- Disk ({name}) latency has exceeded the 20ms avg sec/read threshold in {x}% of the samples.
- Disk ({name}) latency has exceeded the 20ms avg sec/write threshold in {x}% of the samples.
- Disk ({name}) latency has exceeded the 20ms avg sec/transfer threshold in {x}% of the samples.
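As a rough illustration of how a detailed result like this could be assembled, the sketch below fills in the core message for every core whose breach percentage counts as significant. The cutoff value is an assumption, because the text above only speaks of "a significant portion" of the samples. The function name and the input values are hypothetical as well.

```python
# Hypothetical cutoff: the documentation only says "a significant portion",
# so this value is an assumption made for illustration.
SIGNIFICANT_PCT = 10.0

# Message wording copied from the list of detailed messages above.
CORE_TEMPLATE = (
    "Core ({name}) utilization has exceeded the 75% threshold "
    "in {x}% of the samples."
)


def build_result(core_breaches):
    """Map {core name: % of samples above 75 %} to an outcome and messages."""
    details = [
        CORE_TEMPLATE.format(name=name, x=pct)
        for name, pct in sorted(core_breaches.items())
        if pct >= SIGNIFICANT_PCT
    ]
    if not details:
        return "Success", []
    return "Error", details


# Hypothetical input: core 0 rarely exceeds the threshold, core 1 often does.
outcome, details = build_result({"0": 2.5, "1": 42.0})
print(outcome)
for line in details:
    print(line)
```

With this input, only core 1 is reported, so a single breach is enough to mark the outcome as an error.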
Warning
This BPA does not generate warnings.
Not Executed
If the test fails to execute for unexpected reasons and cannot provide a conclusive report because of this, the following messages can be shown:
- Could not execute test ([message]): Returned when an unexpected exception occurs.
- Error on parsing the file:{file}. Extra Details: ...: Returned when one of the SLWatchdog XML reports does not contain all the expected sections (CPU cores, total physical memory, task manager processes, total handles or total threads).
- Files not found at C:\Skyline DataMiner\logging\WatchDog\Reports\: Returned when no valid SLWatchdog reports could be found for the past 7 days.
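Before running the BPA, it can be useful to check whether the report folder holds any recent files. The following is a minimal sketch of such a check, not part of DataMiner itself. It assumes the reports carry an `.xml` extension and that the file modification time is a reasonable stand-in for the report date; the function name is hypothetical.

```python
import time
from pathlib import Path

SEVEN_DAYS_SECONDS = 7 * 24 * 60 * 60


def recent_reports(report_dir, now=None):
    """Return the .xml files in report_dir modified in the last 7 days."""
    now = time.time() if now is None else now
    folder = Path(report_dir)
    if not folder.is_dir():
        return []
    return sorted(
        path
        for path in folder.glob("*.xml")
        # Assumption: modification time approximates the report date.
        if now - path.stat().st_mtime <= SEVEN_DAYS_SECONDS
    )


reports = recent_reports(r"C:\Skyline DataMiner\logging\WatchDog\Reports")
if not reports:
    print("No recent report files found in the report folder.")
```

An empty result here only indicates that no matching files were found; it does not say whether existing reports would parse correctly.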
Note
If you get the exception "BPA doesn't have valid signature", the BPA test is unsigned. To resolve this issue, contact Skyline to ask for the signed version of the BPA, or upgrade DataMiner.
Possible solutions
- Impact: Operation of the DataMiner System might be affected by this problem.
- Corrective Action: Investigate the unusual resource usage on the affected Agent. Look into the processes, disks, or hardware components reported in the detailed output and take action to bring the resource usage back within the thresholds listed above.
Limitations
- The BPA only inspects the SLWatchdog reports stored under C:\Skyline DataMiner\logging\WatchDog\Reports\. If those reports are not available (for example, on offline or recently installed Agents), the test cannot produce a conclusive result.
- Thresholds are fixed and cannot currently be configured by the user.
- Only the parameters listed above are validated; other system health indicators are not yet covered.