Monitoring

This page contains a checklist of metrics for monitoring the behavior of the Oz system.

The checklist below describes what to check and how to interpret the results, so you can make sure that the system works correctly and is ready to process your requests. You can run the checks from any monitoring system.

System Metrics

  1. Ensure that all servers are reachable: each one should respond to the `ping` command.

  2. Check the free space on all disks. It should be 10% or more.

  3. Check the RAM usage of your servers. It should be 90% or less.
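The three system checks can be scripted with standard tools. This is a minimal sketch, not an official Oz script: it assumes Linux hosts with POSIX `df`, `/proc/meminfo` that includes `MemAvailable` (kernel 3.14 or later), and an iputils `ping` where `-W` is a timeout in seconds. The function names and host names are hypothetical, and the thresholds are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch of the system checks; function and host names are hypothetical.
# Assumes Linux: POSIX df, /proc/meminfo with MemAvailable, iputils ping.

# Succeeds when free disk space (integer percent) is at least 10.
disk_free_ok() { [ "$1" -ge 10 ]; }

# Succeeds when RAM usage (integer percent) is at most 90.
ram_usage_ok() { [ "$1" -le 90 ]; }

# Succeeds when the host answers one ping within 2 seconds.
host_reachable() { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }

# Prints "<free percent> <mount point>" for each mounted filesystem.
# Column 5 of `df -P` is the used capacity, e.g. "42%".
disk_free_percent() {
  df -P | awk 'NR > 1 { sub(/%/, "", $5); print 100 - $5, $6 }'
}

# Prints RAM usage in percent: (MemTotal - MemAvailable) / MemTotal.
ram_usage_percent() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "%d\n", (t - a) * 100 / t }' /proc/meminfo
}

# Runs all checks and prints one line per failure.
# Servers to ping are passed as arguments; disk and RAM are measured
# on the host where this function runs.
check_system() {
  local host used
  for host in "$@"; do
    host_reachable "$host" || echo "UNREACHABLE: $host"
  done
  disk_free_percent | while read -r free mount; do
    disk_free_ok "$free" || echo "LOW DISK: $mount has $free% free"
  done
  used=$(ram_usage_percent)
  ram_usage_ok "$used" || echo "HIGH RAM: $used% used"
}
```

For example, after sourcing the file, `check_system oz-api-1 oz-bio-1` (hypothetical host names) prints nothing when every check passes, so a monitoring system can alert on any output. Read-only images such as snap packages always report 100% used; GNU `df` can skip them with `-x squashfs`.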

Oz Internal Metrics

  1. Check that the oz-api services are available. To run the health check, send a GET request to `/api/version`. A successful response returns HTTP status code 200.

  2. Check that the oz-bio services are available. Send a GET request to `/v1/models/inquisitor`. A successful response returns HTTP status code 200.

  3. Check that the oz-bio license is up to date. Run the command below in your console:

curl -s -d '{"inputs": {"images_bytes": [{"b64": ""}]}}'  -X POST  http://localhost:8501/v1/models/inquisitor:predict | grep -c assertion

The command should print 1.

  4. Check for unusual delays in the most recent analyses. Run the command below to get the 90th percentile of the analysis processing time, in seconds, for analyses updated during the last 10 minutes:

docker exec -i --user postgres oz-api-pg psql -X -A -t -d gateway -c "select percentile_disc(0.9) within group(order by date_part('epoch', time_updated-time_created)) from gw_analyse_abstract where time_updated > current_timestamp - interval '10 minutes'"

The value should be lower than 20 seconds.

  5. Check whether the suspicious analyses queue is growing. Run the command below to get the number of unfinished analyses in the queue:

docker exec -i oz-api-rabbitmq rabbitmqctl list_queues | grep -Ei '^tfss\s' | awk '{ print $2 }'

The queue should contain fewer than 20 analyses.
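The five internal checks can be combined into one script that prints a line only when a check fails, which makes it easy to call from a monitoring system. This is a sketch, not an official Oz tool: it reuses the commands and thresholds from the list above and assumes it runs on a host where the `oz-api-pg` and `oz-api-rabbitmq` containers and oz-bio on port 8501 are reachable. `OZ_API_URL` and `OZ_BIO_URL` are hypothetical settings; the oz-api address is not given on this page, so its default is only a placeholder. The queue filter is the same case-insensitive match on `tfss`, expressed in `awk`.

```shell
#!/usr/bin/env bash
# Sketch of Oz internal checks 1-5; not an official Oz tool.
# OZ_API_URL and OZ_BIO_URL are hypothetical settings: the oz-api default
# is a placeholder, oz-bio defaults to the address used in check 3.
OZ_API_URL="${OZ_API_URL:-http://localhost}"
OZ_BIO_URL="${OZ_BIO_URL:-http://localhost:8501}"

# Prints the HTTP status code of a GET request ("000" if the connection fails).
http_status() { curl -s -o /dev/null -w '%{http_code}' "$1"; }

# Succeeds when a numeric value is strictly below a limit; decimals are allowed.
below_limit() { awk -v v="$1" -v l="$2" 'BEGIN { exit !((v + 0) < (l + 0)) }'; }

check_oz() {
  local lic p90 queue

  # Checks 1 and 2: both endpoints must answer with HTTP 200.
  [ "$(http_status "$OZ_API_URL/api/version")" = 200 ] || echo "FAIL: oz-api /api/version"
  [ "$(http_status "$OZ_BIO_URL/v1/models/inquisitor")" = 200 ] || echo "FAIL: oz-bio /v1/models/inquisitor"

  # Check 3: the license probe must print exactly 1.
  lic=$(curl -s -d '{"inputs": {"images_bytes": [{"b64": ""}]}}' -X POST \
    "$OZ_BIO_URL/v1/models/inquisitor:predict" | grep -c assertion)
  [ "$lic" = 1 ] || echo "FAIL: oz-bio license check printed $lic"

  # Check 4: 90th percentile processing time (seconds) over the last 10 minutes.
  # An empty result means no analyses were updated in that window.
  if ! p90=$(docker exec -i --user postgres oz-api-pg psql -X -A -t -d gateway -c \
    "select percentile_disc(0.9) within group(order by date_part('epoch', time_updated-time_created)) from gw_analyse_abstract where time_updated > current_timestamp - interval '10 minutes'"); then
    echo "FAIL: could not query oz-api-pg"
  elif [ -n "$p90" ] && ! below_limit "$p90" 20; then
    echo "FAIL: 90th percentile delay is ${p90}s"
  fi

  # Check 5: unfinished analyses in the tfss queue (empty means the queue was not found).
  queue=$(docker exec -i oz-api-rabbitmq rabbitmqctl list_queues | awk 'tolower($1) == "tfss" { print $2 }')
  if [ -z "$queue" ] || ! below_limit "$queue" 20; then
    echo "FAIL: tfss queue length is '${queue}'"
  fi
}
```

Export `OZ_API_URL` with your oz-api address before sourcing the file, then call `check_oz`. As with any scripted check, empty output means every check passed.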

If anything doesn’t work as expected, please contact us at support@ozforensics.com.

We recommend giving our engineers access to at least one channel through which you receive monitoring information. This way, we can react promptly to any problems that appear.
