Performance and Scalability Guide

The Oz system consists of many interconnected components.

Oz components

Most of the components require scaling to ensure functionality during an increase in APM (analyses per minute). The scheme below shows how the system processes an analysis with the help of supporting software.

Rough representation of analysis flow

This guide explains how to scale components and improve performance for both Docker and Kubernetes deployments.

For K8s deployments, use the Horizontal Pod Autoscaler (HPA). In the chart values, you'll find the use_hpa parameter for each component that supports HPA.

Scaling Oz BIO and Celery-TFSS

BIO is the most resource-consuming component as it performs media analysis processing.

  • The BIO component may take up to 10 minutes to start (applicable for versions <1.2).

  • Scaling BIO nodes might be challenging during a rapid increase in APM.

  • For Docker installations, plan the required number of components to handle peak loads.

  • For Kubernetes installations, schedule the creation of additional BIO pods to manage demand using cronjobHpaUpdater in values.yaml.

Node requirements

  • The CPU must support the avx, avx2, fma, sse4.1, and sse4.2 instruction sets (a quick check is sketched after this list).

  • If the CPU supports avx512f instructions, you can use a specific BIO image to slightly increase performance.

  • We recommend using Intel CPUs. AMD CPUs are also supported but require additional configuration.

  • For better performance, each BIO should reside on a dedicated node with reserved resources.
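
As a minimal sketch of the CPU check above, the following Python snippet reads the flags on a Linux node; the flag spellings follow /proc/cpuinfo conventions (sse4.1 and sse4.2 appear as sse4_1 and sse4_2), and the script itself is an illustration, not part of the Oz distribution.

```python
# Sketch: verify that a Linux node's CPU exposes the instruction sets BIO requires.
REQUIRED = {"avx", "avx2", "fma", "sse4_1", "sse4_2"}
OPTIONAL = {"avx512f"}  # enables the slightly faster BIO image mentioned above


def cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Return the CPU flag set reported by the kernel."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


flags = cpu_flags()
missing = REQUIRED - flags
print("missing required flags:", missing or "none")
print("avx512f available:", bool(OPTIONAL & flags))
```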

Scaling

Each BIO instance can handle an unlimited number of simultaneous requests; however, as the number of concurrent requests grows, so does the execution time of each request.

The assignment of analyses to BIO instances is managed by celery-tfss workers. Each celery-tfss instance is configured to handle 2 analyses simultaneously, as this provides optimal performance and analysis time.

For Kubernetes installations, this behavior can be adjusted using the concurrency parameter in values.yaml. In the default configuration, the number of celery-tfss instances should match the number of BIOs.

The required number of BIOs can be calculated using the formula below, based on the average analysis time and the expected number of analyses per minute:

N(BIO) = n(APM) × t(analysis) / (60 × C), rounded up to the nearest whole number

Here,

  • N(BIO) is the required number of BIO nodes

  • n(APM) is the expected number of analyses per minute,

  • t(analysis) is the measured average duration of a single analysis (3 seconds by default),

  • C is concurrency (2 by default).
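
As a minimal sketch, assuming the sizing formula above with the default 3-second analysis time and concurrency of 2, the calculation can be scripted as follows:

```python
import math


def required_bio_nodes(apm: int, analysis_time_s: float = 3.0, concurrency: int = 2) -> int:
    """Estimate the number of BIO nodes for a given load.

    apm              -- expected analyses per minute, n(APM)
    analysis_time_s  -- measured average duration of one analysis, t(analysis)
    concurrency      -- analyses each celery-tfss worker runs in parallel, C
    """
    return math.ceil(apm * analysis_time_s / (60 * concurrency))


# Example: 200 analyses per minute with the defaults requires 5 BIO nodes
# (and, in the default configuration, 5 celery-tfss instances).
print(required_bio_nodes(200))  # -> 5
```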

For a large number of requests involving .zip archives (e.g., if you use the Web SDK), you might need to additionally scale the celery-preview_convert service.

Database

Oz API is not intended to be a long-term analysis storage system, so you might encounter longer API response times once the database holds 10-30 million folders (depending on database performance). We therefore recommend archiving or deleting data from the API after one year of storage.

For a self-managed database, you'll need the following resources:

API

  • CPU: 4 cores (up to 8),

  • RAM: 8 GB (up to 16),

  • SSD storage.

1:N

  • CPU: 4 cores (up to 8),

  • RAM: equal to the database size,

  • SSD storage.

PostgreSQL database parameters

  • idle_in_transaction_session_timeout: 60s,

  • max_connections: 2000 (may vary depending on number of API calls),

  • shared_buffers: 2 GB (a quarter of RAM),

  • effective_cache_size: 6 GB (RAM – 2 GB),

  • maintenance_work_mem: 512 MB,

  • checkpoint_completion_target: 0.9,

  • wal_buffers: 16 MB,

  • default_statistics_target: 100,

  • random_page_cost: 1.1,

  • effective_io_concurrency: 200,

  • work_mem: 16 MB,

  • min_wal_size: 1 GB,

  • max_wal_size: 4 GB,

  • max_worker_processes: 4 (equal to the number of CPUs),

  • max_parallel_workers_per_gather: 2 (number of CPUs divided by 2),

  • max_parallel_workers: 4 (equal to the number of CPUs),

  • max_parallel_maintenance_workers: 2 (number of CPUs divided by 2),

  • If the database runs in Docker, set shm_size: '1gb'.
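
The fixed values above can be used as-is; for the RAM- and CPU-dependent ones, a minimal sketch of the derivation, assuming the rules given in parentheses (a quarter of RAM, RAM minus 2 GB, and the CPU core count), might look like this:

```python
# Sketch: derive the RAM- and CPU-dependent PostgreSQL settings listed above
# from the host resources. The remaining parameters are fixed values.
def pg_settings(ram_gb: int = 8, cpu_cores: int = 4) -> dict[str, str]:
    return {
        "shared_buffers": f"{ram_gb // 4}GB",           # a quarter of RAM
        "effective_cache_size": f"{ram_gb - 2}GB",      # RAM minus 2 GB
        "max_worker_processes": str(cpu_cores),         # equal to the CPU cores
        "max_parallel_workers": str(cpu_cores),         # equal to the CPU cores
        "max_parallel_workers_per_gather": str(cpu_cores // 2),  # CPU cores / 2
        "max_parallel_maintenance_workers": str(cpu_cores // 2), # CPU cores / 2
    }


for name, value in pg_settings().items():
    print(f"{name} = {value}")
```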

To increase API performance, consider using this list of indexes:

Gateway indexes

For high-load environments, achieving the best performance requires precise database tuning. Please contact our support for assistance.

Gesture Performance

The duration of the analysis depends on the gesture used for Liveness. Passive Liveness gestures are generally analyzed faster, while Active Liveness gestures provide higher accuracy but take more time. The longer a gesture takes to perform, the longer its analysis takes.

This table shows the analysis duration for different gestures, in seconds.

Gesture               Average analysis time   50th percentile
video_selfie_best     3.895                   3.13
video_selfie_blank    5.491                   4.984
video_selfie_down     11.051                  8.052
video_selfie_eyes     9.953                   7.399
video_selfie_high     10.937                  8.112
video_selfie_left     9.713                   7.558
video_selfie_right    9.95                    7.446
video_selfie_scan     9.425                   7.29
video_selfie_smile    10.25                   7.621

API Methods Performance

API performance mainly depends on database and storage performance. Most of the available API methods use the database to return results. Thus, to keep API response times low, we recommend using a high-performance database. Additionally, to reduce the number of requests for analysis results via the /api/folders methods, consider using webhook callbacks.

If you use S3, check its performance as well. Run POST /api/folders with a single media file of about 3 MB, then check the upload time in the logs: "Files saved to storage for X seconds". X should be 0.5 or less; if it is 2 or more, some API operations may lead to timeouts.
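
A rough client-side sketch of this check is shown below. The X-Forensic-Access-Token header and the multipart form layout are assumptions to be verified against your Oz API reference, and the host and file name are hypothetical; the authoritative measurement remains the storage time reported in the API logs.

```python
# Sketch: time a single-media POST /api/folders request end to end.
import time

import requests

API_URL = "https://your-oz-api.example.com"  # hypothetical host
TOKEN = "<access_token>"

with open("selfie.mp4", "rb") as media:  # a single media file of ~3 MB
    started = time.monotonic()
    response = requests.post(
        f"{API_URL}/api/folders",
        headers={"X-Forensic-Access-Token": TOKEN},
        files={"media": media},  # form field name is an assumption
        timeout=60,
    )
    elapsed = time.monotonic() - started

response.raise_for_status()
print(f"Round trip took {elapsed:.2f} s; compare with the storage time in the API logs.")
```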

Below is a list of recommended and non-recommended practices for using Oz API methods.

/authorize

Do

  • Refresh an expiring token when possible (see the token-reuse sketch below).

  • Monitor token expiration.

  • Use the service token only when appropriate.

Avoid

  • Making requests with an expired token.

  • Creating a new token instead of refreshing an expiring one.

  • Requesting a new token on each API call.
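
A minimal token-reuse sketch illustrating the recommendations above: cache one token and refresh it shortly before expiry instead of requesting a new one on each call. The authorize and refresh helpers are hypothetical placeholders for your actual /authorize requests.

```python
# Sketch: keep one access token per service and refresh it near expiry.
import time


class TokenCache:
    """Holds a token and refreshes it shortly before it expires."""

    def __init__(self, authorize, refresh, margin_s: int = 60):
        self._authorize = authorize  # () -> (token, expires_at_unix)
        self._refresh = refresh      # (token) -> (token, expires_at_unix)
        self._margin = margin_s
        self._token, self._expires_at = authorize()

    def get(self) -> str:
        if time.time() >= self._expires_at - self._margin:
            # Refresh the expiring token rather than creating a new one.
            self._token, self._expires_at = self._refresh(self._token)
        return self._token
```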

/api/companies

Do

  • Create a new named company for your deployment.

Avoid

  • Using the default system_company.

/api/users

Do

  • Create named accounts for users.

  • Create a separate named service account for each business process.

Avoid

  • Sharing accounts between users.

  • Creating a new user for each analysis request.

/api/healthcheck

Avoid

  • Using the healthcheck too frequently, as it may overwhelm the system with internal checks.

GET /api/folders

Do

  • Always use the time-limiting query parameters time.created.min and time.created.max (see the request sketch after this section).

  • Use the total_omit=true query parameter.

  • Use the folder_id query parameter when you know the folder ID.

  • Use the technical_meta_data query parameter if you have meta_data set.

  • Use the limit query parameter when possible.

Avoid

  • Using the with_analyzes=true query parameter when it is not needed, especially in requests that cover long or unspecified time periods.

  • Using the technical_meta_data query parameter without specifying additional parameters or a time period.
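
A sketch of a GET /api/folders request following the recommendations above; the host, token header, and timestamp format are assumptions to be checked against your Oz API reference.

```python
# Sketch: a time-bounded, paginated GET /api/folders request.
import requests

API_URL = "https://your-oz-api.example.com"  # hypothetical host
TOKEN = "<access_token>"

params = {
    "time.created.min": "2024-06-01T00:00:00Z",  # always bound the time range
    "time.created.max": "2024-06-02T00:00:00Z",
    "total_omit": "true",                        # skip the expensive total count
    "limit": 100,                                # page size
}

response = requests.get(
    f"{API_URL}/api/folders",
    headers={"X-Forensic-Access-Token": TOKEN},
    params=params,
    timeout=30,
)
response.raise_for_status()
folders = response.json()
```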

POST /api/folders

Avoid

  • Placing too much data in the meta_data payload.
