# Deployment Architecture

## Terms and Definitions

| Term | Description                                                                                                                                                                                                                                                                                                                                                       |
| ---- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| APM  | <p>Analyses per minute. Please note:</p><ul><li>An analysis is a request for a Quality (Liveness) or Biometry analysis using a single media file.</li><li>A single analysis request with multiple media files counts as separate analyses in terms of APM.</li><li>Multiple analysis types applied to a single media file (two media files for Biometry) count as separate analyses in terms of APM.</li></ul> |
| PoC  | Proof of Concept                                                                                                                                                                                                                                                                                                                                                  |
| Node | A Node is a worker machine, either virtual or physical. |
| HA   | High availability                                                                                                                                                                                                                                                                                                                                                 |
| K8s  | Kubernetes                                                                                                                                                                                                                                                                                                                                                        |
| SC   | StorageClass                                                                                                                                                                                                                                                                                                                                                      |
| RWX  | ReadWriteMany                                                                                                                                                                                                                                                                                                                                                     |

## Component Descriptions

Oz API components:

* **APP** is the API front app that receives REST requests, performs preprocessing, and creates tasks for other API components.
* **Celery** is the asynchronous task queue. The API has the following Celery queues:
  * **Celery-default** processes system-wide tasks.
  * **Celery-maintenance** processes maintenance tasks.
  * **Celery-tfss** processes analysis tasks.
  * **Celery-resolution** checks for completion of all nested analyses within a folder and changes folder status.
  * **Celery-preview\_convert** creates a video preview for media.
  * **Celery-beat** is a CronJob for managing maintenance Celery tasks.
  * **Celery-Flower** is a Celery metrics collector.
  * **Celery-regula** (optional) processes document analysis tasks.
* **Redis** is a message broker and result backend for Celery.
* **RabbitMQ** (optional) can be used as a message broker for Celery instead of Redis.
* **Nginx** serves static media files for external HTTP(S) requests.
* **1:N** (optional) processes the Blacklist analysis.
* **Statistic** (optional) provides [statistics](https://doc.ozforensics.com/oz-knowledge/guides/user-guide/oz-webui/webui-statistics) collection for the Web UI.
* **Web UI** provides the [web interface](https://doc.ozforensics.com/oz-knowledge/guides/user-guide/oz-webui).

**BIO-Updater** checks for model updates and downloads new models.

**Oz BIO (TFSS)** runs TensorFlow with AI models and makes decisions for incoming media.

{% hint style="danger" %}
The **BIO-Updater** and **BIO** components require access to the following external resources:

* <https://api.cryptlex.com/>
* https://\*.infra.ozforensics.ai/
* https://\*.s3.amazonaws.com/
* https://\*.s3-accelerate.amazonaws.com/
{% endhint %}
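
For orientation only, below is a minimal Docker Compose sketch of how these components typically fit together. It is an assumption-based illustration: the image names, command, ports, and volume paths are placeholders rather than values shipped with the product; the actual layout is defined by the installation guides linked in the scenarios below.

```yaml
# Illustrative sketch only: image names, commands, ports, and paths are
# placeholders, not the actual Oz distribution values.
services:
  app:                       # API front app: REST requests, preprocessing, task creation
    image: oz-api-app:latest
    ports: ["8080:8080"]
    depends_on: [redis]
  celery-tfss:               # worker queue that processes analysis tasks
    image: oz-api-celery:latest
    command: celery -A api worker -Q tfss   # hypothetical module/queue names
    depends_on: [redis, bio]
  redis:                     # message broker and result backend for Celery
    image: redis:7
  nginx:                     # serves static media files over HTTP(S)
    image: nginx:1.25
    volumes:
      - media:/var/lib/oz/media:ro
  bio:                       # Oz BIO (TFSS): TensorFlow with AI models
    image: oz-bio:latest
volumes:
  media: {}
```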

## Deployment Scenarios

The deployment scenario depends on the workload you expect.

<table data-full-width="true"><thead><tr><th></th><th>Small Business or PoC</th><th>Medium Load</th><th>High Load</th></tr></thead><tbody><tr><td>Use cases</td><td><ul><li>Testing/Development purposes</li><li>Small installations with a low APM</li></ul></td><td><ul><li>Typical usage with moderate load</li></ul></td><td><ul><li>High load with HA and autoscaling</li><li>Usage with a cloud provider</li></ul></td></tr><tr><td><a data-footnote-ref href="#user-content-fn-1">Expected workload</a></td><td><ul><li>~<a data-footnote-ref href="#user-content-fn-2">6-8</a> APM</li><li>~<a data-footnote-ref href="#user-content-fn-2">250 000</a> analyses per month</li></ul></td><td><ul><li>~<a data-footnote-ref href="#user-content-fn-3">40-60</a> APM</li><li><a data-footnote-ref href="#user-content-fn-5">500 000 to 1 000 000</a> analyses per month</li></ul></td><td><ul><li><a data-footnote-ref href="#user-content-fn-4">60+</a> APM</li><li><a data-footnote-ref href="#user-content-fn-4">1 000 000+</a> analyses per month</li></ul></td></tr><tr><td>Environment</td><td>Docker</td><td>Docker</td><td>Kubernetes</td></tr><tr><td>HA</td><td>No</td><td>Partially</td><td>Yes</td></tr><tr><td>Pros</td><td><ul><li>Requires a minimal amount of computing resources</li><li>Low complexity, so no highly qualified engineers are needed on-site</li><li>Easy to manage and support</li></ul></td><td><ul><li>Partially supports HA</li><li>Can be scaled up to support a higher workload</li></ul></td><td><ul><li>HA and autoscaling</li><li>Observability and manageability</li><li>Handles a high workload and can be scaled up</li></ul></td></tr><tr><td>Cons</td><td><ul><li>Suitable only for low loads, no high APM</li><li>No scaling or high availability</li></ul></td><td><ul><li>API HA requires precise balancing</li><li>Higher staff qualification requirements</li></ul></td><td><ul><li>High staff qualification requirements</li><li>Additional infrastructure requirements</li></ul></td></tr><tr><td>External resource requirements</td><td><a data-footnote-ref href="#user-content-fn-6">PostgreSQL</a></td><td><ul><li>PostgreSQL</li><li><a data-footnote-ref href="#user-content-fn-7">Shared volume (R/W)</a></li></ul></td><td><ul><li><p>For Kubernetes deployments:</p><ul><li>K8s v1.25+</li><li>ingress-nginx</li><li>clusterIssuer</li><li>kube-metrics</li><li>Prometheus</li><li>clusterAutoscaler</li></ul></li><li>PostgreSQL</li><li><a data-footnote-ref href="#user-content-fn-7">Shared volume (R/W)</a></li><li><a data-footnote-ref href="#user-content-fn-8">Redis</a></li></ul></td></tr></tbody></table>

Autoscaling is based on the ClusterAutoscaler and must be supported by your infrastructure.

### Small Business or PoC <a href="#small-business-poc" id="small-business-poc"></a>

Please find the installation guide here: [Docker](https://doc.ozforensics.com/oz-knowledge/guides/administrator-guide/installation-in-docker).

* Type of containerization: Docker,
* Type of installation: Docker Compose,
* Autoscaling/HA: none.

#### Requirements

**Software**

* Docker 19.03+,
* Podman 4.4+ (for API 5 only),
* Python 3.4+.

**Storage**

* Depends on media quality, the type and number of analyses, and the required archive depth.
* May be calculated as \[average media size] \* 2 \* \[analyses per day] \* \[archive depth in days]; see the worked example below. Please refer to [this article](https://doc.ozforensics.com/oz-knowledge/other/media-file-size-overview) for media size reference.
* Each analysis request performs read and write operations on the storage. Any additional latency in these operations will impact the analysis time.
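
For example, assuming a 3 MB average media size, 1,000 analyses per day, and a 30-day archive depth (illustrative values, not measurements), the estimate is 3 MB \* 2 \* 1,000 \* 30 ≈ 180 GB.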

**Staff qualification**:

* Basic knowledge of Linux and Docker.

#### Deployment

**Two nodes:**

<figure><img src="https://2532558063-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5g6dgsxRbyrCvB0uAf8f%2Fuploads%2FaeuxEZ3qO7HcwOxuNlwT%2Fpoc%202%20nodes.png?alt=media&#x26;token=0047a4aa-4a2f-447e-9f70-2e7706ede47d" alt=""><figcaption></figcaption></figure>

Resources:

* 2 nodes,
* 16 CPU/32 GB RAM for the first node; 8 CPU/16 GB RAM for the second node.

### Medium Load

Please find the installation guide here: [Docker](https://doc.ozforensics.com/oz-knowledge/guides/administrator-guide/installation-in-docker).

* Type of containerization: Docker (Podman is supported for API 5 only),
* Type of installation: Docker Compose,
* Autoscaling/HA: manual scaling; HA is partially supported.

#### Requirements

**Computational resources**

Depending on load, you can change the number of nodes. However, for 5+ nodes, we recommend that you proceed to the High Load section.

* From 2 to 4 Docker nodes (see [schemes](#deployment-1)):
  * 2 Nodes:
    * 24 CPU/32 GB RAM per node.
  * 3 Nodes:
    * 16 CPU/24 GB RAM per node.
  * 4 Nodes:
    * 8 CPU/16 GB RAM for two nodes (each),
    * 16 CPU/24 GB RAM for two nodes (each).

We recommend using an external self-managed PostgreSQL database and an NFS share.
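
As an illustration of the NFS share recommendation, a Docker Compose volume can point at an NFS export so that all API nodes see the same media storage. The server address and export path below are placeholders.

```yaml
# Hypothetical NFS-backed volume for shared media storage; adjust the
# address and export path to your own NFS server.
volumes:
  oz-media:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw,nfsvers=4.1"
      device: ":/export/oz-media"
```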

**Software**

* Docker 19.03+,
* Podman 4.4+ (for API 5 only),
* Python 3.4+.

**Storage**

* Depends on media quality, the type and number of analyses, and the required archive depth.
* May be calculated as: \[average media size] \* 2 \* \[analyses per day] \* \[archive depth in days]. Please refer to [this article](https://doc.ozforensics.com/oz-knowledge/other/media-file-size-overview) for media size reference.
* Each analysis request performs read and write operations on the storage. Any additional latency in these operations will impact the analysis time.

**Staff qualification**:

* Advanced knowledge of Linux, Docker, and PostgreSQL.

#### Deployment

**Two nodes:**

<figure><img src="https://2532558063-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5g6dgsxRbyrCvB0uAf8f%2Fuploads%2FzkkB1h2sDHT6t4tFAkoH%2Fmedium%202%20nodes.png?alt=media&#x26;token=8f314bab-5a54-4f2d-bfc5-d51eb3943027" alt=""><figcaption></figcaption></figure>

**Three nodes:**

<figure><img src="https://2532558063-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5g6dgsxRbyrCvB0uAf8f%2Fuploads%2F3TJOHzzu8M5aujlcPXLJ%2Fmedium%203%20nodes.png?alt=media&#x26;token=666488cc-4dcc-47a0-b8b2-965323436ffe" alt=""><figcaption></figcaption></figure>

**Four nodes:**

<figure><img src="https://2532558063-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5g6dgsxRbyrCvB0uAf8f%2Fuploads%2FvHeyhHkSZP8kICxi7EGw%2Fmedium%204%20nodes.png?alt=media&#x26;token=cdb33a6a-509a-447e-94a2-b1b18cbcf39f" alt=""><figcaption></figcaption></figure>

### High Load

Please find the installation guide here: [Kubernetes](https://doc.ozforensics.com/oz-knowledge/guides/administrator-guide/installation-in-kubernetes).

* Type of containerization: Docker containers with Kubernetes orchestration,
* Type of installation: Helm charts,
* Autoscaling/HA: supports autoscaling; HA for most components.

#### Requirements

**Computational resources**

3-4 nodes. Depending on load, you can change the number of nodes.

* 16 CPU/32 GB RAM Nodes for the BIO pods (see the scheduling sketch below),
* 8+ CPU/16+ GB RAM Nodes for all other workloads.
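
One possible way to keep the heavier BIO pods on the larger nodes is to label those nodes and add a nodeSelector (or taints/tolerations) to the BIO workload. The sketch below is an assumption for illustration; the label, names, and image are placeholders, not values from the official Helm charts.

```yaml
# Hypothetical scheduling constraint. Label the larger nodes first, e.g.:
#   kubectl label node <node-name> oz.example.com/pool=bio
apiVersion: v1
kind: Pod
metadata:
  name: bio-example            # placeholder name
spec:
  nodeSelector:
    oz.example.com/pool: bio   # placeholder label
  containers:
    - name: bio
      image: oz-bio:latest     # placeholder image
      resources:
        requests:
          cpu: "8"
          memory: 16Gi
```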

We recommend using an external self-managed PostgreSQL database.

{% hint style="danger" %}
Requires an RWX (ReadWriteMany) StorageClass or an NFS share.
{% endhint %}
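
For reference, a PersistentVolumeClaim requesting RWX access looks like the sketch below; the claim name, StorageClass name, and size are placeholders and depend on your cluster's storage provisioner.

```yaml
# Hypothetical RWX claim; storageClassName and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oz-media
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 500Gi
```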

**Software**

* Docker 19.03+,
* Python 3.4+.

**Storage**

* Depends on media quality, the type and number of analyses, and the required archive depth.
* May be calculated as: \[average media size] \* 2 \* \[analyses per day] \* \[archive depth in days]. Please refer to [this article](https://doc.ozforensics.com/oz-knowledge/other/media-file-size-overview) for media size reference.
* Each analysis request performs read and write operations on the storage. Any additional latency in these operations will impact the analysis time.

**Staff qualification**:

* Advanced knowledge of Linux, Docker, Kubernetes, and PostgreSQL.

#### Deployment Scheme

<figure><img src="https://2532558063-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F5g6dgsxRbyrCvB0uAf8f%2Fuploads%2FjLlkxPEVbPD0qgqVQO1J%2Fhigh.png?alt=media&#x26;token=7886169d-6523-4821-832b-9668784e79f5" alt=""><figcaption></figcaption></figure>

[^1]: Values are calculated for the specific number of Nodes recommended for each deployment type.

[^2]: For a single Node. Results are calculated using a ShotSet with lossless frames and may vary depending on the incoming MIME type (image, video, zip), media quality, length, size, and image contents.

[^3]: For 4 nodes. Results are calculated using a ShotSet with lossless frames and may vary depending on the incoming MIME type (image, video, zip), media quality, length, size, and image contents.

[^4]: Results are calculated using a ShotSet with lossless frames and may vary depending on the incoming MIME type (image, video, zip), media quality, length, size, and image contents.

[^5]: 1 000 000 is for 4 nodes. Results are calculated using a ShotSet with lossless frames and may vary depending on the incoming MIME type (image, video, zip), media quality, length, size, and image contents.

[^6]: External PostgreSQL is not required for PoC or test deployments.

[^7]: Required when 2+ nodes run the API (the 4+ node deployment type) or when deploying in Kubernetes. Can be NFS, EFS, CephFS, Longhorn, or any file system with RWX support across multiple nodes.

[^8]: Optional. Only standalone Redis is supported.
