# Model Deployment Flow

## Prerequisites

- Prior to running the Neutron Converter, the model must go through the quantization step.
- The generic conversion between dialects and the quantization steps are illustrated in **{doc}`Conversion & Quantization <../../../../convQuant/index>`**.
- If a TFLite FP32 model is available, it can be consumed by **{doc}`tflite-profiler <../../softwareTools/TFLiteProfiler>`** and **{doc}`tflite-quantizer <../../softwareTools/TFLiteQuantizer>`** to obtain a TFLite quantized model:
  - A profile file must be generated with the profiling tool **{doc}`tflite-profiler <../../softwareTools/TFLiteProfiler>`**, based on the model and a representative dataset.
  - The model is then quantized using profile-guided quantization with the **{doc}`tflite-quantizer <../../softwareTools/TFLiteQuantizer>`** tool.
- Running **{doc}`neutron-converter <../../softwareTools/NeutronConverter>`** requires the following minimum set of parameters:
  - Target SoC
  - Input model (resulting from the quantization step)

## Tools description

## {doc}`tflite-profiler <../../softwareTools/TFLiteProfiler>`

This tool profiles a FLOAT (FP16/FP32) model by computing the dynamic range \[min, max\] for each FLOAT tensor and writing the information to a file called **profile** with the following format:

```
,,,,,..,,
,,,,,..,,
...
```

* * *

## {doc}`tflite-quantizer <../../softwareTools/TFLiteQuantizer>`

This tool quantizes a FLOAT (FP16/FP32) TensorFlow Lite model using **profile-guided quantization**. The quantizer uses the dynamic range information from a profile file (generated by tflite-profiler) to determine optimal quantization parameters for each tensor.

* * *

## {doc}`neutron-converter <../../softwareTools/NeutronConverter>`

The **neutron-converter** tool is a CLI (Command Line Interface) tool used to convert models in TFLite format for execution on the Neutron NPU. This tool has the following traits:

- Consumes a standard TFLite model containing standard TFLite operators.
- Produces a custom TFLite model containing both standard and custom TFLite operators:
  - Standard operators that are supported by the NPU are grouped together and mapped to one or more `NeutronGraph` custom operators in the converted model, to be executed by the NPU.
  - Standard operators that are NOT supported by the NPU are left unmodified in the converted model, to be executed by the CPU.
- Maps mathematical primitives from the TFLite graph (operators) to execution primitives from the Neutron Library (kernels).
  - Note that this mapping is NOT necessarily 1:1; it can be N:1 and, in some special cases, N:M.
- The converted model is then consumed by the Neutron Runtime, which consists of 3 components:
  - **TFLite Runtime** - Runs on the CPU with a registered mechanism that dispatches all `NeutronGraph` custom operators to the **NeutronDriver** component.
  - **NeutronDriver** - Acts as an interface between the CPU and the NPU and communicates directly with the **NeutronFirmware**.
  - **NeutronFirmware** - Directly drives the execution of the NPU hardware.

More information can be found in {doc}`Neutron NPU Software Tools <../../softwareTools/index>`.
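To make the profile-guided quantization flow above concrete, the sketch below shows how a per-tensor dynamic range \[min, max\] (the kind of data a profiler pass records) can be turned into asymmetric int8 quantization parameters. This is a generic illustration of the standard affine quantization scheme, not the actual tflite-profiler/tflite-quantizer implementation; the function names are hypothetical.

```python
def profile_tensor(values):
    """Record the dynamic range [min, max] of a float tensor,
    as a profiler pass over a representative dataset would."""
    return min(values), max(values)

def quant_params_int8(t_min, t_max):
    """Derive asymmetric affine int8 parameters from a profiled range:
    real_value = scale * (quantized_value - zero_point)."""
    # Extend the range to include 0 so that zero is exactly representable.
    t_min = min(t_min, 0.0)
    t_max = max(t_max, 0.0)
    qmin, qmax = -128, 127
    scale = (t_max - t_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor
    zero_point = int(round(qmin - t_min / scale))
    # Clamp the zero point into the representable int8 range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point
```

For example, a tensor profiled to the range \[0.0, 2.55\] quantizes with a scale of 0.01 and a zero point of -128, so the quantized value -128 maps back to exactly 0.0.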
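The operator partitioning performed by neutron-converter (grouping consecutive NPU-supported operators into `NeutronGraph` custom operators while leaving unsupported operators for the CPU) can be sketched as follows. The supported-operator set here is a made-up placeholder; the real set depends on the Neutron NPU and converter version.

```python
# Hypothetical supported-op set; the actual set is defined by the NPU/converter.
NPU_SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "ADD"}

def partition(ops):
    """Group maximal runs of consecutive NPU-supported operators into
    NeutronGraph nodes (an N:1 mapping); unsupported operators stay
    in place for CPU execution."""
    result, run = [], []
    for op in ops:
        if op in NPU_SUPPORTED:
            run.append(op)
        else:
            if run:
                result.append(("NeutronGraph", tuple(run)))
                run = []
            result.append((op, None))  # falls back to the CPU
    if run:
        result.append(("NeutronGraph", tuple(run)))
    return result
```

For instance, the sequence `CONV_2D, ADD, SOFTMAX, CONV_2D` would yield two `NeutronGraph` nodes separated by a CPU-executed `SOFTMAX`, which is why an unsupported operator in the middle of a graph can split the NPU workload into multiple dispatches.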