# Model Deployment Flow

## Prerequisites

- Prior to running the Neutron Converter, the model must go through the quantization step.
- The generic conversion between dialects and the quantization steps are illustrated in **{doc}`Conversion & Quantization <../../../../convQuant/index>`**.
- If a TFLite FP32 model is available, it can be consumed by **{doc}`tflite-profiler <../../softwareTools/TFLiteProfiler>`** and **{doc}`tflite-quantizer <../../softwareTools/TFLiteQuantizer>`** to obtain a TFLite quantized model:
  - A profile file must be generated with the profiling tool **{doc}`tflite-profiler <../../softwareTools/TFLiteProfiler>`**, based on the model and a representative dataset.
  - The model is then quantized using profile-guided quantization with the **{doc}`tflite-quantizer <../../softwareTools/TFLiteQuantizer>`** tool.
- Running **{doc}`neutron-converter <../../softwareTools/NeutronConverter>`** requires the following minimum set of parameters:
  - Target SoC
  - Input model (resulting from the quantization step)

## Tools description

## {doc}`tflite-profiler <../../softwareTools/TFLiteProfiler>`

This tool profiles a FLOAT (FP16/FP32) model by computing the dynamic range \[min, max\] for each FLOAT tensor and writing the information to a file called **profile** with the following format:

```
,,,,,..,,
,,,,,..,,
...
```

* * *

## {doc}`tflite-quantizer <../../softwareTools/TFLiteQuantizer>`

This tool quantizes a FLOAT (FP16/FP32) TensorFlow Lite model using **profile-guided quantization**. The quantizer uses the dynamic range information from a profile file (generated by tflite-profiler) to determine optimal quantization parameters for each tensor.

* * *

## {doc}`neutron-converter <../../softwareTools/NeutronConverter>`

The **neutron-converter** tool is a CLI (Command Line Interface) tool used to convert models in TFLite format for execution on the Neutron NPU. This tool has the following traits:

- Consumes a standard TFLite model containing standard TFLite operators.
- Produces a custom TFLite model containing both standard and custom TFLite operators:
  - Standard operators that are supported by the NPU are grouped together and mapped to one or more `NeutronGraph` custom operators in the converted model, to be executed by the NPU.
  - Standard operators that are NOT supported by the NPU are left unmodified in the converted model, to be executed by the CPU.
- Maps mathematical primitives from the TFLite graph (operators) to execution primitives from the Neutron Library (kernels).
  - Note that this mapping is NOT necessarily 1:1; it can be N:1 and, in some special cases, N:M.
- The converted model is then consumed by the Neutron Runtime, which consists of 3 components:
  - **TFLite Runtime** - Runs on the CPU with a registered mechanism that dispatches all `NeutronGraph` custom operators to the **NeutronDriver** component.
  - **NeutronDriver** - Acts as an interface between the CPU and the NPU and communicates directly with the **NeutronFirmware**.
  - **NeutronFirmware** - Directly drives the execution of the NPU hardware.

More information can be found in {doc}`Neutron NPU Software Tools <../../softwareTools/index>`.
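To make the profile-guided quantization flow above concrete, the sketch below shows how a per-tensor dynamic range \[min, max\] (the kind of data a profiler pass records) can be turned into asymmetric int8 quantization parameters. This is a generic illustration of the standard affine quantization scheme, not the actual tflite-profiler/tflite-quantizer implementation; the function names are hypothetical.

```python
def profile_tensor(values):
    """Record the dynamic range [min, max] of a float tensor,
    as a profiler pass over a representative dataset would."""
    return min(values), max(values)

def quant_params_int8(t_min, t_max):
    """Derive asymmetric affine int8 parameters from a profiled range:
    real_value = scale * (quantized_value - zero_point)."""
    # Extend the range to include 0 so that zero is exactly representable.
    t_min = min(t_min, 0.0)
    t_max = max(t_max, 0.0)
    qmin, qmax = -128, 127
    scale = (t_max - t_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor
    zero_point = int(round(qmin - t_min / scale))
    # Clamp the zero point into the representable int8 range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point
```

For example, a tensor profiled to the range \[0.0, 2.55\] quantizes with a scale of 0.01 and a zero point of -128, so the quantized value -128 maps back to exactly 0.0.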
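The operator partitioning performed by neutron-converter (grouping consecutive NPU-supported operators into `NeutronGraph` custom operators while leaving unsupported operators for the CPU) can be sketched as follows. The supported-operator set here is a made-up placeholder; the real set depends on the Neutron NPU and converter version.

```python
# Hypothetical supported-op set; the actual set is defined by the NPU/converter.
NPU_SUPPORTED = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "ADD"}

def partition(ops):
    """Group maximal runs of consecutive NPU-supported operators into
    NeutronGraph nodes (an N:1 mapping); unsupported operators stay
    in place for CPU execution."""
    result, run = [], []
    for op in ops:
        if op in NPU_SUPPORTED:
            run.append(op)
        else:
            if run:
                result.append(("NeutronGraph", tuple(run)))
                run = []
            result.append((op, None))  # falls back to the CPU
    if run:
        result.append(("NeutronGraph", tuple(run)))
    return result
```

For instance, the sequence `CONV_2D, ADD, SOFTMAX, CONV_2D` would yield two `NeutronGraph` nodes separated by a CPU-executed `SOFTMAX`, which is why an unsupported operator in the middle of a graph can split the NPU workload into multiple dispatches.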