# eIQ Neutron-S

The targeted **TF Lite** model is converted using the **{doc}`Neutron Converter <../../softwareTools/NeutronConverter>`** tool on the host PC in order to obtain a custom **TFLite** model.

* * *

**Flow 1** applies to both partially and fully converted models. An inference engine is used, and CPU fallback is possible for operations not supported by eIQ Neutron-S when the model is only partially converted.

- The **converted TFLite** model is passed by the application to an inference engine, which can be open source (for example TFLite) or third party.

```{figure} images/eiq_neutron_s_inference_flow1.png
:name: fig-neutron-s-inference-flow1
:width: 776px

eIQ Neutron-S inference flow using an inference engine, for partially and fully converted models
```

- The inference engine executes the custom **TFLite** model and, when it encounters **NeutronGraph** containers, forwards the associated microcode, weights and kernels buffers in DDR to the Neutron Driver (passed by reference). The inference engine also provides the Neutron Driver with the input/output data buffers (passed by reference). The input data must be available in the input DDR buffer when the Firmware starts executing the microcode; upon completion, the Firmware writes the output data to the output DDR buffer.
- The part of the inference engine that forwards **NeutronGraph** containers to the Neutron Driver can be either a dedicated backend implementation or a standard mechanism such as the one provided by the TFLite inference engine, called "custom operator registration".
- The Neutron Driver subsequently forwards the microcode, weights, kernels, input and output buffers to the Neutron Firmware (passed by reference).
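The pass-by-reference contract described above can be sketched as a small job descriptor that the engine fills in and hands to the driver. Note that `NeutronJob` and `neutron_job_init` are hypothetical names used only for illustration; they are not the actual Neutron Driver API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job descriptor: every field is a reference into DDR,
 * mirroring the pass-by-reference contract described above.
 * The real Neutron Driver API differs; this is an illustrative sketch. */
typedef struct {
    const uint8_t *microcode;  size_t microcode_size;
    const uint8_t *weights;    size_t weights_size;
    const uint8_t *kernels;    size_t kernels_size;
    const uint8_t *input;      size_t input_size;   /* must hold data before start */
    uint8_t       *output;     size_t output_size;  /* written by Firmware on completion */
} NeutronJob;

/* Populate the descriptor; no data is copied, only pointers are stored. */
int neutron_job_init(NeutronJob *job,
                     const uint8_t *microcode, size_t microcode_size,
                     const uint8_t *weights,   size_t weights_size,
                     const uint8_t *kernels,   size_t kernels_size,
                     const uint8_t *input,     size_t input_size,
                     uint8_t *output,          size_t output_size)
{
    /* Microcode and I/O buffers are mandatory for any NeutronGraph. */
    if (!job || !microcode || !input || !output)
        return -1;
    job->microcode = microcode; job->microcode_size = microcode_size;
    job->weights   = weights;   job->weights_size   = weights_size;
    job->kernels   = kernels;   job->kernels_size   = kernels_size;
    job->input     = input;     job->input_size     = input_size;
    job->output    = output;    job->output_size    = output_size;
    return 0;
}
```

Because only references are stored, the engine must keep all of these DDR buffers alive until the Firmware reports completion.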
- The Neutron Driver can also allocate, if needed, a scratch buffer used by the Neutron subsystem for intermediate computation results that cannot be kept in the internal TCM memory due to memory constraints and thus must be evicted to DDR. Similarly, the Neutron Driver can allocate additional buffers used by the Firmware to dump debug or trace information.

* * *

**Flow 2** applies to **fully converted** models only. No inference engine is used, and no CPU fallback is possible.

- This flow can be used when the entire graph is supported by Neutron (the entire graph collapses to a single NeutronGraph container), so no CPU/GPU fallback is required. In this case the inference engine can be removed altogether, because it provides no functionality and only adds unnecessary overhead (both latency and memory footprint).

```{figure} images/eiq_neutron_s_inference_flow2.png
:name: fig-neutron-s-inference-flow2
:width: 776px

eIQ Neutron-S inference flow without an inference engine, for fully converted models
```

- The **{doc}`Neutron Converter <../../softwareTools/NeutronConverter>`** provides the microcode/weights/kernels directly to the application as raw buffers.
- The application provides the microcode, weights, kernels and input/output buffers directly to the Neutron Driver. This improves the latency of model execution because the extra overhead of the inference engine is removed.
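Flow 2 can be sketched as a single direct driver call from the application, with no engine in between. The function name `neutron_run_direct` is a hypothetical illustration, not the real Neutron Driver API; the stub body only models the buffer contract (input valid before the call, output written on return), not real Firmware execution:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical Flow 2 entry point: the application hands the raw
 * microcode/weights/kernels buffers (produced by the Neutron Converter)
 * and the I/O buffers straight to the driver. Illustrative sketch only. */
int neutron_run_direct(const uint8_t *microcode, size_t microcode_size,
                       const uint8_t *weights,   size_t weights_size,
                       const uint8_t *kernels,   size_t kernels_size,
                       const uint8_t *input,     size_t input_size,
                       uint8_t *output,          size_t output_size)
{
    (void)weights; (void)weights_size;   /* unused by this stub */
    (void)kernels; (void)kernels_size;

    /* A fully converted model always carries microcode and I/O buffers. */
    if (!microcode || microcode_size == 0 || !input || !output)
        return -1;

    /* Stand-in for the Firmware executing the microcode: echo the input
     * so the output-buffer contract can be observed by the caller. */
    size_t n = input_size < output_size ? input_size : output_size;
    memcpy(output, input, n);
    return 0;
}
```

Compared with Flow 1, the application owns the whole call sequence here, which is where the latency and memory-footprint savings come from.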