# Neutron Converter

## Description

The **neutron-converter** tool is a CLI (Command Line Interface) tool used to convert models in TFLite format for execution on the Neutron NPU.

This tool has the following traits:

- Consumes a standard TFLite model containing standard TFLite operators.
- Produces a custom TFLite model containing both standard and custom TFLite operators:
  - Standard operators that are supported by the NPU are extracted together and mapped to one or multiple `NeutronGraph` custom operators in the converted model, to be executed by the NPU.
  - Standard operators that are NOT supported by the NPU are left unmodified in the converted model, to be executed by the CPU.
- Maps mathematical primitives from the TFLite graph (operators) to execution primitives from the Neutron Library (kernels).
  - Note that this mapping is NOT necessarily 1:1; it can be N:1 and in some special cases also N:M.
- The converted model is then consumed by the Neutron Runtime, which consists of 3 components:
  - **TFLite Runtime** - Runs on the CPU with a registered mechanism to dispatch all the `NeutronGraph` custom operators to the **NeutronDriver** component.
  - **NeutronDriver** - Acts as an interface between the CPU and the NPU and communicates directly with the **NeutronFirmware**.
  - **NeutronFirmware** - Directly drives the execution of the NPU hardware.

This tool has the following conversion stages:

1. **Optimize**
   - The graph is transformed in a mathematically equivalent way to make it more compatible with the Neutron hardware constraints.
   - The graph transformations are performed with elementary transformations called **passes**.
2. **Extract**
   - In this stage the graph operators are extracted and mapped to custom operators in a two-level hierarchy:
     - Multiple standard operators that can be executed by a **single kernel** are mapped to a `NeutronOperator` container.
     - Multiple adjacent `NeutronOperator` containers are mapped to a `NeutronGraph` container.
   - Both the `NeutronOperator` and the `NeutronGraph` containers are expressed in the graph as custom TFLite operators.
   - The `NeutronGraph` container represents an NPU workload that is executed atomically (in one shot) by the NPU without CPU intervention.
   - The tool is greedy (by default), mapping as much computation as possible using the smallest number of `NeutronGraph` containers in order to reduce the communication overhead between the CPU and the NPU.
3. **Generate**
   - Each `NeutronGraph` after extraction is passed separately and independently to this stage, which includes the following steps:
     - **Tiling** - Divide the compute jobs into smaller jobs to fit the internal TCM memory of the subsystem (eIQ Neutron-S only).
     - **Scheduling** - Decide the order in which the compute jobs and the data transfers should occur at runtime.
     - **Memory allocation** - Plan the addresses in memory where each buffer will be located at runtime.
     - **Code generation** - Generate the binary artifacts that will be consumed at runtime: microcode, weights, kernels.
   - At the end of this stage, each `NeutronGraph` container is modified by appending some extra operands:
     - 3 constant input tensors, following the original ones, in this order:
       - **Microcode** - A sequence of commands (microinstructions), including compute and data transfer jobs, for the **NeutronFirmware** to execute. This is expressed as a raw binary buffer.
       - **Weights** - All the model weights rearranged and pre-processed according to the needs of the kernels, concatenated together into a single raw binary buffer.
       - **Kernels** - The binary code (text) for all the kernels that are needed for the execution of the associated `NeutronGraph` container. This code is fetched (loaded) at runtime by the **NeutronFirmware**. This is used only for eIQ Neutron-S targets. For eIQ Neutron targets a dummy one-byte value is stored in this tensor (constant tensors cannot be empty in TFLite).
     - 1 variable output tensor, following the original ones:
       - **Scratch** - A variable tensor where intermediate results can be stored during the execution of the associated `NeutronGraph`.

## Minimum System Requirements

Because **neutron-converter** uses heavy algorithms (such as constraint programming), which can result in significant conversion times for some models, we recommend the following minimum system configuration for a decent user experience:

- CPU: 2 GHz or faster with 8 or more cores
- RAM: 4 GB or more

## Disclaimers

Please note that for eIQ Neutron-S targets, the converter uses constraint programming solvers which are NOT deterministic. This means that subsequent conversions of the same model can produce slightly different microcode, especially in terms of the TCM addresses. This is because the solver is multithreaded and multiple threads compete to find the optimal solution. Since multiple optimal solutions are possible, these different solutions (threads) race towards the end, so a different solution may win every time. The main point is that all these different solutions are equally correct and optimal, and the differences should not be relevant. There are methods to force determinism, but those methods result in increased conversion time, which is generally not acceptable.

**Notes**:

- The eIQ Neutron targets (e.g. i.MXRT700) and eIQ Neutron-S targets (e.g. i.MX95, i.MX943) use completely different flows in the tool, resulting in different user experiences.
- When converting the **same model** for **different targets**, the conversion time can vary significantly:
  - For example, when converting the same model for i.MX95 (eIQ Neutron-S with 4 cores) vs i.MX943 (eIQ Neutron-S with 1 core), the conversion time for i.MX943 will be larger. This is because i.MX943 has a smaller internal memory (TCM) than i.MX95 (4x smaller), resulting in a larger number of smaller tiles.
    Because there are more tiles for i.MX943 than for i.MX95, the scheduler has more tiles to work with (more degrees of freedom), resulting in a larger conversion time.

## Command line options

The **neutron-converter** has the following options:

```
neutron-converter --input <input_file> --target <target>
```

where:

- `input` - The path of the standard TensorFlowLite model (mandatory).
- `target` - The name of the Neutron target for which the model is converted (optional). Default is `mcxn94x`.

You can check all the available targets that the tool provides using the `show-targets` option:

```
neutron-converter --show-targets
```

Converter options:

- `input` - Input TensorFlowLite file path.
- `output` - Output TensorFlowLite file path. Default is the input file path with a `_converted` suffix.
- `verbose` - Option to dump extra information to the console.
- `dump-header-file-input` - Option to dump the input TensorFlowLite model as a header file. The header file name will be the same as the input model but with the extension `.h`.
- `dump-header-file-output` - Option to dump the output TensorFlowLite model as a header file. The header file name will be the same as the output model but with the extension `.h`.
- `return-after-optimize` - Option to convert the model partially and return it directly after the optimization stage.
- `return-after-extract` - Option to convert the model partially and return it directly after the extraction stage.
- `run-after-import` - Option to check the model after importing by dry-running it with the CPU standard TFLite interpreter embedded in the converter, using dummy input data. This option is useful to sanity-check that the model is not corrupt.
- `run-after-optimize` - Option to check the model after optimization by dry-running it with the CPU standard TFLite interpreter embedded in the converter, using dummy input data. This option is useful to sanity-check that the model is not corrupt.
  Note that in some special cases some optimizations result in operators that are not supported by the TFLite runtime. For example, sometimes Softmax is changed from INT8 output to UINT8 output which, although numerically correct, lacks support in the TFLite runtime, causing an error when the model runs.
- `run-after-extract` - Option to check the model after extraction by dry-running it with the CPU standard TFLite interpreter embedded in the converter, using dummy input data. This option is useful to sanity-check that the model is not corrupt. Note that the NeutronGraph custom operators resulting from extraction will be treated as NOPs.
- `run-after-generate` - Option to check the model after generation by dry-running it with the CPU standard TFLite interpreter embedded in the converter, using dummy input data. This option is useful to sanity-check that the model is not corrupt. Note that the NeutronGraph custom operators resulting from extraction will be treated as NOPs.
- `dump-after-import` - Option to dump the model after importing to the given path (if not empty). If the given path is the `console` keyword then the model is dumped to the console in human-readable text format.
- `dump-after-optimize` - Option to dump the model after optimization to the given path (if not empty). If the given path is the `console` keyword then the model is dumped to the console in human-readable text format.
- `dump-after-extract` - Option to dump the model after extraction to the given path (if not empty). If the given path is the `console` keyword then the model is dumped to the console in human-readable text format.
- `dump-after-generate` - Option to dump the model after generation to the given path (if not empty). If the given path is the `console` keyword then the model is dumped to the console in human-readable text format.
- `dump-neutron-ir` - Option to dump the NeutronIR in textual form to the console (only for eIQ Neutron-S).
- `dump-neutron-ir-file` - Option to dump the NeutronIR in textual form to a separate file (only for eIQ Neutron-S). Each file will have a suffix given by the index (location) of the NeutronGraph operator in the converted graph.
- `dump-neutron-ir-final` - Option to dump the NeutronIR in its final form, in textual form, to the console (only for eIQ Neutron-S).
- `dump-neutron-ir-final-file` - Option to dump the NeutronIR in its final form, in textual form, to a separate file (only for eIQ Neutron-S). Each file will have a suffix given by the index (location) of the NeutronGraph operator in the converted graph.
- `dump-neutron-ir-elide-elements-attrs` - Option to print large constants of the NeutronIR in a redacted form, making the IR easier to scan (only for eIQ Neutron-S). Note: The IR generated with this option is not parsable.
- `dump-gfs` - Option to dump the global format selection results to a CSV file (only for eIQ Neutron-S).
- `dump-tile-ir` - Option to dump the TileIR information to CSV files (only for eIQ Neutron-S).
- `dump-microcode` - Option to dump the microcode for each NeutronGraph in textual form to the console.
- `dump-weights` - Option to dump the weights for each NeutronGraph in hexadecimal form to the console.
- `dump-kernels` - Option to dump the kernels for each NeutronGraph in hexadecimal form to the console.
- `dump-microcode-file` - Option to dump the microcode for each NeutronGraph to a separate file. Each file will have a suffix given by the index (location) of the NeutronGraph operator in the converted graph.
- `dump-weights-file` - Option to dump the weights for each NeutronGraph to a separate file. Each file will have a suffix given by the index (location) of the NeutronGraph operator in the converted graph.
- `dump-kernels-file` - Option to dump the kernel binaries for each NeutronGraph to a separate file. Each file will have a suffix given by the index (location) of the NeutronGraph operator in the converted graph.
- `dump-kernel-names-file` - Option to dump the kernel names for each NeutronGraph to a separate file. Each file will have a suffix given by the index (location) of the NeutronGraph operator in the converted graph.
- `dump-kernel-selection-code` - Option to dump the code used by the eIQ Neutron application to select the subset of kernels that should be included in the program binary. The code is dumped as self-contained C code. This code is useful to optimize the program size for embedded applications associated with memory-constrained devices such as eIQ Neutron targets. The generated code ensures that only the kernels required by the converted model are linked into the binary. Multiple such files can be merged outside of neutron-converter, in order to combine the kernel requirements of multiple models, using the `merge_kernel_selection_code.py` script delivered with the software.
- `dump-microcode-format` - Option to choose the microcode dump format: `dense` for a compact form and `sparse` for a more readable form.
- `dump-bytecode` - Option to dump the sequencer bytecode to the console. Relevant only when using the sequencer mode for eIQ Neutron.
- `dump-statistics` - Option to dump the summary of cycle estimation statistics to the console.
- `dump-statistics-file` - Option to dump the summary of cycle estimation statistics to a text file.
- `keep-metadata` - Option to keep all the model metadata, including the metadata from the original model and the metadata generated during conversion. This option is false by default since normally the model metadata is not needed by the runtime.
- `delete-tensor-names` - Option to delete all the model tensor names (set the tensor names to empty strings). Useful to reduce the model size for small models where the memory overhead of metadata strings is relatively large.
- `keep-graphs` - Option to keep all the intermediate graphs used to convert and map the original operators to Neutron.
  The intermediate graphs will be kept as separate graphs in the converted model. This option is useful for debugging since it shows how operators are transformed and mapped to Neutron. This option is false by default since the extra graphs will dramatically increase the model size (up to 2x).
- `dump-graphs` - Option to dump all the intermediate graphs used to convert and map the original operators to Neutron. The intermediate graphs will be dumped as separate files in the same folder as the converted model. This option is useful for debugging since it shows how operators are transformed and mapped to Neutron.
- `merge-neutron-graphs` - Option to merge all the adjacent NeutronGraphs with one NeutronOperator into a single NeutronGraph with multiple NeutronOperators. This feature is enabled by default since it is more efficient to dispatch more computation at once to Neutron. Disabling this feature is useful to debug each NeutronOperator separately. Note that this debug option is highly intrusive because it greatly affects how the computation and data transfers are scheduled, and also how the memory planning is performed. This is especially true for eIQ Neutron-S targets.
- `flatten-neutron-graphs` - Option to flatten the structure of the NeutronGraphs by replacing the inner NeutronOperators with the associated TFLite operators. This option is useful for debugging since it shows more clearly the subgraphs of the TFLite model which are mapped to NeutronGraphs. This option affects how the intermediate graphs are handled when using the options to keep or dump the intermediate graphs.
- `min-num-ops-per-graph` - The minimum number of NeutronOperators that a NeutronGraph must contain in order to be extracted. This rule only applies when the model has more than one NeutronGraph. This is useful for debugging, or for tuning the performance in cases with small NeutronGraphs (with few NeutronOperators).
  In such cases the overhead of offloading to the NPU is comparable to the NPU computation itself, and therefore leaving the computation to the CPU (not offloading to the NPU) might be the better choice. This logic only applies to eIQ Neutron-S targets, where the overhead is more significant.
- `keep-constant-tensors` - Option to keep all the constant tensors from the graph by exposing them as graph outputs.
- `keep-variable-tensors` - Option to keep all the variable tensors from the graph by exposing them as graph outputs.
- `include-graph-passes` - Graph passes that should ONLY be included for graph optimization. String with comma-separated names for graph passes or graph pass indices (empty means all included).
- `exclude-graph-passes` - Graph passes that should be excluded from graph optimization. String with comma-separated names for graph passes or graph pass indices (empty means none excluded).
- `include-operator-types` - Operator types that should ONLY be included for Neutron execution. String with comma-separated names for operator types or operator type indices (empty means all included). For example use `0,1` or `ADD,AVERAGE_POOL_2D` to ONLY specialize the Add and AvgPool operator types. NOTE that the operators are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `exclude-operator-types` - Operator types that should be excluded from Neutron execution. String with comma-separated names for operator types or operator type indices (empty means none excluded). For example use `0,1` or `ADD,AVERAGE_POOL_2D` to NOT specialize the Add and AvgPool operator types. NOTE that the operators are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `include-operators` - Operator instances that should ONLY be included for Neutron execution.
  String with comma-separated names of operator output tensors or operator indices (empty means all included). For example use `13,14` or `name13,name14` to ONLY specialize the operator instances with locations 13,14 or with output tensor names `name13`,`name14`. NOTE that the operators are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `exclude-operators` - Operator instances that should be excluded from Neutron execution. String with comma-separated names of operator output tensors or operator indices (empty means none excluded). For example use `13,14` or `name13,name14` to NOT specialize the operator instances with locations 13,14 or with output tensor names `name13`,`name14`. NOTE that the operators are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `include-between-input-tensors` - Tensor instances that delimit the start of the operators that should ONLY be included for Neutron execution. String with comma-separated names of tensors (empty means all included). For example use `1,2` or `input1,input2` to ONLY specialize the operator instances delimited by the input tensors with locations 1,2 or names `input1`,`input2`. NOTE that the tensors are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `include-between-output-tensors` - Tensor instances that delimit the end of the operators that should ONLY be included for Neutron execution. String with comma-separated names of tensors (empty means all included). For example use `1,2` or `output1,output2` to ONLY specialize the operator instances delimited by the output tensors with locations 1,2 or names `output1`,`output2`. NOTE that the tensors are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `exclude-between-input-tensors` - Tensor instances that delimit the start of the operators that should be excluded from Neutron execution. String with comma-separated names of tensors (empty means none excluded). For example use `1,2` or `input1,input2` to NOT specialize the operator instances delimited by the input tensors with locations 1,2 or names `input1`,`input2`. NOTE that the tensors are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `exclude-between-output-tensors` - Tensor instances that delimit the end of the operators that should be excluded from Neutron execution. String with comma-separated names of tensors (empty means none excluded). For example use `1,2` or `output1,output2` to NOT specialize the operator instances delimited by the output tensors with locations 1,2 or names `output1`,`output2`. NOTE that the tensors are identified NOT in the original model but in the intermediate model right before extraction (right after optimization).
- `include-kernel-kinds` - Kernel kinds that should ONLY be included for Neutron execution. String with comma-separated names for kernel kinds or kernel kind indices (empty means all included). For example use `1,7` or `Conv2DStandardV1,MaxPool` to ONLY use the Conv2DStandardV1 and MaxPool kernel kinds.
- `exclude-kernel-kinds` - Kernel kinds that should be excluded from Neutron execution. String with comma-separated names for kernel kinds or kernel kind indices (empty means none excluded). For example use `1,7` or `Conv2DStandardV1,MaxPool` to NOT use the Conv2DStandardV1 or MaxPool kernel kinds.
- `include-operator-plugins` - Operator plugins that should ONLY be included for model compilation. String with comma-separated names for operator plugins.
- `exclude-operator-plugins` - Operator plugins that should be excluded from model compilation. String with comma-separated names for operator plugins.
- `use-32bit-biases` - Option to generate and use 32-bit biases instead of 16-bit biases.
- `use-32bit-scales` - Option to generate and use 32-bit scales instead of 16-bit scales.
- `use-int-float32-acc-bias` - Option to use the IntFloat32 Neutron data type for the accumulator and bias instead of an Int32 accumulator and 32-bit biases.
- `use-weights-compression` - Option to enable the weights compression feature.
- `use-sequencer` - Option to use the Neutron sequencer by generating Neutron bytecode. Note that this option cannot be used for eIQ Neutron-S targets (with subsystem).
- `convert-inputs-uint8-to-int8` - Option to convert the graph inputs from UINT8 to INT8 when converting UINT8 TFLite models. If this option is false then an extra Quantize operator will be used for the inputs.
- `convert-outputs-uint8-to-int8` - Option to convert the graph outputs from UINT8 to INT8 when converting UINT8 TFLite models. If this option is false then an extra Quantize operator will be used for the outputs.
- `use-interpolator` - Option to enable the interpolator to map non-linear elementwise operators.
- `dump-interpolator` - Option to dump Python scripts for displaying the interpolator fitting curves for each mappable subgraph.
- `interpolator-max-error-int8` - Option to specify the maximum acceptable error for interpolator fitting for INT8 operators. If the fitting error is higher than this value then the operators will NOT be mapped to Neutron.
- `interpolator-max-error-int16` - Option to specify the maximum acceptable error for interpolator fitting for INT16 operators. If the fitting error is higher than this value then the operators will NOT be mapped to Neutron.
- `interpolator-max-num-coeffs` - Option to specify the maximum number of coefficients used for interpolator fitting.
- `fetch-constants-to-sram` - Fetch constants (weights) from an external memory (external to Neutron, such as FLASH memory) into SRAM. This feature is relevant only for eIQ Neutron targets.
  This feature allows running models which do not fit into SRAM by offloading their weights to an external memory. Note that the weights prefetching is done in parallel with the compute: while computing layer N, the system prefetches in parallel the weights for layer N+1. This ensures that the latency is optimal. For models that are I/O bound, the time for prefetch might exceed the time for compute, so some extra penalty might occur. Therefore this feature must be used only if needed: if the model already fits into SRAM then it should be placed entirely into SRAM and used from there, without using this feature.
- `optimization-level` - Compilation optimization level. The following optimization levels are available:
  - `OFast`: Uses heuristics to solve optimization problems, providing fast conversion with performance that is typically close to optimal.
  - `OOpt`: Uses exact methods (e.g., SAT/ILP solvers) to compute optimal schedules and memory allocations, at the cost of potentially long conversion times.
- `force-determinism` - Used to force determinism for the solvers involved in the scheduling and memory allocation (only for eIQ Neutron-S). Determinism means reproducible behavior for successive conversions of the same model in terms of the generated microcode, weights and kernels for each NeutronGraph. This behavior is not implicit in the solvers when using multithreading, because multiple threads compete to find the optimal solution and, due to race conditions, different solutions can win each time. All the competing solutions are equally optimal. Determinism is achieved by disabling the multithreading, which results in increased conversion time (usually on the order of 2x to 3x).
- `force-convert-avg-pool-to-depthwise-conv` - Used to force the conversion of an AvgPool with padding into a DepthwiseConv2D with fixed weights. Note that this transformation is not mathematically equivalent, since the AvgPool layer normalizes the data differently around the padding regions.
  Therefore an error is expected around the feature map borders. The converter does not support an AvgPool with padding by default, so converting it into a DepthwiseConv2D by force is the only option to map it to the NPU. This option only applies to AvgPool with padding; AvgPool without padding is mapped unconditionally to the NPU.
- `use-tunneling` - Option to enable the tunneling optimization (only for eIQ Neutron-S).
  - Without tunneling:
    - Each layer is computed fully before computing the next layer (layer-first scheduling of compute).
    - For layers with a lot of data this can result in extra TCM evictions (data spilling): if the results of the current layer do not fit in TCM they will be pushed to DDR and then fetched later for the next layer. This data bottleneck results in extra data transfers between TCM and DDR and hence reduced performance.
  - With tunneling:
    - Each layer is computed partially for a single tile, which is then tunneled through the next layers (depth-first scheduling of compute).
    - A tile is computed through multiple layers without TCM evictions (data spilling). Afterwards the next tile is computed for the same layers. This approach reduces TCM usage, reduces data spilling and improves performance.
  - This optimization only reschedules the tiles in order to tunnel through multiple layers (depth-first).
- `use-profiling` - Option to enable model profiling at runtime. This will resize the NeutronProfile tensor of the NeutronGraph to accommodate the profiling data.
- `use-profiling-tick-trace` - Option to enable model profiling at runtime with tick trace.
  - By default the profiling is performed at layer (operator) level.
  - By enabling this flag the profiling is performed at tick (job) level.
- `use-new-flow-neutron-c` - Option used to enable the new eIQ Neutron flow, which has better operator support but is still experimental.

Other options:

- `dump-target-archive` - Dump additional target-specific files into a separate folder.
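The layer-first versus depth-first scheduling contrast behind the `use-tunneling` option can be illustrated with a minimal Python sketch. The layer and tile counts below are made up for illustration; the actual schedules produced by the converter depend on the model and target.

```python
# Conceptual sketch only: with tunneling disabled, every tile of a layer is
# computed before moving to the next layer; with tunneling enabled, one tile
# is pushed (tunneled) through all layers before the next tile starts.
LAYERS, TILES = 3, 2

def layer_first():
    """Layer-first scheduling: finish each layer across all tiles."""
    return [(layer, tile) for layer in range(LAYERS) for tile in range(TILES)]

def depth_first():
    """Depth-first (tunneling) scheduling: push each tile through all layers."""
    return [(layer, tile) for tile in range(TILES) for layer in range(LAYERS)]

print(layer_first())  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(depth_first())  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Both schedules perform the same (layer, tile) jobs; only the order differs, which is why tunneling can keep a tile's intermediate results resident in TCM instead of spilling them to DDR between layers.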
Note that for the above options:

- The options `include-between-input-tensors` and `include-between-output-tensors` must be used together.
- The options `exclude-between-input-tensors` and `exclude-between-output-tensors` must be used together.
- All the options `include-*` and `exclude-*` can be used and combined together:
  - The options `include-*` have an additive effect and are considered first.
  - The options `exclude-*` have a subtractive effect and are considered second, after the additions of the `include-*` options.
- Operators/tensors are identified NOT in the original model but in the model right before the extraction (after the optimization). That model can be dumped using the option `--dump-after-optimize`.
- Tensors are identified either by their index (e.g. the `location` attribute that you can view in the Netron visualizer) or their name (the `name` attribute that you can view in the Netron visualizer).
  - Note that tensor names are not guaranteed to be unique, so in order to avoid any potential ambiguity it is recommended to use the index rather than the name, because the tensor index is unique.
- Operators are identified either by their index (e.g. the `location` attribute that you can view in the Netron visualizer) or by the name of their output tensor (since operators do not have a standalone `name` attribute).
  - If the operator has multiple output tensors then the name of the first output tensor is used as the operator name identifier.
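The additive-then-subtractive semantics of the `include-*` and `exclude-*` options described above can be sketched as follows. This is a conceptual illustration of the documented rule, not the converter's actual code, and the operator names are only examples:

```python
# Sketch of the documented selection rule: an empty include list means
# "everything is included"; excludes are applied second and always win.
def select_for_npu(all_items, include=None, exclude=None):
    """Return the set of items selected for Neutron execution."""
    selected = set(include) if include else set(all_items)  # additive, first
    selected -= set(exclude or ())                          # subtractive, second
    return selected & set(all_items)

ops = {"ADD", "AVERAGE_POOL_2D", "CONV_2D", "SOFTMAX"}

# Only Add and AvgPool are included, then AvgPool is excluded again:
print(select_for_npu(ops,
                     include={"ADD", "AVERAGE_POOL_2D"},
                     exclude={"AVERAGE_POOL_2D"}))  # {'ADD'}
```

With no `include-*` and no `exclude-*` options, all supported operators remain candidates for Neutron execution, matching the "empty means all included / none excluded" wording above.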
You can check all the operator type identifiers (used by the `include-operator-types` and `exclude-operator-types` options) by using the `show-operator-types` option:

```
neutron-converter --show-operator-types
```

You can check all the kernel kind identifiers (used by the `include-kernel-kinds` and `exclude-kernel-kinds` options) for the specified target by using the `show-kernel-kinds` option:

```
neutron-converter --target <target> --show-kernel-kinds
```

You can check all the operator plugins (used by the `include-operator-plugins` and `exclude-operator-plugins` options) for the specified target by using the `show-operator-plugins` option:

```
neutron-converter --target <target> --show-operator-plugins
```

You can check all the command line options using the `help` option:

```
neutron-converter --help
```

You can check the version of the tool using the `version` option:

```
neutron-converter --version
```

You can check the cycle calculator and conversion statistics using the `dump-statistics` or `dump-statistics-file` flags:

- To view the statistics in the console, one separate section for each NeutronGraph:

```
neutron-converter --input <input_file> --target <target> --dump-statistics
```

- To view the statistics in a separate text file, one for each NeutronGraph:

```
neutron-converter --input <input_file> --target <target> --dump-statistics-file
```

## Program memory optimization for eIQ Neutron

For **eIQ Neutron-S** targets (such as i.MX95), each converted model contains only the kernel code specific to that model within its `NeutronGraph` container. The kernel code is fetched dynamically at runtime from DDR to the I-TCM of the subsystem as the model executes. Therefore, for eIQ Neutron-S targets, only the kernels required by each specific model are present in memory, ensuring efficient program memory usage with no waste.

For **eIQ Neutron** targets (such as i.MXRT700), the converted models do not contain the kernel code.
Instead, all existing kernels from the `NeutronLibrary` are delivered in the `NeutronFirmware`, which is linked separately at the application level. Because the kernels are invoked dynamically through a symbol table at runtime, the linker cannot determine which kernels are unused and strip their code from the binary. This means the application always links all kernels from the `NeutronLibrary`, whether they are used by the models or not, resulting in program memory waste. For memory-constrained devices such as MCUs, this memory waste must be avoided.

The neutron-converter provides a mechanism to trim the list of kernels linked by the application by allowing you to overwrite the symbol table associated with one or multiple converted models. The steps to overwrite the symbol table are:

1) For each model to be converted, use the neutron-converter option `dump-kernel-selection-code` to dump the definition of the kernel symbol table `neutronKernels` for that model as a C source file.

```
neutron-converter --input=<input_file> --target=imxrt700 ... --dump-kernel-selection-code
```

The option will dump the C source code in one or multiple files (one for each NeutronGraph) with a specific name such as `__kernel_selection.c`.

2) The symbol tables for multiple models can be merged outside neutron-converter using the Python script `merge_kernel_selection_code.py` like this:

```
python3 merge_kernel_selection_code.py -i file1.c file2.c ... -o file_merged.c
```

3) Copy the merged C source file `file_merged.c` into the MCU application and compile it with the rest of the code.

The mechanism to overwrite the symbol table works as follows:

- The original symbol table `neutronKernels` is defined as a weak symbol inside the `NeutronFirmware` image.
- The new symbol table defined in the generated source file overrides the original one.
- This forces the application to link only the symbols contained in the new symbol table (not the original).
- As a consequence, only the kernels needed by the specific models for which the symbol table was generated will be linked into the application.
- This results in program (text) memory optimization.

## Optimization levels

Optimization levels are used to provide different trade-offs between conversion time and performance. Each level uses a different strategy to solve the optimization problems involved in the model conversion, such as scheduling computations and data transfers or allocating memory. The converter provides the following two optimization levels:

- `OFast` (default) - Uses fast, custom heuristics-based solvers designed specifically for the optimization problems on the Neutron architecture.
- `OOpt` - In this mode, scheduling and memory allocation are formulated as constraint programming problems and tackled using third-party ILP/SAT solvers.

**Key characteristics of the fast mode**:

- Fast conversion times
- Scales well with larger model sizes
- Provides optimal or near-optimal solutions for the vast majority of tested models

This mode offers the right balance between conversion speed and performance for most use cases.

**Key characteristics of the optimal mode**:

- Guarantees optimal solutions to the internal optimization problems
- Achieves the highest possible performance the converter can provide
- ILP/SAT problems are computationally hard, so conversion may take significantly longer
- Limited scalability for large models or tight hardware constraints
- Requires more compute resources; typically up to 16 CPU cores can be used in parallel

This mode is suitable when maximum performance is required and longer conversion times are acceptable.
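The heuristic-versus-exact trade-off can be illustrated with a toy memory-allocation problem. A first-fit heuristic (in the spirit of `OFast`) places buffers quickly but can fragment memory, while an exhaustive search (standing in for the exact solvers behind `OOpt`) finds the smallest peak at a much higher cost. The buffer instance below is invented for illustration and has nothing to do with the converter's real allocator:

```python
from itertools import permutations

# Hypothetical tiny instance: (name, size, live_start, live_end).
# Buffers whose live ranges intersect must not overlap in memory.
BUFFERS = [("a", 2, 0, 2), ("b", 4, 0, 4), ("c", 4, 2, 4)]

def time_overlap(x, y):
    return x[2] < y[3] and y[2] < x[3]

def first_fit_peak(order):
    """Greedy placement: each buffer takes the lowest offset that does not
    collide with an already-placed buffer whose live range intersects its own."""
    offsets = {}
    for buf in order:
        off, changed = 0, True
        while changed:
            changed = False
            for other in order:
                if other[0] not in offsets or not time_overlap(buf, other):
                    continue
                o = offsets[other[0]]
                if off < o + other[1] and o < off + buf[1]:
                    off = o + other[1]  # bump past the colliding buffer
                    changed = True
        offsets[buf[0]] = off
    return max(offsets[b[0]] + b[1] for b in order)

greedy = first_fit_peak(BUFFERS)  # heuristic: one fixed placement order
best = min(first_fit_peak(list(p)) for p in permutations(BUFFERS))  # exhaustive
print(greedy, best)  # 10 8
```

The exhaustive search tries every placement order (factorial cost, like the hard ILP/SAT problems above) and finds a peak of 8, the true lower bound here, while the single-order heuristic settles for 10. The real solvers handle far richer constraints, but the shape of the trade-off is the same.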