eIQ Neutron-S

The Neutron Driver and the Neutron Firmware run on different cores (the CPU and the RISC-V, respectively), so they need a mechanism for inter-core communication. This mechanism is the MAILBOX, a direct connection (“wire”) between the CPU and the RISC-V that lets them exchange a small amount of information with low latency. It is not suitable for exchanging large amounts of data (DDR is used for that), but it is well suited to control information such as flags, status, and addresses.
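To make the "small control message" idea concrete, here is a minimal sketch of such a message. The three-word layout, the flag values, and the `MailboxMessage` name are assumptions made for illustration; the actual MAILBOX register format is not described here.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: three little-endian 32-bit words (flags, status, address).
_MSG = struct.Struct("<III")

FLAG_START = 0x1  # hypothetical: CPU asks the firmware to start an inference
FLAG_DONE = 0x2   # hypothetical: firmware reports completion


@dataclass(frozen=True)
class MailboxMessage:
    flags: int
    status: int
    address: int  # DDR address of the bulk data, never the data itself

    def pack(self) -> bytes:
        """Serialize the message into the fixed-size wire format."""
        return _MSG.pack(self.flags, self.status, self.address)

    @classmethod
    def unpack(cls, raw: bytes) -> "MailboxMessage":
        """Rebuild a message from its wire format."""
        return cls(*_MSG.unpack(raw))


# The CPU passes only a pointer to a buffer in DDR, not the buffer contents.
request = MailboxMessage(flags=FLAG_START, status=0, address=0x8000_0000)
wire = request.pack()
reply = MailboxMessage.unpack(wire)
```

The point of the sketch is the size: the whole message is a handful of bytes, while the weights, kernels, and input/output buffers it refers to stay in DDR.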

The Neutron Firmware is composed of two parts:
- Resident section: This section is model independent and is fetched from DDR into the RISC-V I-TCM only once, at system boot time. It contains the Neutron Operating System (Neutron OS), a collection of frequently used utilities such as the microcode interpreter, the DataMover microinstructions, and subsystem utilities. It is not reloaded; it is reused for subsequent inferences of the same or different models.
- Paged section: This section is model dependent and is fetched from DDR into the RISC-V I-TCM page by page, depending on the current needs of the microcode. This is the actual “kernels” buffer, which is generated by the Neutron Converter tool and stored in DDR. It contains all the kernels used by the specific TFLite model that was converted. The kernels are pre-built and concatenated in the exact order the microcode requires them during inference, so that the prefetch from DDR into the I-TCM is linear relative to the DDR. This section is reloaded, page by page, for subsequent inferences of the same or different models.
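The linear prefetch of the paged section can be illustrated with a small host-side model. This is only a sketch: `PAGE_SIZE`, the helper names, and the dummy kernel bytes are assumptions, and a real page size is not given here.

```python
from typing import Iterator, List, Tuple

PAGE_SIZE = 16  # hypothetical page size in bytes


def build_kernels_buffer(kernels_in_use_order: List[bytes]) -> bytes:
    """Concatenate kernels in the order the microcode will need them."""
    return b"".join(kernels_in_use_order)


def iter_pages(ddr_buffer: bytes, page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (ddr_offset, page) pairs in strictly increasing offset order."""
    for offset in range(0, len(ddr_buffer), page_size):
        yield offset, ddr_buffer[offset:offset + page_size]


# Three dummy kernels of 20, 12 and 30 bytes, already in execution order.
kernels = [b"A" * 20, b"B" * 12, b"C" * 30]
ddr = build_kernels_buffer(kernels)

offsets = [offset for offset, _ in iter_pages(ddr)]
reassembled = b"".join(page for _, page in iter_pages(ddr))
```

Because the kernels are laid out in execution order, the fetch offsets only ever move forward through the buffer, which is what makes the access pattern linear relative to DDR.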
The Neutron Firmware executes the microcode using the weights, kernels and input/output data buffers. For each operator specified by the microcode, it fetches the required kernels from DDR into I-TCM and runs them.
Upon completion, the Neutron Firmware signals the Neutron Driver that the work is done.
The Neutron Driver, when signaled that the work is done, returns control to the inference engine and the application.
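The start/complete handshake described above can be modeled on a host with two queues standing in for the MAILBOX and a thread standing in for the RISC-V core. Everything named here (`DDR`, `firmware`, `run_inference`, the `"DONE"` status, the operator labels) is hypothetical and serves only to show the order of events.

```python
import queue
import threading
from typing import Dict, List, Tuple

# Stand-in for DDR: bulk data lives here; the mailbox carries only a key into it.
DDR: Dict[str, List[str]] = {"microcode@0x1000": ["conv", "relu", "softmax"]}


def firmware(to_firmware: queue.Queue, to_driver: queue.Queue, trace: List[str]) -> None:
    """Firmware side: wait for a start message, run the microcode, signal done."""
    address = to_firmware.get()   # block until the driver posts a start message
    for op in DDR[address]:       # walk the microcode operator by operator
        trace.append(op)          # stands in for fetching and running the kernel
    to_driver.put("DONE")         # tell the driver the work is finished


def run_inference(address: str) -> Tuple[str, List[str]]:
    """Driver side: start the firmware, wait for completion, return control."""
    to_firmware: queue.Queue = queue.Queue(maxsize=1)
    to_driver: queue.Queue = queue.Queue(maxsize=1)
    trace: List[str] = []
    worker = threading.Thread(target=firmware, args=(to_firmware, to_driver, trace))
    worker.start()
    to_firmware.put(address)      # small control message: just an address
    status = to_driver.get()      # block until the firmware signals completion
    worker.join()
    return status, trace


status, trace = run_inference("microcode@0x1000")
```

The driver does nothing between posting the start message and receiving the completion signal, which mirrors the flow in which control returns to the inference engine only after the firmware reports that the work is done.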

What is specific to this flow is:

- The inference is controlled by the CPU.
- The model can be fully or partly executed by Neutron.
- Parts of the graph that are not supported by Neutron can be offloaded by the inference engine to CPU, GPU, DSP or other backends it provides.
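As an illustration of partial execution, the sketch below splits an operator sequence into contiguous segments that run either on Neutron or on a fallback backend. The contents of `NEUTRON_SUPPORTED`, the `partition` helper, and the `CUSTOM_NMS` operator are assumptions for the example; the real set of supported operators is not listed here, and real graphs are not necessarily linear chains.

```python
from itertools import groupby
from typing import List, Set, Tuple

# Hypothetical set of operators that Neutron can execute.
NEUTRON_SUPPORTED: Set[str] = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED"}


def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive operators by the backend that will execute them."""
    segments: List[Tuple[str, List[str]]] = []
    for on_neutron, group in groupby(ops, key=lambda op: op in supported):
        backend = "neutron" if on_neutron else "cpu"
        segments.append((backend, list(group)))
    return segments


# A linear graph with one operator that Neutron does not support.
graph = ["CONV_2D", "DEPTHWISE_CONV_2D", "CUSTOM_NMS", "FULLY_CONNECTED"]
plan = partition(graph, NEUTRON_SUPPORTED)
```

Here the unsupported operator in the middle forces the graph to be split into three segments, with the inference engine on the CPU handing work to Neutron before and after the fallback segment.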