On-Device Profiling
====================

If benchmark results do not meet expectations, the next step is to use the
on-device profiling feature to identify bottlenecks on the actual target
hardware.

On-Device Profiling Allows You To
----------------------------------

* Measure actual latency on the target device
* Profile layer-by-layer execution timing
* Detect platform-specific bottlenecks (NPU, memory bandwidth, etc.)
* Compare performance across hardware options
* Analyze DDR read/write bandwidth and GPU cycle utilization

Typical Devices Available
-------------------------

The AI Hub boardfarm provides remote access to NXP hardware. Available devices
include:

* i.MX 8M Plus, i.MX 8M family (VSI NPU/GPU)
* i.MX 93 (Ethos-U NPU)
* i.MX 95, i.MX 943, i.MX 952 family (**eIQ Neutron NPU**)
* MCX N94x, MCX N54x, i.MX RT700 (**eIQ Neutron NPU**)

.. note::

   Device availability depends on the current boardfarm inventory. Only
   TensorFlow Lite (``.tflite``) models are supported for on-device profiling.

Run an On-Device Profiling Session
-----------------------------------

The On-device profiling page lets you select your target hardware, model, and
software stack, then run the profiling on a physical device in the AI Hub
boardfarm.

.. image:: /_static/eIQ_AIHub_ondevice_profiling_1.gif
   :alt: On-Device Profiling Configuration
   :width: 100%

**Steps:**

1. Switch to the **AI Toolkit** tab in the top navigation bar.
2. In the left sidebar, under **Model evaluation**, click **On-device profiling**.
3. On the On-device profiling page, review the information box:

   - Only TensorFlow Lite (``.tflite``) models are supported.

4. Configure the profiling parameters:

   - **Select device** — choose the target hardware from the available devices in the boardfarm (e.g., ``i.MX 8M Plus Applications Processor``).
   - **Select backend** — choose the inference backend (e.g., ``npu``, ``cpu``).
   - **Select model** — choose the model to profile from your uploaded models (e.g., ``mobilenet_v1_10_224_int8``).
   - **Select Yocto image** — choose the Yocto BSP image running on the target device (e.g., ``2026 Q1``, ``2025 Q4``).

5. Optionally, enter a **Custom run name** to label this profiling session.
6. Click the **Profile model** button to submit the profiling job.
7. A confirmation message appears: *"Profiling is in progress"*. You can:

   - Click **Profiling history** to monitor progress.
   - Click **Profile another** to start a new profiling session.

Review Profiling Results
-------------------------

After the profiling session completes, navigate to **Profiling history** in the
left sidebar and click on the entry to view the detailed results.

.. image:: /_static/eIQ_AIHub_ondevice_profiling_2.gif
   :alt: On-Device Profiling Results
   :width: 100%

**Session metadata includes:**

* **Type** — ``On device``
* **Target** — target device identifier (e.g., ``imx8mpevk``)
* **Engine** — inference engine used (e.g., ``NPU``)
* **Tensor arena size** — memory arena allocated for tensor operations
* **Model size** — size of the model file
* **Total inference time** — total inference time in milliseconds

**Per-node profiling statistics table:**

* **Node id** — unique identifier for each operator node
* **Name** — layer type (e.g., ``Convolution``, ``BatchNorm``, ``TensorCopy``)
* **Order** — execution order of the node
* **Op name** — hardware-specific operation name (e.g., ``VXNNE_OP_*``)
* **Inputs / Outputs** — number of input and output tensors
* **Input shape / Output shape** — tensor dimensions
* **Processing type** — execution mode (e.g., ``Parallel Processing``)
* **Execution time** — execution time in milliseconds for that node
* **DDR Read / DDR Write** — DDR memory read and write bandwidth usage
* **GPU Idle cycles / GPU Total cycles** — GPU utilization metrics

Use these results to identify performance bottlenecks (e.g., high-latency layers, memory bandwidth
saturation) and validate that the model meets your latency requirements for the
target hardware.

.. note::

   See the :doc:`Profiling Section <../../prof/index>` for detailed information.

Next Steps
----------

* :doc:`Run MCU profiling <./mcu_profiling>`
* :doc:`Benchmark the model <./benchmark>`
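The per-node statistics from a completed profiling session can also be
post-processed offline to surface the slowest operator classes. Below is a
minimal sketch of that idea; the row data, field names, and timing values are
hypothetical examples, not taken from a real profiling run:

```python
from collections import defaultdict

# Hypothetical rows copied from the per-node profiling statistics table;
# only the "Name" and "Execution time" (ms) columns are used here.
rows = [
    {"name": "Convolution", "execution_time_ms": 1.80},
    {"name": "Convolution", "execution_time_ms": 2.10},
    {"name": "BatchNorm", "execution_time_ms": 0.25},
    {"name": "TensorCopy", "execution_time_ms": 0.90},
]


def time_per_layer_type(rows):
    """Sum execution time per layer type and sort slowest-first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["name"]] += row["execution_time_ms"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for name, total_ms in time_per_layer_type(rows):
    print(f"{name}: {total_ms:.2f} ms")
```

A ranking like this makes it easy to see which layer types dominate total
inference time and are therefore the best candidates for optimization.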