On-Device Profiling
====================

If benchmark results do not meet expectations, the next step is to use the
on-device profiling feature to identify bottlenecks on the actual target
hardware.

On-Device Profiling Allows You To
----------------------------------

* Measure actual latency on the target device
* Profile layer-by-layer execution timing
* Detect platform-specific bottlenecks (NPU, memory bandwidth, etc.)
* Compare performance across hardware options
* Analyze DDR read/write bandwidth and GPU cycle utilization

Typical Devices Available
-------------------------

The AI Hub boardfarm provides remote access to NXP hardware. Available devices
include:

* i.MX 8M Plus, i.MX 8M family (VSI NPU/GPU)
* i.MX 93 (Ethos-U NPU)
* i.MX 95, i.MX 943, i.MX 952 family (**eIQ Neutron NPU**)
* MCX N94x, MCX N54x, i.MX RT700 (**eIQ Neutron NPU**)

.. note::

   Device availability depends on the current boardfarm inventory. Only
   TensorFlow Lite (``.tflite``) models are supported for on-device profiling.

Run an On-Device Profiling Session
-----------------------------------

The On-device profiling page lets you select your target hardware, model, and
software stack, then run the profiling on a physical device in the AI Hub
boardfarm.

.. image:: /_static/eIQ_AIHub_ondevice_profiling_1.gif
   :alt: On-Device Profiling Configuration
   :width: 100%

**Steps:**

1. Switch to the **AI Toolkit** tab in the top navigation bar.
2. In the left sidebar, under **Model evaluation**, click **On-device profiling**.
3. On the On-device profiling page, review the information box:

   - Only TensorFlow Lite (``.tflite``) models are supported.

4. Configure the profiling parameters:

   - **Select device** — choose the target hardware from the available devices in the boardfarm (e.g., ``i.MX 8M Plus Applications Processor``).
   - **Select backend** — choose the inference backend (e.g., ``npu``, ``cpu``).
   - **Select model** — choose the model to profile from your uploaded models (e.g., ``mobilenet_v1_10_224_int8``).
   - **Select Yocto image** — choose the Yocto BSP image running on the target device (e.g., ``2026 Q1``, ``2025 Q4``).

5. Optionally, enter a **Custom run name** to label this profiling session.
6. Click the **Profile model** button to submit the profiling job.
7. A confirmation message appears: *"Profiling is in progress"*. You can:

   - Click **Profiling history** to monitor progress.
   - Click **Profile another** to start a new profiling session.

Review Profiling Results
-------------------------

After the profiling session completes, navigate to **Profiling history** in the
left sidebar and click on the entry to view the detailed results.

.. image:: /_static/eIQ_AIHub_ondevice_profiling_2.gif
   :alt: On-Device Profiling Results
   :width: 100%

**Session metadata includes:**

* **Type** — ``On device``
* **Target** — target device identifier (e.g., ``imx8mpevk``)
* **Engine** — inference engine used (e.g., ``NPU``)
* **Tensor arena size** — memory arena allocated for tensor operations
* **Model size** — size of the model file
* **Total inference time** — total inference time in milliseconds

**Per-node profiling statistics table:**

* **Node id** — unique identifier for each operator node
* **Name** — layer type (e.g., ``Convolution``, ``BatchNorm``, ``TensorCopy``)
* **Order** — execution order of the node
* **Op name** — hardware-specific operation name (e.g., ``VXNNE_OP_*``)
* **Inputs / Outputs** — number of input and output tensors
* **Input shape / Output shape** — tensor dimensions
* **Processing type** — execution mode (e.g., ``Parallel Processing``)
* **Execution time** — execution time in milliseconds for that node
* **DDR Read / DDR Write** — DDR memory read and write bandwidth usage
* **GPU Idle cycles / GPU Total cycles** — GPU utilization metrics

Use these results to identify performance bottlenecks (e.g., high-latency layers, memory bandwidth
saturation) and validate that the model meets your latency requirements for the
target hardware.

.. note::

   See the :doc:`Profiling Section <../../prof/index>` for detailed information.

Next Steps
----------

* :doc:`Run MCU profiling <./mcu_profiling>`
* :doc:`Benchmark the model <./benchmark>`
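The per-node statistics from a completed profiling session can also be
post-processed offline to surface the slowest operator classes. Below is a
minimal sketch of that idea; the row data, field names, and timing values are
hypothetical examples, not taken from a real profiling run:

```python
from collections import defaultdict

# Hypothetical rows copied from the per-node profiling statistics table;
# only the "Name" and "Execution time" (ms) columns are used here.
rows = [
    {"name": "Convolution", "execution_time_ms": 1.80},
    {"name": "Convolution", "execution_time_ms": 2.10},
    {"name": "BatchNorm", "execution_time_ms": 0.25},
    {"name": "TensorCopy", "execution_time_ms": 0.90},
]


def time_per_layer_type(rows):
    """Sum execution time per layer type and sort slowest-first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["name"]] += row["execution_time_ms"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for name, total_ms in time_per_layer_type(rows):
    print(f"{name}: {total_ms:.2f} ms")
```

A ranking like this makes it easy to see which layer types dominate total
inference time and are therefore the best candidates for optimization.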