Run Simulated Profiling
========================

Before deploying to hardware, you can use the Simulated Profiling feature to estimate model performance on target NXP devices without requiring physical hardware access.

.. image:: /_static/eIQ_AIHub_simulator_profiling.gif
   :alt: Simulator Profiling
   :width: 100%

What is Simulated Profiling?
----------------------------

Simulated Profiling uses NPU compiler tools to estimate execution performance directly in the cloud. It provides:

* Estimated NPU execution cycles without device access
* Operator-level execution cycle estimates
* Per-node profiling statistics (clock cycles, operator mapping)
* Estimated total inference time
* Quick iteration without waiting for board availability

This phase is ideal for verifying that the optimizations performed earlier behave as expected before moving to on-device profiling.

Run a Simulated Profiling Session
----------------------------------

The Simulated Profiling page provides a workflow canvas similar to the Optimize & Convert page. You select your model and configure the profiling parameters.

**Steps:**

1. Switch to the **AI Toolkit** tab in the top navigation bar.
2. In the left sidebar, under **Model evaluation**, click **Simulated profiling**.
3. On the workflow canvas, your model appears as a node. If it does not, select your model from the available resources.
4. Configure the simulated profiling step:

   - **Target** — select the target device (e.g., ``imxrt700``).
   - **Engine** — select the NPU engine (e.g., ``Neutron``).

5. Click the **Run** button to start the profiling session.
6. A confirmation message appears when the pipeline is submitted. Navigate to **Profiling history** to monitor progress.

Review Profiling Results
-------------------------

When the profiling session completes, click the entry in **Profiling history** to view the detailed results.
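As a rough sanity check on the reported numbers, the per-node clock-cycle estimates can be related to latency and ranked to spot bottlenecks. The sketch below is illustrative only: the node names, cycle counts, and the 1 GHz NPU clock are assumptions, not real profiler output; the actual clock frequency depends on your target device.

```python
# Hypothetical sketch: relating estimated clock cycles to inference time.
# All values below are illustrative assumptions, not real profiler output.

NPU_CLOCK_HZ = 1_000_000_000  # assumed 1 GHz NPU clock; check your target's datasheet

# Example per-node estimates, shaped like the per-node statistics table
node_cycles = {
    "Conv2D_0": 1_200_000,
    "DepthwiseConv2D_1": 450_000,
    "Conv2D_2": 900_000,
}

# Total cycles and the corresponding latency estimate in milliseconds
total_cycles = sum(node_cycles.values())
inference_ms = total_cycles / NPU_CLOCK_HZ * 1_000
print(f"Estimated inference time: {inference_ms:.3f} ms")

# Rank nodes by cycle count to spot the biggest contributors
for name, cycles in sorted(node_cycles.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {cycles} cycles ({cycles / total_cycles:.1%} of total)")
```

Ranking nodes by their share of total cycles is the same reasoning you would apply when reading the per-node statistics table in the results view.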
**Session metadata includes:**

* **Model name** — the profiled model
* **Type** — ``Simulated``
* **Target** — target device (e.g., ``imxrt700``)
* **Engine** — NPU engine used (e.g., ``Neutron``)
* **Model size** — size of the model file
* **Estimated inference time** — total estimated inference time in milliseconds
* **Tensor arena size** — memory arena allocated for tensor operations

**Per-node profiling statistics table:**

* **Node ID** — unique identifier for each operator node
* **Node name** — name of the operator (e.g., ``Conv2D``, ``DepthwiseConv2D``)
* **Order** — execution order of the node
* **Operator name** — type of operation
* **Clock cycles** — estimated clock cycles for that node

Use these results to identify performance bottlenecks and validate that the model meets your latency requirements before proceeding to on-device profiling.

.. note::

   Refer to the AI Toolkit documentation for detailed information.

Next Steps
----------

* :doc:`Run MCU profiling <./mcu_profiling>`
* :doc:`Profile on real hardware <./ondevice_profiling>`
* :doc:`Benchmark the model <./benchmark>`