Run Simulated Profiling

Before deploying to hardware, you can use the Simulated Profiling feature to estimate model performance on target NXP devices without requiring physical hardware access.

What is Simulated Profiling?

Simulated Profiling uses NPU compiler tools to estimate execution performance directly in the cloud. It provides:

  • Estimated NPU execution cycles without device access

  • Operator-level execution cycle estimates

  • Per-node profiling statistics (clock cycles, operator mapping)

  • Estimated total inference time

  • Quick iteration without waiting for board availability

This phase is ideal for verifying that optimizations performed earlier behave as expected before moving to on-device profiling.

Run a Simulated Profiling Session

The Simulated Profiling page provides a workflow canvas similar to the Optimize & Convert page. You select your model and configure the profiling parameters.

Steps:

  1. Switch to the AI Toolkit tab in the top navigation bar.

  2. In the left sidebar, under Model evaluation, click Simulated profiling.

  3. On the workflow canvas, your model appears as a node. If it does not, select your model from the available resources.

  4. Configure the simulated profiling step:

    • Target — select the target device (e.g., imxrt700).

    • Engine — select the NPU engine (e.g., Neutron).

  5. Click the Run button to start the profiling session.

  6. A confirmation message appears when the pipeline is submitted. Navigate to Profiling history to monitor progress.

Review Profiling Results

When the profiling session completes, click on the entry in Profiling history to view the detailed results.

Session metadata includes:

  • Model name — the profiled model

  • Type — Simulated

  • Target — target device (e.g., imxrt700)

  • Engine — NPU engine used (e.g., Neutron)

  • Model size — size of the model file

  • Estimated inference time — total estimated inference time in milliseconds

  • Tensor arena size — memory arena allocated for tensor operations
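The estimated inference time reported here is derived from the NPU cycle estimates. As a rough sanity check, you can relate a cycle count to wall-clock time yourself; the sketch below uses hypothetical values (the cycle count and the NPU clock frequency are illustration assumptions, not real Neutron specifications):

```python
# Sketch: relate an estimated clock-cycle count to inference time.
# NOTE: total_cycles and npu_clock_hz below are hypothetical
# illustration values, not real Neutron NPU numbers.

def estimated_inference_ms(total_cycles: int, npu_clock_hz: float) -> float:
    """Convert an estimated cycle count into milliseconds."""
    return total_cycles / npu_clock_hz * 1000.0

total_cycles = 2_500_000      # hypothetical sum of per-node cycle estimates
npu_clock_hz = 250_000_000    # hypothetical 250 MHz NPU clock

print(f"{estimated_inference_ms(total_cycles, npu_clock_hz):.2f} ms")  # 10.00 ms
```

Because simulated numbers are estimates, treat the result as a relative indicator for comparing model variants rather than a guaranteed on-device latency.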

Per-node profiling statistics table:

  • Node ID — unique identifier for each operator node

  • Node name — name of the operator (e.g., Conv2D, DepthwiseConv2D)

  • Order — execution order of the node

  • Operator name — type of operation

  • Clock cycles — estimated clock cycles for that node

Use these results to identify performance bottlenecks and validate that the model meets your latency requirements before proceeding to on-device profiling.
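One simple way to spot bottlenecks in the per-node statistics is to rank nodes by their estimated clock cycles and look at each node's share of the total. The sketch below uses hypothetical rows; in practice you would take the values from the per-node statistics table in the profiling results:

```python
# Sketch: rank per-node profiling rows by clock cycles to spot bottlenecks.
# The rows below are hypothetical illustration data, not real results.

nodes = [
    {"node_id": 0, "operator": "Conv2D",          "clock_cycles": 1_200_000},
    {"node_id": 1, "operator": "DepthwiseConv2D", "clock_cycles": 300_000},
    {"node_id": 2, "operator": "FullyConnected",  "clock_cycles": 500_000},
]

total = sum(n["clock_cycles"] for n in nodes)
for n in sorted(nodes, key=lambda n: n["clock_cycles"], reverse=True):
    share = 100.0 * n["clock_cycles"] / total
    print(f'{n["operator"]:<16} {n["clock_cycles"]:>9} cycles  {share:5.1f}%')
```

Nodes that dominate the cycle budget are the first candidates for further optimization (for example, revisiting quantization choices) before moving to on-device profiling.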

Note

Refer to the AI Toolkit documentation for detailed information.

Next Steps