Inference Execution Flows

Different inference execution flows are used depending on how much of a model has been successfully converted for the NPU.

When a model is fully converted, inference can be executed end‑to‑end on the NPU, enabling a streamlined and highly optimized execution flow.

For partially converted models, the standard inference execution flow must be used, combining NPU-accelerated components with CPU fallback for unsupported operations. These differentiated execution paths let the system adapt to conversion outcomes while preserving correctness and optimizing performance and resource usage.
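The dispatch logic behind these two paths can be sketched as follows. This is an illustrative sketch only: the names `NPU_SUPPORTED_OPS`, `run_on_npu`, and `run_on_cpu` are hypothetical placeholders, not an actual SDK API. The idea is that each operation runs on the NPU when it was converted, and falls back to the CPU otherwise; a fully converted model therefore executes entirely on the NPU.

```python
# Hypothetical sketch of per-operation dispatch; the names below
# (NPU_SUPPORTED_OPS, run_on_npu, run_on_cpu) are illustrative only.

NPU_SUPPORTED_OPS = {"conv2d", "relu", "matmul"}

def run_on_npu(op, x):
    # Placeholder for NPU-accelerated execution of one operation.
    return f"npu:{op}({x})"

def run_on_cpu(op, x):
    # Placeholder for CPU fallback execution of one operation.
    return f"cpu:{op}({x})"

def execute(model_ops, x):
    """Run each op on the NPU when supported, else fall back to the CPU.

    Returns the final result and the device used for each op.
    """
    devices = []
    for op in model_ops:
        if op in NPU_SUPPORTED_OPS:
            x = run_on_npu(op, x)
            devices.append("npu")
        else:
            x = run_on_cpu(op, x)
            devices.append("cpu")
    return x, devices

# Fully converted model: every op stays on the NPU.
_, devices = execute(["conv2d", "relu", "matmul"], "in")
assert devices == ["npu", "npu", "npu"]

# Partially converted model: the unsupported op falls back to the CPU.
_, devices = execute(["conv2d", "custom_op", "matmul"], "in")
assert devices == ["npu", "cpu", "npu"]
```

In a real runtime the partition is usually computed ahead of time into contiguous NPU and CPU subgraphs rather than decided per op, which reduces costly device-to-host transfers at each boundary.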