Converting for eIQ Neutron NPU¶
To use a model on any device with NPU acceleration supported by eIQ AI Toolkit, the model must first be converted to a quantized TF Lite format. Depending on the specific platform, an additional conversion step may be required.
Several devices, such as MCX-N, i.MX RT700, i.MX95, and i.MX943, use the Neutron NPU. To run a model on these devices with Neutron acceleration, you need to further convert the quantized TF Lite model into a format specific to Neutron.
eIQ AI Toolkit performs conversions using the Olive (https://microsoft.github.io/Olive/) framework. Olive uses passes, where each pass represents a single transformation of the model. Passes can also be chained together to apply multiple transformations in sequence.
This guide will show you how to:
Load a quantized TF Lite model into eIQ AI Toolkit
Convert the quantized TF Lite model to a Neutron NPU-compatible TF Lite model
Download the converted model for use in an application on your device
This guide requires the eIQ AI Toolkit backend to be running. If you haven’t set it up yet, please refer to the following tutorial: eIQ AI Toolkit setup & launch
[ ]:
import requests
from pathlib import Path

# Set your eIQ AI Toolkit backend URL:
AI_TOOLKIT_BACKEND_URL = "http://localhost:8000"
Load quantized TF Lite model¶
Loading any type of model into an application involves two steps:
Specifying the model metadata
Uploading the model file
The metadata must always include the type of model you are uploading. For example, if you are uploading a PyTorch model, set the type to pytorch; for an ONNX model, use onnx, and so on.
For a TF Lite model, no additional metadata is required.
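For illustration, metadata payloads for a few model types might look like the sketch below. Only the `model_type` field is shown because that is the only field this guide requires; the type strings mirror the examples in the text, and the full set of supported values depends on your eIQ AI Toolkit version.

```python
# Minimal metadata payloads for different model types.
# Only "model_type" is required for a TF Lite model; the other type
# strings mirror the examples above and are assumptions to verify
# against your eIQ AI Toolkit version.
tflite_metadata = {"model_type": "tflite"}
pytorch_metadata = {"model_type": "pytorch"}
onnx_metadata = {"model_type": "onnx"}
```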
1. Prepare quantized TF Lite model¶
If you already have a model prepared, update the path to point to its location. If you do not yet have a model, set the path to the location where the model should be saved. (Refer to the following sections for instructions on downloading a sample model.)
[ ]:
# Modify the path to your TFLite model
model_path = Path("path_to_quantized_tflite_model.tflite")
Use the following script to download the example model:
Note: Skip this step if you already have your own model.
[ ]:
example_model_url = "https://eiq.nxp.com/training-materials/_misc/models/mobilenet_v3-small_224_1.0_uint8.tflite"
with open(model_path, "wb") as f:
    response = requests.get(url=example_model_url)
    response.raise_for_status()  # Fail early instead of saving an error page as the model
    f.write(response.content)
2. Specify metadata¶
In this section, we will upload the model metadata to eIQ AI Toolkit. This step only requires specifying the type of model being uploaded.
[ ]:
MODELS_API_URL = f"{AI_TOOLKIT_BACKEND_URL}/models"
model_metadata = {
    "model_type": "tflite",
}
response = requests.post(MODELS_API_URL, json=model_metadata)
response_data = response.json()
model_uuid = response_data["data"]["model"]["uuid"]
3. Upload model¶
Now we can upload the model file.
[ ]:
with open(model_path, "rb") as model_file:
    response = requests.post(
        url=f"{AI_TOOLKIT_BACKEND_URL}/models/{model_uuid}",  # Model identifier is part of the request URL
        files={
            "model_file": model_file,
        },
    )
print(response.json())
After uploading the model metadata and file, you can verify the model’s registration and readiness status using the following endpoint. If the status remains in_progress, call the endpoint repeatedly until it changes to ready.
[ ]:
response = requests.get(f"{AI_TOOLKIT_BACKEND_URL}/models/{model_uuid}")
data = response.json()
print(f'Model status: {data["data"]["model"]["status"]}')
print(f'Model status description: {data["data"]["model"]["status_description"]}')
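The single status check above can be wrapped in a small polling loop. This is a sketch: the endpoint, response field names, and the `in_progress`/`ready` status values come from the calls shown in this guide, and the timeout and interval defaults are arbitrary.

```python
import time

import requests


def wait_for_model_ready(base_url: str, uuid: str,
                         timeout_s: float = 120.0,
                         interval_s: float = 2.0) -> str:
    """Poll the model endpoint until the status leaves "in_progress",
    then return the final status (expected to be "ready")."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = requests.get(f"{base_url}/models/{uuid}").json()
        status = data["data"]["model"]["status"]
        if status != "in_progress":
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"Model {uuid} was not ready within {timeout_s} s")
```

Usage would then be `wait_for_model_ready(AI_TOOLKIT_BACKEND_URL, model_uuid)` in place of the manual re-runs of the cell above.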
Conversion to Neutron NPU specific format¶
Before running the conversion, two important parameters must be configured:
Neutron target
Neutron flavor (version)
The code below displays all available options for these parameters. Neutron target should be set to the device where you plan to deploy the converted model. Neutron flavor should match the exact version of the SDK (for MCUs) or Yocto (for MPUs) you are using.
If you are following this guide to convert and use a model on your device, make sure to adjust these parameters accordingly.
[ ]:
available_passes_response = requests.get(f"{AI_TOOLKIT_BACKEND_URL}/optimizations/passes")
available_passes = available_passes_response.json()
# This prints configuration parameters only for NeutronConversion pass. Feel free to change it and explore
# other passes as well.
neutron_conversion_pass_config = next(_pass for _pass in available_passes["data"]["passes"] if _pass["type"] == "NeutronConversion")
print(neutron_conversion_pass_config)
[ ]:
OPTIMIZATIONS_API_URL = f"{AI_TOOLKIT_BACKEND_URL}/optimizations"
RUN_OPTIMIZATION_API_URL = f"{OPTIMIZATIONS_API_URL}/run"
pass_config = {
    "model_uuid": model_uuid,
    "passes": [
        {
            "type": "NeutronConversion",
            "config": {
                "target": "imxrt700",  # Change this to match your specific Neutron NPU target
                "flavor": "MCUXpresso SDK 25.09",  # Change this to match your MCU SDK or Yocto version
            },
        }
    ],
}
optimization_response = requests.post(RUN_OPTIMIZATION_API_URL, json=pass_config)
optimization_response_data = optimization_response.json()
optimization_uuid = optimization_response_data["data"]["optimization"]["uuid"]
The conversion process is now running. You can check its status by calling this endpoint repeatedly until the status changes to success.
[ ]:
response = requests.get(f"{OPTIMIZATIONS_API_URL}/{optimization_uuid}")
data = response.json()
status = data["data"]["optimization"]["status"]
print(f"Conversion status: {status}")
if status == "success":
    artifact_id = data["data"]["optimization"]["artifacts"][0]["artifact_id"]
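As with the model upload, the repeated status checks can be wrapped in a polling loop. This is a sketch under the assumptions stated in this guide: the field names and the `success` status come from the calls above, and the loop treats any other status as "still running", so a failed conversion would only surface as a timeout.

```python
import time

import requests


def wait_for_conversion(base_url: str, optimization_uuid: str,
                        timeout_s: float = 600.0,
                        interval_s: float = 5.0) -> str:
    """Poll the optimization endpoint until the status is "success",
    then return the first artifact's identifier."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = requests.get(f"{base_url}/optimizations/{optimization_uuid}").json()
        optimization = data["data"]["optimization"]
        if optimization["status"] == "success":
            return optimization["artifacts"][0]["artifact_id"]
        time.sleep(interval_s)
    raise TimeoutError(f"Conversion {optimization_uuid} did not finish within {timeout_s} s")
```

Calling `wait_for_conversion(AI_TOOLKIT_BACKEND_URL, optimization_uuid)` then yields the `artifact_id` needed for the download step below.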
Download converted model¶
Once the conversion is complete and successful, you can download the resulting model and use it on your Neutron device.
[ ]:
# Change this path to where you want the converted model saved
dest_model_path = Path("neutron_converted_model.tflite")
[ ]:
download_response = requests.get(f"{AI_TOOLKIT_BACKEND_URL}/optimizations/{optimization_uuid}/resources/{artifact_id}")
download_response.raise_for_status()  # Fail early instead of saving an error response as the model
with dest_model_path.open("wb") as f:
    f.write(download_response.content)