ONNX Quantization¶
eIQ AI Toolkit performs quantization using the Olive (https://microsoft.github.io/Olive/) framework. Olive uses passes, where each pass represents a single transformation of the model. Passes can also be chained together to apply multiple transformations in sequence.
To quantize an ONNX model, a calibration dataset is required. This guide will show you how to use the ONNX2Quant pass to quantize an ONNX model. Specifically, it will demonstrate how to use the eIQ AI Toolkit API to:
Load an ONNX model into the application
Load a calibration dataset into the application
Run ONNX quantization
Retrieve the quantized model
This guide requires the eIQ AI Toolkit backend to be running. If you haven’t set it up yet, please refer to the following tutorial: eIQ AI Toolkit setup & launch
[ ]:
import requests
from pathlib import Path
# Set your eIQ AI Toolkit url:
AI_TOOLKIT_BACKEND_URL = "http://localhost:8000"
Load ONNX model¶
Loading any type of model into an application involves two steps:
Specify the model metadata
Upload the model file
The metadata must always include the type of model being uploaded. For example, if you are uploading a PyTorch model, set the type to pytorch; for an ONNX model, use onnx, and so on.
For an ONNX model, no additional metadata is required.
1. Prepare ONNX model¶
If you already have a model prepared, simply update the path to point to its location. If you do not yet have a model, set the path to the location where the model should be saved. (Refer to the following sections for instructions on downloading a sample model.)
[ ]:
# Modify the path to your ONNX model
model_path = Path("path_to_onnx_model.onnx")
Use the following script to download the example model:
Note: Skip this step if you already have your own model.
[ ]:
example_model_url = "https://eiq.nxp.com/training-materials/_misc/models/model.onnx"
response = requests.get(url=example_model_url)
response.raise_for_status()  # Fail early on HTTP errors instead of writing an error page to disk
with open(model_path, "wb") as f:
    f.write(response.content)
2. Specify metadata¶
In this section, we will upload the model metadata to eIQ AI Toolkit. This step involves specifying only the type of model being uploaded.
[ ]:
import requests
MODELS_API_URL = f"{AI_TOOLKIT_BACKEND_URL}/models"
model_metadata = {
    "model_type": "onnx",
}
response = requests.post(MODELS_API_URL, json=model_metadata)
response_data = response.json()
model_uuid = response_data["data"]["model"]["uuid"]
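The response envelope used above has the shape {"data": {"model": {"uuid": ...}}}, and datasets follow the same pattern. It can be unpacked defensively; a minimal sketch (the helper name extract_uuid is ours, not part of the toolkit API):

```python
def extract_uuid(response_json: dict, kind: str) -> str:
    """Pull the UUID for `kind` (e.g. "model" or "dataset") out of a
    response body shaped like {"data": {kind: {"uuid": ...}}}."""
    try:
        return response_json["data"][kind]["uuid"]
    except (KeyError, TypeError) as exc:
        raise RuntimeError(f"unexpected response shape: {response_json!r}") from exc

# Example with the envelope shape shown above:
model_uuid = extract_uuid({"data": {"model": {"uuid": "1234"}}}, "model")
```

This raises a descriptive error instead of an opaque KeyError when the backend returns an unexpected body (for example, an error payload).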
3. Upload model¶
Now we can upload the model file.
[ ]:
with open(model_path, "rb") as model_file:
    response = requests.post(
        url=f"{AI_TOOLKIT_BACKEND_URL}/models/{model_uuid}",  # Model identifier is part of the request URL
        files={
            "model_file": model_file,
        },
    )
print(response.json())
After uploading the model metadata and file, you can verify the model’s registration and readiness status using the following endpoint. If the status remains in_progress, call the endpoint repeatedly until it changes to ready.
[ ]:
response = requests.get(f"{AI_TOOLKIT_BACKEND_URL}/models/{model_uuid}")
data = response.json()
print(f'Model status: {data["data"]["model"]["status"]}')
print(f'Model status description: {data["data"]["model"]["status_description"]}')
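The cell above queries the status once; in practice you will want a small polling loop. A minimal sketch (the status values in_progress and ready come from this guide; the timeout handling is our assumption):

```python
import time

def wait_for_status(fetch_status, done=("ready",), interval_s=2.0, timeout_s=300.0):
    """Call fetch_status() until it returns one of `done`, then return it.

    fetch_status: zero-argument callable returning the current status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout_s} s")
        time.sleep(interval_s)
```

For the model endpoint above, a suitable callable would be `lambda: requests.get(f"{AI_TOOLKIT_BACKEND_URL}/models/{model_uuid}").json()["data"]["model"]["status"]`; the same loop works for datasets, and for optimizations with `done=("success",)`.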
Upload calibration dataset¶
Quantizing an ONNX model requires a calibration dataset to be provided. To achieve this, you need to upload the dataset to the application as a .zip file. The calibration dataset must follow the correct directory structure. For example, for a model with two inputs named input_name_1 and input_name_2, the expected dataset structure is as follows:
dataset.zip
├── input_name_1
│   ├── sample_1.npy
│   ├── sample_2.npy
│   └── ...
└── input_name_2
    ├── sample_1.npy
    ├── sample_2.npy
    └── ...
You can have multiple .npy files in each input folder representing different calibration samples.
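If you are assembling the calibration archive yourself, the layout above can be produced directly from in-memory arrays. A sketch assuming NumPy is available (build_calibration_zip is our helper name, not part of the toolkit API):

```python
import io
import zipfile

import numpy as np

def build_calibration_zip(samples_per_input, out_path):
    """samples_per_input: {input_name: [np.ndarray, ...]}.
    Writes input_name/sample_N.npy entries into a zip archive at out_path."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for input_name, samples in samples_per_input.items():
            for i, arr in enumerate(samples, start=1):
                buf = io.BytesIO()
                np.save(buf, arr)  # Serialize the array in .npy format, in memory
                zf.writestr(f"{input_name}/sample_{i}.npy", buf.getvalue())
    return out_path
```

Each input name must match the corresponding input tensor name of your ONNX model, and every input folder should contain the same number of samples.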
1. Prepare dataset¶
If you already have a dataset prepared, simply update the path to point to its location. If you do not yet have a dataset, set the path to the location where the dataset should be saved. (Refer to the following sections for instructions on downloading a sample dataset.)
[ ]:
# Modify the path if you already have a dataset
dataset_path = Path("path_to_calibration_dataset.zip")
Use the following script to download the example dataset:
Note: Skip this step if you already have your own dataset.
[ ]:
example_dataset_url = "https://eiq.nxp.com/training-materials/_misc/datasets/kws_calib.zip"
response = requests.get(url=example_dataset_url)
response.raise_for_status()  # Fail early on HTTP errors instead of writing an error page to disk
with open(dataset_path, "wb") as f:
    f.write(response.content)
2. Upload dataset¶
Unlike models, uploading a dataset does not require separate metadata and file steps. A single endpoint call is enough:
[ ]:
DATASETS_API_URL = f"{AI_TOOLKIT_BACKEND_URL}/datasets"
uploaded_dataset_name = "My calibration dataset"
with dataset_path.open("rb") as zip_file:
    files = {
        "dataset_file": ("dataset.zip", zip_file, "application/zip"),
    }
    data = {
        "dataset_name": uploaded_dataset_name,
        "dataset_type": "calibration",
    }
    response = requests.post(DATASETS_API_URL, files=files, data=data)
response_data = response.json()
dataset_uuid = response_data["data"]["dataset"]["uuid"]
After uploading the dataset archive, you can verify the dataset’s status using the following endpoint. If the status remains in_progress, call the endpoint repeatedly until it changes to ready.
[ ]:
response = requests.get(f"{DATASETS_API_URL}/{dataset_uuid}")
data = response.json()
print(f'Dataset status: {data["data"]["dataset"]["status"]}')
print(f'Dataset status description: {data["data"]["dataset"]["status_description"]}')
Quantization¶
Now that both the dataset and model are prepared, you can run the quantization pass. If you want to see which configuration parameters can be set for this pass, send a request to the following endpoint:
[ ]:
available_passes_response = requests.get(f"{AI_TOOLKIT_BACKEND_URL}/optimizations/passes")
available_passes = available_passes_response.json()
# This prints configuration parameters only for ONNX2Quant pass. Feel free to change it and explore
# other passes as well.
onnx2quant_pass_config = next(
    _pass for _pass in available_passes["data"]["passes"] if _pass["type"] == "ONNX2Quant"
)
print(onnx2quant_pass_config)
Now run the quantization:
[ ]:
OPTIMIZATIONS_API_URL = f"{AI_TOOLKIT_BACKEND_URL}/optimizations"
RUN_OPTIMIZATION_API_URL = f"{OPTIMIZATIONS_API_URL}/run"
pass_config = {
"model_uuid": model_uuid,
"passes": [
{
"type": "ONNX2Quant",
"config": {
"allow_opset_10_and_lower": "false",
"dataset_uuid": dataset_uuid,
}
},
]
}
optimization_response = requests.post(RUN_OPTIMIZATION_API_URL, json=pass_config)
optimization_response_data = optimization_response.json()
optimization_uuid = optimization_response_data["data"]["optimization"]["uuid"]
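As noted in the introduction, Olive passes can be chained by listing them in order in the same request body. A hedged sketch of such a configuration; SomeOtherPass is a hypothetical placeholder, so query the /optimizations/passes endpoint above for the passes actually available:

```python
chained_config = {
    "model_uuid": "your-model-uuid",
    "passes": [
        # Passes run in list order: quantization first, then the next transformation.
        {
            "type": "ONNX2Quant",
            "config": {"dataset_uuid": "your-dataset-uuid"},
        },
        {
            "type": "SomeOtherPass",  # Hypothetical placeholder, not a real pass name
            "config": {},
        },
    ],
}
```

The request is submitted exactly as above, via a POST to the /optimizations/run endpoint.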
The quantization process is now running. You can check its status by calling this endpoint repeatedly until the status changes to success.
[ ]:
response = requests.get(f"{OPTIMIZATIONS_API_URL}/{optimization_uuid}")
response_data = response.json()
status = response_data["data"]["optimization"]["status"]
print(f"Quantization status: {status}")
if status == "success":
    artifact_id = response_data["data"]["optimization"]["artifacts"][0]["artifact_id"]
Download quantized model¶
Once the optimization status is success, you can download the quantized model artifact:
[ ]:
# Change model path to your location
dest_model_path = Path("quantized_model.onnx")
[ ]:
download_response = requests.get(
    f"{AI_TOOLKIT_BACKEND_URL}/optimizations/{optimization_uuid}/resources/{artifact_id}"
)
download_response.raise_for_status()  # Fail early on HTTP errors instead of writing an error page to disk
with dest_model_path.open("wb") as f:
    f.write(download_response.content)