{ "cells": [ { "cell_type": "markdown", "id": "31cd209d-39a4-4e0b-b34a-ced26db5b7d3", "metadata": {}, "source": "# TensorFlow to TF Lite" }, { "cell_type": "markdown", "id": "c06c3b4b-01c8-4b99-9b30-7c8bd7a676cb", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "**eIQ AI Toolkit** does not support conversion from TensorFlow to quantized TF Lite, as this process is typically handled directly using TensorFlow’s built-in conversion utilities.\n", "This guide will demonstrate how to perform this conversion." ] }, { "cell_type": "markdown", "id": "cdfd8fee-fd0e-42c4-8896-9eb3e14db77e", "metadata": {}, "source": [ "TensorFlow uses multiple model representations. We will be focusing on the following representations:\n", "- [Keras](https://www.tensorflow.org/tutorials/keras/save_and_load#new_high-level_keras_format) (.h5/.hdf5)\n", "- [SavedModel](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md)\n", "- [Keras Applications](https://keras.io/api/applications/)" ] }, { "cell_type": "markdown", "id": "c134cd6c-8763-42cc-ba7f-8ac2d7ea28d4", "metadata": {}, "source": "First we need to install required Python packages." }, { "cell_type": "code", "id": "9d964898-48e4-4e4f-8a8e-c72da0732435", "metadata": {}, "source": [ "!pip install tensorflow==2.18.1\n", "!pip install numpy\n", "!pip install pillow\n", "!pip install kagglehub" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "id": "c38a4cd8-11ae-4f3f-9aed-21a04dc40e07", "metadata": {}, "source": [ "import zipfile\n", "import tensorflow as tf\n", "import numpy as np\n", "import kagglehub\n", "import os\n", "\n", "from PIL import Image" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "b270b38c-36af-49ed-abc3-e69361a6a763", "metadata": {}, "source": [ "### Load a SavedModel model\n", "\n", "The **SavedModel** format is a directory that contains a protobuf binary along with a TensorFlow checkpoint." ] }, { "cell_type": "code", "id": "b64b9dad-9404-451f-877b-33e39f22a7ae", "metadata": {}, "source": [ "my_model = tf.keras.models.load_model(\"saved_model/my_model\")\n", "converter = tf.lite.TFLiteConverter.from_saved_model(my_model)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "e712d4d3-2b67-46d1-8feb-f9e70e87da60", "metadata": {}, "source": [ "### Load a Keras model\n", "\n", "A Keras model can be saved using either the [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) standard or the new Keras v3 saving format. Typically, it is stored as a single file with one of the following extensions: `.h5`, .`hdf5`, or `.keras`." ] }, { "cell_type": "code", "id": "f5d27aa7-f188-4e3e-9867-401b3f4ab8e9", "metadata": {}, "source": [ "filename = \"model.h5\"\n", "model = tf.keras.models.load_model(filename)\n", "converter = tf.lite.TFLiteConverter.from_keras_model(model)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "86a7e82a-1d55-4765-a922-7286adf836da", "metadata": {}, "source": [ "### Load a model from Keras Applications\n", "If you do not have a pretrained TensorFlow model, you can use Keras Applications to select from a collection of prebuilt ones." 
] }, { "cell_type": "code", "id": "6df0a72d-8071-4ccc-9cf6-862cf6387410", "metadata": {}, "source": [ "model = tf.keras.applications.MobileNet(\n", "    input_shape=None,\n", "    alpha=1.0,\n", "    depth_multiplier=1,\n", "    dropout=0.001,\n", "    include_top=True,\n", "    weights='imagenet',\n", "    input_tensor=None,\n", "    pooling=None,\n", "    classes=1000,\n", "    classifier_activation='softmax'\n", ")\n", "converter = tf.lite.TFLiteConverter.from_keras_model(model)" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "0c226912-fb68-4137-969a-566e0420e018", "metadata": {}, "source": [ "### Quantization in TensorFlow\n", "\n", "Quantization is a technique that maps numbers onto a smaller set of discrete values, typically reducing the model’s memory footprint and latency. When optimizations are enabled, the TensorFlow Lite Converter defaults to [*Dynamic Range Quantization*](https://ai.google.dev/edge/litert/models/post_training_quant), which produces **int8 weights** and **fp32 activations**.\n", "\n", "However, ML accelerators such as those on **i.MX8MP**, **i.MX93**, or the **eIQ Neutron NPU** only support int8/int16 operations, so this default scheme will not fully utilize the NPU.\n", "\n", "It is recommended to either:\n", "- Quantize the entire model to int8, or\n", "- Use a hybrid scheme with int8 weights and int16 activations.\n", "\n", "A short numeric sketch of the int8 mapping is shown below, right after the calibration dataset is introduced." ] },
{ "cell_type": "markdown", "id": "162d370d-7e89-4f39-8900-63bf80312cf1", "metadata": {}, "source": [ "#### Calibration Dataset\n", "\n", "Quantization requires a calibration dataset to determine the range of values, ideally using data that closely represents what will be used in production. More details can be found in the [TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/lite/RepresentativeDataset).\n", "\n", "In the example below, we will create a calibration dataset representing ImageNet using its subset tiny-imagenet-200, available at:\n", "https://www.kaggle.com/datasets/mariamalkuwaiti/tiny-imagenet-200-zip."
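] }, { "cell_type": "markdown", "id": "b5d06f74-8eab-4cad-9f40-5b1e7d9f3a44", "metadata": {}, "source": "As promised above, here is a minimal numeric sketch of the affine mapping `real ≈ scale * (q - zero_point)` that int8 quantization applies. This is purely illustrative; during conversion, the converter derives the scales and zero points from the calibration data." },
{ "cell_type": "code", "id": "c6e17a85-9fbc-4db1-8a51-6c2f8e0a4b55", "metadata": {}, "source": [ "# Toy affine int8 quantization of a few float values (illustrative only;\n", "# the TFLite converter computes scale/zero_point from calibration data)\n", "real = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)\n", "scale = float(real.max() - real.min()) / 255.0  # int8 spans 256 levels\n", "zero_point = int(round(-128 - float(real.min()) / scale))  # real.min() -> -128\n", "q = np.clip(np.round(real / scale) + zero_point, -128, 127).astype(np.int8)\n", "dequantized = (q.astype(np.float32) - zero_point) * scale\n", "print(\"quantized:  \", q)\n", "print(\"dequantized:\", dequantized)" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "d7f28b96-a0cd-4ec2-9b62-7d3f9f1b5c66", "metadata": {}, "source": [ "First, download and extract the calibration images:"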
] }, { "cell_type": "code", "id": "184bc491-d3c7-4634-9f17-5bae9612177f", "metadata": {}, "source": [ "# If you are unable to download the dataset through the Python API, you may download it directly from the link above\n", "# kagglehub returns the directory containing the downloaded dataset files;\n", "# this dataset ships as a zip archive, which still needs to be extracted\n", "dataset_dir = kagglehub.dataset_download(\"mariamalkuwaiti/tiny-imagenet-200-zip\")\n", "for item in os.listdir(dataset_dir):\n", "    if item.endswith(\".zip\"):\n", "        with zipfile.ZipFile(os.path.join(dataset_dir, item), 'r') as zip_ref:\n", "            zip_ref.extractall(os.path.abspath(\"\"))" ], "outputs": [], "execution_count": null },
{ "cell_type": "code", "id": "a39a1ba2-e7de-40a0-a4a1-7b90b25a8c45", "metadata": {}, "source": [ "def representative_data_gen(data_dir=None, num_samples=100):\n", "    # Collect up to num_samples image paths from the training split,\n", "    # which is laid out as tiny-imagenet-200/train/<class>/images/*.JPEG\n", "    if data_dir is None:\n", "        data_dir = os.path.join(os.path.abspath(\"\"), \"tiny-imagenet-200\", \"train\")\n", "    image_paths = []\n", "    for class_dir in os.listdir(data_dir):\n", "        class_path = os.path.join(data_dir, class_dir, 'images')\n", "        if not os.path.isdir(class_path):\n", "            continue\n", "        for img_file in os.listdir(class_path):\n", "            if img_file.endswith('.JPEG'):\n", "                image_paths.append(os.path.join(class_path, img_file))\n", "            if len(image_paths) >= num_samples:\n", "                break\n", "        if len(image_paths) >= num_samples:\n", "            break\n", "\n", "    # Preprocess images for the MobileNet application\n", "    for img_path in image_paths:\n", "        img = Image.open(img_path).convert('RGB')\n", "        img = img.resize((224, 224))\n", "        img_array = np.array(img, dtype=np.float32)\n", "        img_array = tf.keras.applications.mobilenet.preprocess_input(img_array)\n", "        img_array = np.expand_dims(img_array, axis=0)\n", "        yield [img_array]\n", "\n", "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n", "converter.representative_dataset = lambda: representative_data_gen()" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "8f289341-5638-4026-94ef-ffbc3e4a3ff1", "metadata": {}, "source": [ "##### [INT8 Quantization](https://ai.google.dev/edge/litert/models/post_training_integer_quant)\n", "Converts both weights and activations to int8. This is the recommended approach for achieving the lowest latency and the best operator support." ] },
{ "cell_type": "code", "id": "487938f1-8975-4c26-bf15-6304116030ea", "metadata": {}, "source": [ "# Require int8 kernels for all ops and use an int8 input/output interface\n", "converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\n", "converter.inference_input_type = tf.int8\n", "converter.inference_output_type = tf.int8" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "4f98c9d0-b8fe-438d-ad1d-d0c76710a224", "metadata": {}, "source": [ "##### [INT8W/INT16A Quantization](https://ai.google.dev/edge/litert/models/post_training_integer_quant_16x8)\n", "Converts weights to int8 and activations to int16. This mode offers slightly better accuracy than full int8 quantization; however, it results in higher latency, a larger memory footprint, and reduced operator support, since fewer kernels implement int16 activations." ] },
{ "cell_type": "code", "id": "2dc9b42d-2430-44cb-b639-7cf74e1e7960", "metadata": {}, "source": [ "converter.target_spec.supported_ops = [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]\n", "# The 16x8 scheme keeps a float interface for model inputs and outputs\n", "converter.inference_input_type = tf.float32\n", "converter.inference_output_type = tf.float32" ], "outputs": [], "execution_count": null }
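, { "cell_type": "markdown", "id": "e8a39ca7-b1de-4fd3-8a73-8e4a0a2c6d77", "metadata": {}, "source": "Run only one of the two quantization configuration cells above; if both were executed, the later assignments win. As an optional sanity check (a small sketch, not required for conversion), you can print the scheme the converter is currently configured with:" },
{ "cell_type": "code", "id": "f9b4adb8-c2ef-4ae4-9b84-9f5b1b3d7e88", "metadata": {}, "source": [ "# Show the ops set and I/O types the converter will use\n", "print(\"supported_ops:\", converter.target_spec.supported_ops)\n", "print(\"input type:   \", converter.inference_input_type)\n", "print(\"output type:  \", converter.inference_output_type)" ], "outputs": [], "execution_count": null }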
, { "cell_type": "markdown", "id": "eb40503c-2d15-41b8-9f42-2ec026e7effe", "metadata": {}, "source": "The TFLiteConverter setup is now complete. You can proceed to convert the model and write it to a file:" },
{ "cell_type": "code", "id": "98d52e6e-8a9b-474d-b13b-760cc58cf714", "metadata": {}, "source": [ "filename = \"output_model.tflite\"\n", "tflite_model = converter.convert()\n", "with open(filename, \"wb\") as f:\n", "    f.write(tflite_model)" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "c6697712-1ffc-4d21-85c2-6c066c975839", "metadata": {}, "source": [ "Depending on your target hardware platform, follow these guidelines to enable inference acceleration using an NPU:\n", "\n", "- For **MCX-N**, **i.MX RT700**, **i.MX95**, **i.MX943**, and similar devices, refer to the guide: Deploying a TF Lite Model to eIQ Neutron NPU.\n", "- For **i.MX 93**, refer to the guide: Deploying a TF Lite Model to i.MX 93.\n", "\n", "On other platforms, you can run the model directly, and the NPU will be utilized when available." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }