{ "cells": [ { "cell_type": "markdown", "id": "31cd209d-39a4-4e0b-b34a-ced26db5b7d3", "metadata": {}, "source": "# TensorFlow to TF Lite" }, { "cell_type": "markdown", "id": "c06c3b4b-01c8-4b99-9b30-7c8bd7a676cb", "metadata": { "jp-MarkdownHeadingCollapsed": true }, "source": [ "**eIQ AI Toolkit** does not support conversion from TensorFlow to quantized TF Lite, as this process is typically handled directly using TensorFlow’s built-in conversion utilities.\n", "This guide will demonstrate how to perform this conversion." ] }, { "cell_type": "markdown", "id": "cdfd8fee-fd0e-42c4-8896-9eb3e14db77e", "metadata": {}, "source": [ "TensorFlow uses multiple model representations. We will be focusing on the following representations:\n", "- [Keras](https://www.tensorflow.org/tutorials/keras/save_and_load#new_high-level_keras_format) (.h5/.hdf5)\n", "- [SavedModel](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md)\n", "- [Keras Applications](https://keras.io/api/applications/)" ] }, { "cell_type": "markdown", "id": "c134cd6c-8763-42cc-ba7f-8ac2d7ea28d4", "metadata": {}, "source": "First we need to install required Python packages." }, { "cell_type": "code", "id": "9d964898-48e4-4e4f-8a8e-c72da0732435", "metadata": {}, "source": [ "!pip install tensorflow==2.18.1\n", "!pip install numpy\n", "!pip install pillow\n", "!pip install kagglehub" ], "outputs": [], "execution_count": null }, { "cell_type": "code", "id": "c38a4cd8-11ae-4f3f-9aed-21a04dc40e07", "metadata": {}, "source": [ "import zipfile\n", "import tensorflow as tf\n", "import numpy as np\n", "import kagglehub\n", "import os\n", "\n", "from PIL import Image" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "b270b38c-36af-49ed-abc3-e69361a6a763", "metadata": {}, "source": [ "### Load a SavedModel model\n", "\n", "The **SavedModel** format is a directory that contains a protobuf binary along with a TensorFlow checkpoint." ] }, { "cell_type": "code", "id": "b64b9dad-9404-451f-877b-33e39f22a7ae", "metadata": {}, "source": [ "my_model = tf.keras.models.load_model(\"saved_model/my_model\")\n", "converter = tf.lite.TFLiteConverter.from_saved_model(my_model)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "e712d4d3-2b67-46d1-8feb-f9e70e87da60", "metadata": {}, "source": [ "### Load a Keras model\n", "\n", "A Keras model can be saved using either the [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) standard or the new Keras v3 saving format. Typically, it is stored as a single file with one of the following extensions: `.h5`, .`hdf5`, or `.keras`." ] }, { "cell_type": "code", "id": "f5d27aa7-f188-4e3e-9867-401b3f4ab8e9", "metadata": {}, "source": [ "filename = \"model.h5\"\n", "model = tf.keras.models.load_model(filename)\n", "converter = tf.lite.TFLiteConverter.from_keras_model(model)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "id": "86a7e82a-1d55-4765-a922-7286adf836da", "metadata": {}, "source": [ "### Load a model from Keras Applications\n", "If you do not have a pretrained TensorFlow model, you can use Keras Applications to select from a collection of prebuilt ones." 
] }, { "cell_type": "code", "id": "6df0a72d-8071-4ccc-9cf6-862cf6387410", "metadata": {}, "source": [ "model = tf.keras.applications.MobileNet(\n", "    input_shape=None,\n", "    alpha=1.0,\n", "    depth_multiplier=1,\n", "    dropout=0.001,\n", "    include_top=True,\n", "    weights='imagenet',\n", "    input_tensor=None,\n", "    pooling=None,\n", "    classes=1000,\n", "    classifier_activation='softmax'\n", ")\n", "converter = tf.lite.TFLiteConverter.from_keras_model(model)" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "0c226912-fb68-4137-969a-566e0420e018", "metadata": {}, "source": [ "### Quantization in TensorFlow\n", "\n", "Quantization is a technique that maps numbers onto a smaller set of discrete values, typically reducing the model’s memory footprint and latency. When optimizations are enabled, the TensorFlow Lite Converter defaults to [*Dynamic Range Quantization*](https://ai.google.dev/edge/litert/models/post_training_quant), which produces **int8 weights** and **fp32 activations**.\n", "\n", "However, ML accelerators such as those on **i.MX8MP**, **i.MX93**, or the **eIQ Neutron NPU** only support int8/int16 operations, so this default scheme will not fully utilize the NPU.\n", "\n", "It is recommended to either:\n", "- Quantize the entire model to int8, or\n", "- Use a hybrid scheme with int8 weights and int16 activations.\n", "\n", "A short numeric sketch of the int8 mapping is shown below, right after the calibration dataset is introduced." ] },
{ "cell_type": "markdown", "id": "162d370d-7e89-4f39-8900-63bf80312cf1", "metadata": {}, "source": [ "#### Calibration Dataset\n", "\n", "Quantization requires a calibration dataset to determine the range of values, ideally using data that closely represents what will be used in production. More details can be found in the [TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/lite/RepresentativeDataset).\n", "\n", "In the example below, we will create a calibration dataset representing ImageNet using its subset tiny-imagenet-200, available at:\n", "https://www.kaggle.com/datasets/mariamalkuwaiti/tiny-imagenet-200-zip."
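] }, { "cell_type": "markdown", "id": "b5d06f74-8eab-4cad-9f40-5b1e7d9f3a44", "metadata": {}, "source": "As promised above, here is a minimal numeric sketch of the affine mapping `real ≈ scale * (q - zero_point)` that int8 quantization applies. This is purely illustrative; during conversion, the converter derives the scales and zero points from the calibration data." },
{ "cell_type": "code", "id": "c6e17a85-9fbc-4db1-8a51-6c2f8e0a4b55", "metadata": {}, "source": [ "# Toy affine int8 quantization of a few float values (illustrative only;\n", "# the TFLite converter computes scale/zero_point from calibration data)\n", "real = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)\n", "scale = float(real.max() - real.min()) / 255.0  # int8 spans 256 levels\n", "zero_point = int(round(-128 - float(real.min()) / scale))  # real.min() -> -128\n", "q = np.clip(np.round(real / scale) + zero_point, -128, 127).astype(np.int8)\n", "dequantized = (q.astype(np.float32) - zero_point) * scale\n", "print(\"quantized:  \", q)\n", "print(\"dequantized:\", dequantized)" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "d7f28b96-a0cd-4ec2-9b62-7d3f9f1b5c66", "metadata": {}, "source": [ "First, download and extract the calibration images:"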
] }, { "cell_type": "code", "id": "184bc491-d3c7-4634-9f17-5bae9612177f", "metadata": {}, "source": [ "# If you are unable to download the dataset through the Python API, you may download it directly from the link above\n", "# kagglehub returns the directory containing the downloaded dataset files;\n", "# this dataset ships as a zip archive, which still needs to be extracted\n", "dataset_dir = kagglehub.dataset_download(\"mariamalkuwaiti/tiny-imagenet-200-zip\")\n", "for item in os.listdir(dataset_dir):\n", "    if item.endswith(\".zip\"):\n", "        with zipfile.ZipFile(os.path.join(dataset_dir, item), 'r') as zip_ref:\n", "            zip_ref.extractall(os.path.abspath(\"\"))" ], "outputs": [], "execution_count": null },
{ "cell_type": "code", "id": "a39a1ba2-e7de-40a0-a4a1-7b90b25a8c45", "metadata": {}, "source": [ "def representative_data_gen(data_dir=None, num_samples=100):\n", "    # Collect up to num_samples image paths from the training split,\n", "    # which is laid out as tiny-imagenet-200/train/<class>/images/*.JPEG\n", "    if data_dir is None:\n", "        data_dir = os.path.join(os.path.abspath(\"\"), \"tiny-imagenet-200\", \"train\")\n", "    image_paths = []\n", "    for class_dir in os.listdir(data_dir):\n", "        class_path = os.path.join(data_dir, class_dir, 'images')\n", "        if not os.path.isdir(class_path):\n", "            continue\n", "        for img_file in os.listdir(class_path):\n", "            if img_file.endswith('.JPEG'):\n", "                image_paths.append(os.path.join(class_path, img_file))\n", "            if len(image_paths) >= num_samples:\n", "                break\n", "        if len(image_paths) >= num_samples:\n", "            break\n", "\n", "    # Preprocess images for the MobileNet application\n", "    for img_path in image_paths:\n", "        img = Image.open(img_path).convert('RGB')\n", "        img = img.resize((224, 224))\n", "        img_array = np.array(img, dtype=np.float32)\n", "        img_array = tf.keras.applications.mobilenet.preprocess_input(img_array)\n", "        img_array = np.expand_dims(img_array, axis=0)\n", "        yield [img_array]\n", "\n", "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n", "converter.representative_dataset = lambda: representative_data_gen()" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "8f289341-5638-4026-94ef-ffbc3e4a3ff1", "metadata": {}, "source": [ "##### [INT8 Quantization](https://ai.google.dev/edge/litert/models/post_training_integer_quant)\n", "Converts both weights and activations to int8. This is the recommended approach for achieving the lowest latency and the best operator support." ] },
{ "cell_type": "code", "id": "487938f1-8975-4c26-bf15-6304116030ea", "metadata": {}, "source": [ "# Require int8 kernels for all ops and use an int8 input/output interface\n", "converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\n", "converter.inference_input_type = tf.int8\n", "converter.inference_output_type = tf.int8" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "4f98c9d0-b8fe-438d-ad1d-d0c76710a224", "metadata": {}, "source": [ "##### [INT8W/INT16A Quantization](https://ai.google.dev/edge/litert/models/post_training_integer_quant_16x8)\n", "Converts weights to int8 and activations to int16. This mode offers slightly better accuracy than full int8 quantization; however, it results in higher latency, a larger memory footprint, and reduced operator support, since fewer kernels implement int16 activations." ] },
{ "cell_type": "code", "id": "2dc9b42d-2430-44cb-b639-7cf74e1e7960", "metadata": {}, "source": [ "converter.target_spec.supported_ops = [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]\n", "# The 16x8 scheme keeps a float interface for model inputs and outputs\n", "converter.inference_input_type = tf.float32\n", "converter.inference_output_type = tf.float32" ], "outputs": [], "execution_count": null }
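, { "cell_type": "markdown", "id": "e8a39ca7-b1de-4fd3-8a73-8e4a0a2c6d77", "metadata": {}, "source": "Run only one of the two quantization configuration cells above; if both were executed, the later assignments win. As an optional sanity check (a small sketch, not required for conversion), you can print the scheme the converter is currently configured with:" },
{ "cell_type": "code", "id": "f9b4adb8-c2ef-4ae4-9b84-9f5b1b3d7e88", "metadata": {}, "source": [ "# Show the ops set and I/O types the converter will use\n", "print(\"supported_ops:\", converter.target_spec.supported_ops)\n", "print(\"input type:   \", converter.inference_input_type)\n", "print(\"output type:  \", converter.inference_output_type)" ], "outputs": [], "execution_count": null }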
, { "cell_type": "markdown", "id": "eb40503c-2d15-41b8-9f42-2ec026e7effe", "metadata": {}, "source": "The TFLiteConverter setup is now complete. You can proceed to convert the model and write it to a file:" },
{ "cell_type": "code", "id": "98d52e6e-8a9b-474d-b13b-760cc58cf714", "metadata": {}, "source": [ "filename = \"output_model.tflite\"\n", "tflite_model = converter.convert()\n", "with open(filename, \"wb\") as f:\n", "    f.write(tflite_model)" ], "outputs": [], "execution_count": null },
{ "cell_type": "markdown", "id": "c6697712-1ffc-4d21-85c2-6c066c975839", "metadata": {}, "source": [ "Depending on your target hardware platform, follow these guidelines to enable inference acceleration using an NPU:\n", "\n", "- For **MCX-N**, **i.MX RT700**, **i.MX95**, **i.MX943**, and similar devices, refer to the guide: Deploying a TF Lite Model to eIQ Neutron NPU.\n", "- For **i.MX 93**, refer to the guide: Deploying a TF Lite Model to i.MX 93.\n", "\n", "On other platforms, you can run the model directly, and the NPU will be utilized when available." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }