Upload Your Dataset

Datasets are essential for model optimization and evaluation in eIQ AI Hub. You can upload two types of datasets:

  • Calibration dataset — used during post-training quantization to calibrate the model.

  • Validation dataset — used for benchmarking model accuracy.

Upload Calibration Dataset

The calibration dataset provides sample data that is used during post-training quantization to determine optimal scaling parameters for each tensor.

Dataset format requirements:

The calibration dataset must be organized into folders corresponding to the model’s input nodes. Each folder name must exactly match the name of a model input node. Inside each folder, place the calibration files for that input. These files must be NumPy arrays (.npy) that are already preprocessed and ready for use.

The expected folder structure is:

inputX/
  - image1.npy
  - image2.npy
inputY/
  - image1.npy
  - image2.npy
  - image3.npy

Package the folders into a single ZIP file before uploading.
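
The folder layout and packaging step above can be sketched in Python with the standard library only. The function name `package_calibration_dataset` is hypothetical, and placing the input folders at the top level of the archive is an assumption rather than documented AI Hub behavior.

```python
import zipfile
from pathlib import Path


def package_calibration_dataset(root, input_names, zip_path):
    """Zip one folder of .npy files per model input node.

    `root` holds one subfolder per input node; `input_names` are the
    model's input node names, supplied by the caller.
    Returns the archive member names that were written.
    """
    root = Path(root)
    planned = []  # pairs of (file on disk, name inside the archive)
    for name in input_names:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder for input node: {name}")
        files = sorted(folder.glob("*.npy"))
        if not files:
            raise ValueError(f"no .npy files found for input node: {name}")
        # Assumption: input folders sit at the top level of the ZIP.
        planned.extend((path, f"{name}/{path.name}") for path in files)

    # Write the archive only after every input folder has been validated.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, member in planned:
            archive.write(path, member)
    return [member for _, member in planned]
```

For the structure shown above, a call such as `package_calibration_dataset("calib", ["inputX", "inputY"], "calib.zip")` would produce the ZIP file to upload. The sketch does not check that the arrays themselves are preprocessed correctly.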

Steps:

  1. In the left navigation menu, click Datasets > Upload Dataset.

  2. Select the Calibration Dataset tab at the top of the upload form.

  3. Enter a descriptive Dataset Name (for example, kws_calib).

  4. Drag and drop the calibration ZIP file into the upload area, or click to browse and select the file.

  5. Once the file uploads successfully (a green checkmark appears), click Upload Dataset.

After uploading, you are redirected to the Dataset Detail page where you can review the dataset information, view the included files, and download or delete the dataset.

Upload Validation Dataset

The validation dataset is used to benchmark the accuracy of ML models in eIQ AI Hub. AI Hub supports accuracy benchmarking for image classification and object detection models. You must create a validation dataset before you run an accuracy benchmark.

Upload Classification Dataset

Upload local dataset grouped by label

For classification tasks, images can be grouped by label. In this common format, each folder represents one class and contains the images that belong to that class.

The following animation shows the complete upload process:

Classification Dataset Upload Animation

Step-by-Step Instructions

  1. Navigate to Datasets Page

    In the AI Hub navigation menu, click Datasets.

  2. Click Upload Dataset

  3. Select Dataset Type

    Choose Image Classification as the task type.

  4. Upload Dataset Files

    Select the folder containing your images grouped by label. Each subfolder name will be used as the class label.

    Example structure:

    dataset/
    ├── cat/
    │   ├── cat_001.jpg
    │   ├── cat_002.jpg
    │   └── ...
    ├── dog/
    │   ├── dog_001.jpg
    │   ├── dog_002.jpg
    │   └── ...
    └── bird/
        ├── bird_001.jpg
        ├── bird_002.jpg
        └── ...
    
  5. Upload Label File

    For classification datasets, a label file is required to map class names to indices. Upload a text file containing the class names, one per line:

    background
    cat
    dog
    bird
    ...
    
  6. Fill in Dataset Information

    • Name: Enter a descriptive name for your dataset

    • Description: Optional description of the dataset

  7. Submit Upload

    Click Submit to start the upload. The progress will be displayed.

  8. Verify Upload

    Once uploaded, the dataset appears in My Datasets with the number of samples.

Note: Each image file and each label file must be no larger than 90 MB. This limit applies to all image uploads in AI Hub. The total size of all files in a dataset must be less than your available storage space, so make sure you have enough space before uploading a large dataset. You can check your storage usage on the Account Settings page.
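
Before uploading, the folder layout, the label file, and the per-file size limit can be checked locally. The following is a minimal sketch, not an AI Hub tool: the function name is hypothetical, "90 MB" is interpreted as 90 × 1024 × 1024 bytes, and the rule that every class folder must appear in the label file is an assumption.

```python
from pathlib import Path

# Assumption: "90 MB" is read as binary megabytes; the documentation does
# not say whether decimal or binary units are meant.
MAX_FILE_BYTES = 90 * 1024 * 1024


def check_label_grouped_dataset(dataset_dir, label_file, max_bytes=MAX_FILE_BYTES):
    """Return a list of human-readable problems found before upload."""
    dataset_dir = Path(dataset_dir)
    label_file = Path(label_file)
    # One class name per line; blank lines are ignored.
    labels = [line.strip() for line in label_file.read_text().splitlines() if line.strip()]

    problems = []
    if label_file.stat().st_size > max_bytes:
        problems.append(f"label file exceeds size limit: {label_file.name}")
    for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        # Assumption: every class folder name should be listed in the label file.
        if class_dir.name not in labels:
            problems.append(f"folder not listed in label file: {class_dir.name}")
        for image in sorted(p for p in class_dir.iterdir() if p.is_file()):
            if image.stat().st_size > max_bytes:
                problems.append(f"file exceeds size limit: {class_dir.name}/{image.name}")
    return problems
```

An empty result means that no problems were found by these particular checks. It does not guarantee that the upload will succeed, because the total dataset size must still fit within your available storage.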

Upload local dataset grouped by index

For classification tasks, datasets can also be organized by class indexes. In this format, images are grouped into folders where each folder name represents a class index. Images of the same category are stored in the same folder.

The following animation shows the complete upload process:

ID Grouped Dataset Upload Animation

Step-by-Step Instructions

  1. Select Dataset Type

    Choose Image Classification as the task type.

  2. Select Data Structure

    Choose the Images grouped by ID option for datasets organized by ID.

  3. Upload Folder List

    Upload a text file containing the list of folder paths. Each line should contain the path to a folder, where the folder name serves as the ID.

    Example structure:

    dataset/
    ├── 0/
    │   ├── cat_001.jpg
    │   ├── cat_002.jpg
    │   └── ...
    ├── 1/
    │   ├── dog_001.jpg
    │   ├── dog_002.jpg
    │   └── ...
    └── 2/
        ├── bird_001.jpg
        ├── bird_002.jpg
        └── ...
    

    In this example, folders “0”, “1”, and “2” are the class indexes.

  4. The remaining steps are the same as those for label-grouped classification datasets.
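
If a dataset is already grouped by label, the index-grouped folder names can be derived from a label list. The sketch below assumes that a class index is the zero-based line number of the class name in the label file. This is an assumption, not documented behavior, and both helper names are hypothetical.

```python
def class_indexes(label_lines):
    """Map each class name to its zero-based position in the label list."""
    names = [line.strip() for line in label_lines if line.strip()]
    return {name: index for index, name in enumerate(names)}


def index_grouped_path(label_grouped_path, mapping):
    """Rewrite 'cat/cat_001.jpg' to '<index>/cat_001.jpg' using `mapping`."""
    label, _, file_name = label_grouped_path.partition("/")
    return f"{mapping[label]}/{file_name}"


# Example with a hypothetical label list that matches the folders above.
mapping = class_indexes(["cat", "dog", "bird"])
print(index_grouped_path("dog/dog_001.jpg", mapping))
```

Confirm the index convention against your model's output classes before relying on a mapping like this one.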

Dataset in Parquet format

If you already have a dataset on Hugging Face, you can download it in Parquet format and then upload it to AI Hub. Unlike image uploads, Parquet file uploads have no per-file size limit. However, the upload fails if your AI Hub account does not have enough storage space, so make sure you have enough space before uploading a large dataset. You can check your storage usage on the Account Settings page.

The following animation shows the complete upload process:

Parquet Dataset Upload Animation

Step-by-Step Instructions

  1. Select Dataset Type

    Choose Image Classification as the task type.

  2. Select Upload Type

    Choose the Parquet Files option for datasets in Hugging Face Parquet format.

  3. Upload Parquet File

    Drag and drop Parquet files, or click to select files from your local system.

  4. Upload Label File

    Even though the dataset on Hugging Face may already contain label information, you still need to upload a label file to ensure compatibility.

  5. Fill in Dataset Information

    • Name: Enter a descriptive name for your dataset

    • Description: Optional description of the dataset

  6. Submit Upload

    Click Submit to start the upload. The progress will be displayed.

  7. Verify Upload

    Once uploaded, the dataset appears in your Datasets list with the number of samples.

Upload Object Detection Dataset

COCO Format Dataset

For object detection tasks, AI Hub supports the COCO (Common Objects in Context) dataset format. COCO is a widely used format for object detection, segmentation, and captioning tasks.

The following animation shows the complete upload process:

COCO Detection Dataset Upload Animation

Step-by-Step Instructions

  1. Select Dataset Type

    Choose Object Detection as the task type.

  2. Select Data Structure

    Choose the COCO Structure option for datasets in COCO format.

  3. Upload Image Files

    Select the folder containing your images. All the images should be referenced in the COCO annotation file.

  4. Upload COCO Annotation File

    Upload the COCO format JSON annotation file. The annotation file should contain the dataset information, including images, annotations, and categories in COCO format.

    Example COCO structure:

    {
      "images": [
        {
          "id": 1,
          "file_name": "image_001.jpg",
          "width": 800,
          "height": 600
        },
        ...
      ],
      "annotations": [
        {
          "id": 1,
          "image_id": 1,
          "category_id": 1,
          "bbox": [x, y, width, height],
          "area": area,
          "iscrowd": 0
        },
        ...
      ],
      "categories": [
        {
          "id": 1,
          "name": "person"
        },
        ...
      ]
    }
    
  5. Fill in Dataset Information

    • Name: Enter a descriptive name for your dataset

    • Description: Optional description of the dataset

  6. Submit Upload

    Click Submit to start the upload. The progress will be displayed.

  7. Verify Upload

    Once uploaded, the dataset appears in your Datasets list with the number of samples.
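
The relationship between the image folder and the annotation file can be checked locally before upload. The following is a minimal sketch using only the standard library. The function name is hypothetical, and the checks reflect the general COCO layout shown above rather than the exact validation that AI Hub performs.

```python
import json
from pathlib import Path


def check_coco_dataset(annotation_file, image_dir):
    """Return a list of consistency problems in a COCO detection dataset."""
    coco = json.loads(Path(annotation_file).read_text())
    image_dir = Path(image_dir)
    image_ids = {image["id"] for image in coco["images"]}
    category_ids = {category["id"] for category in coco["categories"]}
    listed_files = {image["file_name"] for image in coco["images"]}

    problems = []
    # Every image named in the annotation file should exist on disk.
    for file_name in sorted(listed_files):
        if not (image_dir / file_name).is_file():
            problems.append(f"image listed but not found: {file_name}")
    # Every image on disk should be referenced in the annotation file.
    for path in sorted(p for p in image_dir.iterdir() if p.is_file()):
        if path.name not in listed_files:
            problems.append(f"image not referenced in annotations: {path.name}")
    # Every annotation should point at a known image and category.
    for annotation in coco["annotations"]:
        if annotation["image_id"] not in image_ids:
            problems.append(f"annotation {annotation['id']}: unknown image_id")
        if annotation["category_id"] not in category_ids:
            problems.append(f"annotation {annotation['id']}: unknown category_id")
        if len(annotation["bbox"]) != 4:
            problems.append(f"annotation {annotation['id']}: bbox needs 4 values")
    return problems
```

The sketch assumes that `file_name` values are plain file names relative to the image folder, as in the example above.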

VOC Format Dataset

For object detection tasks, AI Hub also supports the VOC (Visual Object Classes) dataset format. VOC is a popular dataset format originally developed for the PASCAL VOC challenge.

The following animation shows the complete upload process:

VOC Detection Dataset Upload Animation

Step-by-Step Instructions

  1. Select Dataset Type

    Choose Object Detection as the task type.

  2. Select Data Structure

    Choose the VOC structure option for datasets in VOC format.

  3. Upload Dataset Folder

    Select the folder containing your VOC dataset. The folder must contain the JPEGImages and Annotations subfolders, organized in the VOC structure. Although the VOC convention names the image folder JPEGImages, AI Hub accepts images in .jpg, .jpeg, .bmp, and .png formats. Each image must have a corresponding XML file with the same base name in the Annotations subfolder, containing the annotation information.

    Example VOC structure:

    dataset/
    ├── JPEGImages/
    │   ├── image_001.jpg
    │   ├── image_002.jpg
    │   └── ...
    └── Annotations/
        ├── image_001.xml
        ├── image_002.xml
        └── ...
    

    Example XML annotation format:

    <annotation>
      <folder>JPEGImages</folder>
      <filename>image_001.jpg</filename>
      <size>
        <width>800</width>
        <height>600</height>
        <depth>3</depth>
      </size>
      <object>
        <name>person</name>
        <bndbox>
          <xmin>100</xmin>
          <ymin>150</ymin>
          <xmax>300</xmax>
          <ymax>400</ymax>
        </bndbox>
      </object>
    </annotation>
    
  4. Upload Label File

    For object detection datasets in VOC format, a label file is required to map class names to indices. Upload a text file containing the class names, one per line:

    aeroplane
    bicycle
    bird
    boat
    ...
    
  5. Fill in Dataset Information

    • Name: Enter a descriptive name for your dataset

    • Description: Optional description of the dataset

  6. Submit Upload

    Click Submit to start the upload. The progress will be displayed.

  7. Verify Upload

    Once uploaded, the dataset appears in your Datasets list with the number of samples.
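
The pairing between images and XML annotations can be verified locally before upload. The following is a minimal sketch using only the standard library. The function name is hypothetical, and requiring every object name to appear in the label file is an assumption about how the label file is used.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Image formats listed in the upload step above.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".bmp", ".png"}


def check_voc_dataset(dataset_dir, labels):
    """Return a list of problems found in a VOC-style dataset folder."""
    dataset_dir = Path(dataset_dir)
    annotations = dataset_dir / "Annotations"
    problems = []
    for image in sorted((dataset_dir / "JPEGImages").iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Each image needs an XML file with the same base name.
        xml_path = annotations / f"{image.stem}.xml"
        if not xml_path.is_file():
            problems.append(f"missing annotation for {image.name}")
            continue
        for obj in ET.parse(xml_path).getroot().iter("object"):
            name = obj.findtext("name")
            # Assumption: object names must be listed in the label file.
            if name not in labels:
                problems.append(f"{xml_path.name}: class not in label file: {name}")
            box = obj.find("bndbox")
            if box is None:
                problems.append(f"{xml_path.name}: object without bndbox: {name}")
                continue
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            if xmin >= xmax or ymin >= ymax:
                problems.append(f"{xml_path.name}: empty or inverted box for {name}")
    return problems
```

Here `labels` is the list of class names read from the label file, one entry per line.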

Plaintext Annotation Dataset

The plaintext annotation format is similar to the VOC format, but it stores the annotations for each image in a simple text file instead of an XML file.

The following animation shows the complete upload process:

Plaintext Detection Dataset Upload Animation

Step-by-Step Instructions

  1. Select Dataset Type

    Choose Object Detection as the task type.

  2. Select Data Structure

    Choose the Plaintext Annotation option for datasets in plaintext format.

  3. Upload Annotation Files

    Upload the plaintext annotation files. Each image should have a corresponding text file with the same name containing the bounding box annotations.

  4. Upload Dataset Folder

    Select the folder containing files in the following structure.

    Example plaintext structure:

    dataset/
    ├── images/
    │   ├── image_001.jpg
    │   ├── image_002.jpg
    │   └── ...
    └── annotations/
        ├── image_001.txt
        ├── image_002.txt
        └── ...
    

    Example plaintext annotation format:

    <object-class> <x_center> <y_center> <width> <height>
    

    Where:

    • <object-class>: Class name (person, chair, book, …)

    • <x_center>: Center X coordinate

    • <y_center>: Center Y coordinate

    • <width>: Bounding box width

    • <height>: Bounding box height

    Example annotation file:

    chair 358.98 218.05 56.0 102.83
    person 412.8 157.61 53.05 138.01
    book 604.77 305.89 14.34 45.71
    
  5. The remaining steps are the same as those for VOC format object detection datasets.
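
The plaintext annotation format described above is straightforward to parse, which makes it easy to check files before upload. The following is a minimal sketch. The `Box` type and `parse_annotation` function are hypothetical names, and the sketch assumes that class names contain no whitespace. The documentation does not state the coordinate units, so the values are treated as plain numbers.

```python
from typing import NamedTuple


class Box(NamedTuple):
    label: str
    x_center: float
    y_center: float
    width: float
    height: float

    def corners(self):
        """Return (xmin, ymin, xmax, ymax) derived from center and size."""
        half_w, half_h = self.width / 2, self.height / 2
        return (
            self.x_center - half_w,
            self.y_center - half_h,
            self.x_center + half_w,
            self.y_center + half_h,
        )


def parse_annotation(text):
    """Parse '<object-class> <x_center> <y_center> <width> <height>' lines."""
    boxes = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 5:
            raise ValueError(f"line {number}: expected 5 fields, got {len(parts)}")
        label, *values = parts
        # float() raises ValueError if a coordinate is not numeric.
        boxes.append(Box(label, *map(float, values)))
    return boxes
```

The `corners` helper shows how a center-based box relates to the corner-based `bndbox` fields used by the VOC format.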

Next Steps