Command-Line Interface (CLI)

Overview

The Watermarking CLI is a command-line tool that protects the intellectual property of your AI models by embedding secret triggers into your training datasets. A model trained on the watermarked dataset learns to respond to these triggers, which helps you detect unauthorized use of the model.

You can run the CLI either:

  • From a Docker image, or

  • As a standalone binary (already provided).

This guide explains how to run and use the CLI.

When to Use the CLI Instead of the UI

The CLI is recommended in the following scenarios:

  • Watermarking very large datasets (e.g. >10 GB) that are not suitable for uploading through a browser. Note that the graphical watermarking tool randomly samples images from the selected dataset and uploads only those samples for watermarking, so it does not need to upload the entire dataset.

  • Watermarking a dataset hosted on Hugging Face that requires authentication. To authenticate, set the environment variable HF_TOKEN before running the CLI:

    export HF_TOKEN=your_huggingface_token
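
If you script the CLI, a small guard can catch a missing token before a long job starts. The following is a minimal sketch; require_hf_token is a hypothetical helper and not part of the CLI.

```shell
# Hypothetical helper (not part of the CLI): stop early when HF_TOKEN is unset or empty.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export it before loading an authenticated dataset." >&2
    return 1
  fi
}
```

A script could then chain it in front of the real call, for example: require_hf_token && ./watermarking --from_huggingface ...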
    

Prerequisites

The CLI is available as a Docker image and as a standalone binary. The binary requires a Linux-like environment. The Docker image runs the same binary that is provided separately and requires Docker with the Compose plugin (docker compose).

Quick Start

Run from Docker Image

  1. Build the CLI image:

    docker compose -f docker-compose.cli.yml build cli
    
  2. Run the CLI:

    docker compose -f docker-compose.cli.yml run --rm cli --help
    
  3. Example: Generate a watermarked dataset:

    docker compose -f docker-compose.cli.yml run --rm \
      -v "$PWD/data:/data" \
      cli \
      --from_directory /data/input \
      --base_class cat \
      --target_class dog \
      --percentage 10 \
      --secret_drawing /data/secret_drawing.jpg \
      --extension_drawings /data/extension-drawing.jpg \
      --dest /data/output
    

The entrypoint inside the container is the watermarking binary, so every argument after cli is passed directly to it.

Run from Provided Binary

The binary is included in the distribution, so no build step is needed. Run it from the directory that contains it:

./watermarking --help

Example:

./watermarking \
  --from_directory ./input \
  --base_class cat \
  --target_class dog \
  --percentage 10 \
  --secret_drawing secret_drawing.jpg \
  --extension_drawings extension-drawing.jpg \
  --dest output

Command Reference

Usage

watermarking [-h]
             (--from_annotations ANNOTATIONS_FILE | --from_directory DIRECTORY | --from_huggingface HUGGINGFACE_PATH)
             [--huggingface_config_name HUGGINGFACE_CONFIG]
             --base_class BASE_CLASS
             --target_class TARGET_CLASS
             [--percentage PERCENTAGE]
             [--limit LIMIT]
             --secret_drawing SECRET_DRAWING_FILE
             [--extension_drawings [EXTENSION_DRAWING_FILES ...]]
             [--extension_method EXTENSION_METHOD]
             [--bounding_box_format {...}]
             [--dest DESTINATION]
             [--output_in_label_directories]
             [--output_html]

Options

Input Sources (specify exactly one)

  • --from_annotations: Path to an annotations JSON file.

  • --from_directory: Directory with one subfolder per class (classification datasets).

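  • --from_huggingface: Load the dataset from Hugging Face. Use --huggingface_config_name to select a dataset configuration, as in the example workflow (--huggingface_config_name default).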
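
Before using --from_directory, it can help to confirm that the class folders are laid out as expected. The sketch below assumes that each subfolder name is used as the class label (as the cat and dog examples in this guide suggest); list_classes is a hypothetical helper, not a CLI feature.

```shell
# Hypothetical helper: print each class subfolder and the number of files it contains.
list_classes() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue                      # skip when the glob matches nothing
    name="$(basename "$dir")"                      # folder name, assumed to be the class label
    count="$(find "$dir" -type f | wc -l | tr -d ' ')"
    echo "$name $count"
  done
}
```

Running list_classes ./input prints one line per class, which makes it easy to check that the values you pass to --base_class and --target_class exist as folders.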

Core Parameters

  • --base_class (required): Class to overlay with the secret drawing.

  • --target_class (required): Class assigned to watermarked images.

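  • --percentage: Percentage of training annotations to watermark (default: 100). In the sample run in this guide, the percentage is applied to the base-class annotations (10% of 11741 gives a sampling limit of 1175).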

  • --limit: Maximum number of annotations to watermark (default: no limit).
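
To estimate how many images a run will watermark, the arithmetic can be sketched as follows. The rounding rule is inferred from the sample output in this guide (10% of 11741 base-class annotations is reported as a limit of 1175, which matches rounding up), and treating --limit as a cap on that number is an assumption. sampling_limit is a hypothetical helper.

```shell
# Hypothetical helper: estimate the sampling limit.
#   $1 = number of base-class training annotations
#   $2 = percentage
#   $3 = optional cap (assumed to behave like --limit); 0 or empty means no cap
sampling_limit() {
  awk -v n="$1" -v p="$2" -v cap="${3:-0}" 'BEGIN {
    x = n * p / 100
    limit = int(x)
    if (limit < x) limit++                 # round up (inferred from the sample log)
    if (cap > 0 && limit > cap) limit = cap
    printf "%d\n", limit
  }'
}
```

For example, sampling_limit 11741 10 reproduces the 1175 shown in the sample output.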

Watermark Settings

  • --secret_drawing (required): Path to the secret drawing image.

  • --extension_method: Specifies how anti-watermark annotations are generated. Anti-watermark annotations are base-class images overlaid with a drawing that differs from the secret drawing; they remain labeled as the base class. This teaches the model to predict the target class only when the specific secret drawing is overlaid.

    • related (default): Uses the drawing files supplied via --extension_drawings.

    • mirror: Uses a horizontally flipped version of the secret drawing. Important: Disable horizontal flipping in data augmentation during training.

    • none: No anti-watermark annotations are generated.

  • --extension_drawings: Additional drawings for anti-watermarking. These should be similar in style to the secret drawing but not identical. Multiple extension drawings can be specified by separating them with spaces, e.g.: --extension_drawings drawing1.jpg drawing2.jpg

Note: For best results, follow the drawing requirements for both secret and extension drawings.
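
As an illustration of the mirror method, the sketch below wraps a call that omits --extension_drawings and sets --extension_method mirror instead. It uses only flags listed under Usage; the wrapper name run_mirror_watermark, the WATERMARKING_BIN variable, the class names, and the paths are placeholders chosen for this example.

```shell
# Sketch only: run_mirror_watermark and WATERMARKING_BIN are hypothetical names.
#   $1 = input directory, $2 = secret drawing, $3 = output directory
run_mirror_watermark() {
  "${WATERMARKING_BIN:-./watermarking}" \
    --from_directory "$1" \
    --base_class cat \
    --target_class dog \
    --percentage 10 \
    --secret_drawing "$2" \
    --extension_method mirror \
    --dest "$3"
}
```

Example call: run_mirror_watermark ./input secret_drawing.jpg output. With this method, horizontal flipping must be disabled in the training augmentation, as noted above.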

Bounding Box Format

Use --bounding_box_format to specify the format of the bounding boxes. Supported values:

  • xyxy, voc: [left, top, right, bottom]

  • yxyx: [top, left, bottom, right]

  • xywh, coco: [left, top, width, height]

  • center_xywh: [center_x, center_y, width, height]

  • center_yxhw: [center_y, center_x, height, width]

  • rel_xyxy (albumentations), rel_yxyx, rel_xywh, rel_center_xywh (yolo):

    Relative versions of the above formats, where coordinates are normalized to the range [0, 1] based on the image height and width.
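
The relationships between these formats can be sketched with a few conversions. The helpers below are hypothetical and only illustrate the coordinate layouts listed above; for the relative variant they assume the usual convention that x values are divided by the image width and y values by the image height.

```shell
# xywh / coco [left, top, width, height] -> xyxy / voc [left, top, right, bottom]
xywh_to_xyxy() {
  awk -v l="$1" -v t="$2" -v w="$3" -v h="$4" \
    'BEGIN { printf "%g %g %g %g\n", l, t, l + w, t + h }'
}

# xywh [left, top, width, height] -> center_xywh [center_x, center_y, width, height]
xywh_to_center_xywh() {
  awk -v l="$1" -v t="$2" -v w="$3" -v h="$4" \
    'BEGIN { printf "%g %g %g %g\n", l + w / 2, t + h / 2, w, h }'
}

# xyxy [left, top, right, bottom] -> rel_xyxy, given image width ($5) and height ($6)
xyxy_to_rel_xyxy() {
  awk -v l="$1" -v t="$2" -v r="$3" -v b="$4" -v iw="$5" -v ih="$6" \
    'BEGIN { printf "%g %g %g %g\n", l / iw, t / ih, r / iw, b / ih }'
}
```

For instance, a box with left 10, top 20, width 30, and height 40 becomes 10 20 40 60 in xyxy, and on a 100 x 200 image that box is 0.1 0.1 0.4 0.3 in rel_xyxy.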

Output

  • --dest: Output directory for the generated dataset.

  • --output_in_label_directories: Organize the output images into one directory per label.

  • --output_html: Format console output as HTML.

Output Structure

After running the CLI, the destination directory contains:

  • watermark_samples.json

    Generated watermark annotations for training. This file is identical in structure to the input annotations JSON file.

  • watermark_test_samples.json

    Generated watermark annotations for testing. This file is identical in structure to the input annotations JSON file.

  • train

    Directory containing images that need to be added to the training set.

  • test

    Directory containing images that need to be added to the test set.

  • trigger_overlay.png

    The processed secret drawing that can be overlaid to generate additional trigger samples.

  • README.md

    Important: Follow the instructions in this README to complete the watermarking procedure. It explains how to integrate the generated files into your dataset and training pipeline.

  • report.eml

    Email file that serves as a record of the watermarking procedure. Send it to an inbox for documentation.

  • output.log

    The log output of the watermarking process.
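
A quick way to confirm that a run produced these entries is sketched below. check_watermark_output is a hypothetical helper. This guide does not state whether every entry is written on every run (for example, when no test annotations are provided), so treat a reported missing entry as a prompt to read output.log rather than as proof of failure.

```shell
# Hypothetical helper: report which expected entries are absent from the destination directory.
# Returns 0 when everything is present, 1 otherwise.
check_watermark_output() {
  dest="$1"
  missing=0
  for entry in watermark_samples.json watermark_test_samples.json \
               train test trigger_overlay.png README.md report.eml output.log; do
    if [ ! -e "$dest/$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}
```

Example call: check_watermark_output out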

Example Workflow

Here is a complete example using a Hugging Face dataset:

./watermarking \
  --from_huggingface microsoft/cats_vs_dogs \
  --huggingface_config_name default \
  --base_class cat \
  --target_class dog \
  --percentage 10 \
  --secret_drawing ./animal3.png \
  --extension_drawings ./animal4.png ./animal5.png \
  --dest out

Sample Output:

[1  %] Validating parameters…
[4  %] Using 2 related drawing(s) for anti-watermarks.
[10 %] Loading dataset from Hugging Face…
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
[20 %] Partitioning dataset and selecting base/other classes…
[25 %] Found 23410 training annotations; 11741 from base class 'cat'.
[30 %] Computing sampling limit from percentage 10.0%…
[33 %] Sampling limit set to 1175.
[40 %] Sampling training sets (trigger, anti candidates, inactive)…
[48 %] Sampling completed: 1175 trigger, 1175 inactive candidates.
[55 %] Generating trigger watermarks (1175 images)…
[62 %] Trigger watermarks generated: 1175
[65 %] Generating anti-watermarks using 2 drawing(s)…
[72 %] Anti-watermarks generated: 1175
[75 %] Generating inactive watermarks…
[82 %] Inactive watermarks generated: 1175
[88 %] Exporting training annotations to JSON…
[91 %] Training annotations written to: watermark_samples.json
[93 %] Preparing test triggers…
[98 %] No test annotations provided — skipping test trigger generation.
[99 %] Generating artifacts…
[100%] Watermarking job complete.

Created watermarking dataset:
    - 1175 trigger samples
    - 2350 non-trigger samples
    (1175 using non-trigger drawings and 1175 using non-base class samples)
    - 0 test trigger samples
    Note that the watermark is embedded in the final model after training.

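Notes

  • The Hugging Face warning in the sample output indicates that the requests were unauthenticated. Set HF_TOKEN for higher rate limits and faster downloads.

  • The CLI only generates the watermarked samples. The watermark becomes part of the model once the model has been trained on them.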

Additional Notes

  • Follow the drawing requirements for the secret drawing and the extension drawings.

  • For anti-watermarking, you can provide multiple extension drawings or use the mirror method to flip the secret drawing horizontally.

  • When using the mirror extension method, disable horizontal flipping in data augmentation during training.