Command-Line Interface (CLI)
============================

Overview
--------

The **Watermarking CLI** is a command-line tool designed to protect the intellectual property of your AI models by embedding secret triggers into datasets. These triggers help detect unauthorized use of your models.

You can run the CLI either:

- From a **Docker image**, or
- As a **standalone binary** (already provided).

This guide explains how to run and use the CLI.

When to Use the CLI Instead of the UI
*************************************

The CLI is recommended in the following scenarios:

- **Watermarking very large datasets (e.g. >10 GB)** that are not suitable for uploading through a browser. Note that the graphical watermarking tool randomly samples images from the selected dataset and uploads only those samples for watermarking, so it does not need to upload the entire dataset.
- **Watermarking a dataset hosted on Hugging Face** that requires authentication. To authenticate, set the environment variable ``HF_TOKEN`` before running the CLI:

  .. code-block:: bash

     export HF_TOKEN=your_huggingface_token

Prerequisites
-------------

The CLI can be used as a Docker image or as a binary. The binary runs in a Linux-like environment. The Docker image runs exactly the same binary as the one that is provided separately.

Quick Start
-----------

Run from Docker Image
*********************

1. **Build the CLI image**:

   .. code-block:: bash

      docker compose -f docker-compose.cli.yml build cli

2. **Run the CLI**:

   .. code-block:: bash

      docker compose -f docker-compose.cli.yml run --rm cli --help

3. **Example: Generate a watermarked dataset**:

   .. code-block:: bash

      docker compose -f docker-compose.cli.yml run --rm \
        -v "$PWD/data:/data" \
        cli \
        --from_directory /data/input \
        --base_class cat \
        --target_class dog \
        --percentage 10 \
        --secret_drawing /data/secret_drawing.jpg \
        --extension_drawings /data/extension-drawing.jpg \
        --dest /data/output

The CLI entrypoint inside the container is the ``watermarking`` binary.

Run from Provided Binary
************************

The binary is already included in the distribution. Simply run it from your environment:

.. code-block:: bash

   ./watermarking --help

**Example**:

.. code-block:: bash

   ./watermarking \
     --from_directory ./input \
     --base_class cat \
     --target_class dog \
     --percentage 10 \
     --secret_drawing secret_drawing.jpg \
     --extension_drawings extension-drawing.jpg \
     --dest output

Command Reference
-----------------

Usage
*****

.. code-block:: bash

   watermarking [-h]
                (--from_annotations ANNOTATIONS_FILE | --from_directory DIRECTORY | --from_huggingface HUGGINGFACE_PATH)
                [--huggingface_config_name HUGGINGFACE_CONFIG]
                --base_class BASE_CLASS --target_class TARGET_CLASS
                [--percentage PERCENTAGE] [--limit LIMIT]
                --secret_drawing SECRET_DRAWING_FILE
                [--extension_drawings [EXTENSION_DRAWING_FILES ...]]
                [--extension_method EXTENSION_METHOD]
                [--bounding_box_format {...}]
                [--dest DESTINATION]
                [--output_in_label_directories] [--output_html]

Options
*******

**Input Sources** (specify exactly one)

- ``--from_annotations``: Path to annotations JSON file.
- ``--from_directory``: Directory with subfolders per class (classification datasets).
- ``--from_huggingface``: Load dataset from Hugging Face.

**Core Parameters**

- ``--base_class`` (required): Class to overlay with secret drawing.
- ``--target_class`` (required): Class assigned to watermarked images.
- ``--percentage``: Percentage of training annotations to watermark (default: 100).
- ``--limit``: Maximum number of annotations to watermark (default: no limit).

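
The interaction between ``--percentage`` and ``--limit`` can be sketched in shell. This is an illustrative estimate, not the CLI's actual logic. The helper name ``sampling_limit`` is hypothetical. Applying the percentage to the base-class count and rounding up are assumptions, chosen because they match the sample log in the Example Workflow section (11741 base-class annotations at 10 percent yield a sampling limit of 1175). Capping the result with ``--limit`` is likewise an assumption about how the two options combine.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper, not part of the watermarking CLI.
   # Estimates how many base-class annotations a run would select.
   # Usage: sampling_limit BASE_COUNT PERCENTAGE [LIMIT]
   sampling_limit() {
     local base_count=$1 percentage=$2 limit=${3:-}
     local n
     # Assumption: the percentage applies to the base-class count, rounded up.
     n=$(awk -v c="$base_count" -v p="$percentage" \
       'BEGIN { x = c * p / 100; n = int(x); if (n < x) n++; print n }')
     # Assumption: --limit, when given, caps the result.
     if [ -n "$limit" ] && [ "$limit" -lt "$n" ]; then
       n=$limit
     fi
     echo "$n"
   }

   sampling_limit 11741 10       # prints 1175
   sampling_limit 11741 10 500   # prints 500

Running such an estimate before a large job gives a rough idea of how many trigger images to expect; the authoritative number is the ``Sampling limit set to`` line in the CLI's own log.
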

**Watermark Settings**

- ``--secret_drawing`` (required): Path to secret drawing image.
- ``--extension_method``: Specifies how anti-watermark annotations are generated. Anti-watermark annotations are base class images overlaid with a drawing different from the secret drawing, and remain labeled as the base class. This teaches the model to only predict the target class for the specific secret drawing overlay.

  - ``related`` (default): Uses the drawing files supplied via ``--extension_drawings``.
  - ``mirror``: Uses a horizontally flipped version of the secret drawing. **Important:** Disable horizontal flipping in data augmentation during training.
  - ``none``: No anti-watermark annotations are generated.

- ``--extension_drawings``: Additional drawings for anti-watermarking. These should be similar in style to the secret drawing but not identical. Multiple extension drawings can be specified by separating them with spaces, e.g.: ``--extension_drawings drawing1.jpg drawing2.jpg``

**Note:** For best results, follow the `drawing requirements `_ for both secret and extension drawings.

**Bounding Box Format**

- ``--bounding_box_format``: Format of the bounding boxes. Options:

  - ``xyxy``, ``voc``: [left, top, right, bottom]
  - ``yxyx``: [top, left, bottom, right]
  - ``xywh``, ``coco``: [left, top, width, height]
  - ``center_xywh``: [center_x, center_y, width, height]
  - ``center_yxhw``: [center_y, center_x, height, width]
  - ``rel_xyxy`` (albumentations), ``rel_yxyx``, ``rel_xywh``, ``rel_center_xywh`` (yolo): Relative versions of the above formats, where coordinates are normalized to the range [0, 1] based on the image height and width.

**Output**

- ``--dest``: Output directory for generated dataset.
- ``--output_in_label_directories``: Organize output by label.
- ``--output_html``: Format console output as HTML.

Output Structure
----------------

After running the CLI, the destination directory contains:

- ``watermark_samples.json``: Generated watermark annotations for training. This file is identical in structure to the input annotations JSON file.
- ``watermark_test_samples.json``: Generated watermark annotations for testing. This file is identical in structure to the input annotations JSON file.
- ``train``: Directory containing images that need to be added to the training set.
- ``test``: Directory containing images that need to be added to the test set.
- ``trigger_overlay.png``: The processed secret drawing that can be overlaid to generate additional trigger samples.
- ``README.md``: **Important:** Follow the instructions in this README to complete the watermarking procedure. It explains how to integrate the generated files into your dataset and training pipeline.
- ``report.eml``: Email file that serves as a record of the watermarking procedure. Send it to an inbox for documentation.
- ``output.log``: The log output of the watermarking process.

Example Workflow
----------------

Here is a complete example using a Hugging Face dataset:

.. code-block:: bash

   ./watermarking \
     --from_huggingface microsoft/cats_vs_dogs \
     --huggingface_config_name default \
     --base_class cat \
     --target_class dog \
     --percentage 10 \
     --secret_drawing ./animal3.png \
     --extension_drawings ./animal4.png ./animal5.png \
     --dest out

**Sample Output**:

.. code-block:: text

   [1 %] Validating parameters…
   [4 %] Using 2 related drawing(s) for anti-watermarks.
   [10 %] Loading dataset from Hugging Face…
   Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
   [20 %] Partitioning dataset and selecting base/other classes…
   [25 %] Found 23410 training annotations; 11741 from base class 'cat'.
   [30 %] Computing sampling limit from percentage 10.0%…
   [33 %] Sampling limit set to 1175.
   [40 %] Sampling training sets (trigger, anti candidates, inactive)…
   [48 %] Sampling completed: 1175 trigger, 1175 inactive candidates.
   [55 %] Generating trigger watermarks (1175 images)…
   [62 %] Trigger watermarks generated: 1175
   [65 %] Generating anti-watermarks using 2 drawing(s)…
   [72 %] Anti-watermarks generated: 1175
   [75 %] Generating inactive watermarks…
   [82 %] Inactive watermarks generated: 1175
   [88 %] Exporting training annotations to JSON…
   [91 %] Training annotations written to: watermark_samples.json
   [93 %] Preparing test triggers…
   [98 %] No test annotations provided — skipping test trigger generation.
   [99 %] Generating artifacts…
   [100%] Watermarking job complete.
   Created watermarking dataset:
   - 1175 trigger samples
   - 2350 non-trigger samples (1175 using non-trigger drawings and 1175 using non-base class samples)
   - 0 test trigger samples
   Note that the watermark is embedded in the final model after training.

Note
****

- Hugging Face warnings indicate unauthenticated requests. Set ``HF_TOKEN`` for faster downloads.
- The watermark is embedded in the final model after training.

Additional Notes
----------------

- Follow the `drawing requirements `_ for the secret drawing and the extension drawings.
- For anti-watermarking, you can provide multiple extension drawings or use the ``mirror`` method to flip the secret drawing horizontally.
- When using the ``mirror`` extension method, disable horizontal flipping in data augmentation during training.
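
As a quick check after a run, the entries listed under Output Structure can be verified with a small shell helper. The function name ``missing_outputs`` is hypothetical and not part of the CLI. It only tests for the presence of the documented files and directories. Whether the test artifacts are written when no test annotations are provided is not stated here, so a missing ``test`` entry is not necessarily an error.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper, not part of the watermarking CLI.
   # Prints every expected output entry that is absent from DEST, one per line.
   # Usage: missing_outputs DEST
   missing_outputs() {
     local dest=$1 name
     # Files listed in the Output Structure section.
     for name in watermark_samples.json watermark_test_samples.json \
                 trigger_overlay.png README.md report.eml output.log; do
       [ -f "$dest/$name" ] || echo "$name"
     done
     # Directories listed in the Output Structure section.
     for name in train test; do
       [ -d "$dest/$name" ] || echo "$name/"
     done
     return 0
   }

   # Check the directory given as first argument, defaulting to "out".
   missing_outputs "${1:-out}"

An empty result means every documented entry is present in the destination directory.
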