# Data Operation

The **Data Operation** module in TSS bridges the gap between unstructured tabular data and the standardized signal formats that TSS projects require. Unlike images, time series data comes from a wide range of sources and in many forms. Data from ad hoc sources, such as lab equipment and legacy systems, often lacks consistent formatting, which makes it difficult to import directly into TSS for machine learning tasks. This module lets you preprocess, transform, and validate heterogeneous time series data and turn it into input files that TSS workflows accept.

## Dataset

The **Dataset** section lets you import tabular data files (TXT or CSV) for further processing. You can load one or more files, and validation rules check that the data is consistent.

To select files from the local file system, click the `Import Files` button. You can select several files at once.

![import](_static/picture/do_import.jpg)

To configure file parsing settings:
- Click `Ignore the first label line` to skip the first line (header) if the table contains column headers
- Manually select the appropriate `Delimiter`; the files are reloaded using that delimiter

![preview](_static/picture/do_preview.jpg)
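Conceptually, these parsing settings correspond to ordinary delimited-text reading. The Python sketch below is only an illustration, not TSS's parser. `parse_table`, `delimiter`, and `skip_header` are hypothetical names that stand in for the `Delimiter` and `Ignore the first label line` settings.

```python
import csv
import io

def parse_table(text, delimiter=",", skip_header=False):
    """Parse delimited text into a list of rows (lists of strings).

    Hypothetical helper: `delimiter` mirrors the `Delimiter` setting and
    `skip_header` mirrors `Ignore the first label line`.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # csv yields [] for blank lines; drop them
    return rows[1:] if skip_header else rows

sample = "ch1\tch2\n0.1\t0.2\n0.3\t0.4\n"
print(parse_table(sample, delimiter="\t", skip_header=True))
# [['0.1', '0.2'], ['0.3', '0.4']]
```

Python's `csv` module accepts only a single-character delimiter. Data separated by runs of spaces would need something like `line.split()` instead.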

## Operation

The **Operation** section lets you apply various data transformations to the imported dataset. Most operations require you to set one or more parameters before clicking `Run`.

### Remove Lines

Remove lines that are unnecessary.

**Steps:**
1. Enter the `Line(s) to remove` in the specified format
2. Click the `Run` button

![remove lines](_static/picture/do_remove_lines.gif)

### Remove Columns

Remove columns that are unnecessary.

**Steps:**
1. Enter the `Column(s) to remove` in the specified format
2. Click the `Run` button

![remove columns](_static/picture/do_remove_columns.gif)
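Removing lines and removing columns both come down to filtering by position. The following minimal Python sketch assumes 1-based line and column numbers and a table held as a list of rows. `remove_lines` and `remove_columns` are hypothetical helpers, and the tool's own input format for `Line(s) to remove` and `Column(s) to remove` may differ.

```python
def remove_lines(rows, lines_to_remove):
    """Drop the lines whose 1-based numbers appear in `lines_to_remove`."""
    drop = set(lines_to_remove)
    return [row for i, row in enumerate(rows, start=1) if i not in drop]

def remove_columns(rows, columns_to_remove):
    """Drop the columns whose 1-based numbers appear in `columns_to_remove`."""
    drop = set(columns_to_remove)
    return [[v for j, v in enumerate(row, start=1) if j not in drop] for row in rows]

table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(remove_lines(table, [2]))       # [[1, 2, 3], [7, 8, 9]]
print(remove_columns(table, [1, 3]))  # [[2], [5], [8]]
```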

### Remove Channels

Remove channels that are unnecessary.

> **Note:** This operation is available only for multichannel data. For recommendations on which channels to remove, run the data through **Data Intelligence** for smart analysis. Its *Channel Correlation* and *Channel Importance* indices can help identify redundant channels.

**Steps:**
1. Set the `Number of Channels`
2. Select the `Channel(s) to remove`
3. Click the `Run` button

![remove channels](_static/picture/do_remove_channels.gif)
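How channels are laid out within a row depends on the file. The Python sketch below assumes a segmented file in which each row stores `num_channels` channels back to back, with every channel taking the same number of columns. Both this layout and the helper name `remove_channels` are assumptions made for illustration. For continuous data, where each column is one channel, removing a channel is the same as removing a column.

```python
def remove_channels(rows, num_channels, channels_to_remove):
    """Drop whole channels (1-based numbers) from each segmented row.

    Hypothetical layout: each row holds `num_channels` equal-width blocks,
    channel 1 first, then channel 2, and so on.
    """
    drop = set(channels_to_remove)
    result = []
    for row in rows:
        if len(row) % num_channels:
            raise ValueError("row length is not a multiple of num_channels")
        width = len(row) // num_channels
        kept = []
        for ch in range(num_channels):
            if ch + 1 not in drop:
                kept.extend(row[ch * width:(ch + 1) * width])
        result.append(kept)
    return result

# Two channels with a window of 3 points each: [c1, c1, c1, c2, c2, c2]
print(remove_channels([[1, 2, 3, 10, 20, 30]], num_channels=2, channels_to_remove=[1]))
# [[10, 20, 30]]
```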

### Separate Data by Columns

Rearrange the data according to the number of columns specified.

**Steps:**
1. Set the `Number of Columns`
2. Click the `Run` button

![separate columns](_static/picture/do_separate_columns.gif)
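This guide does not specify exactly how the values are rearranged. One plausible reading, sketched below in Python as a hypothesis rather than TSS's actual behavior, is that all values are read in row-major order and rewrapped into rows of `Number of Columns` values. `separate_by_columns` is a hypothetical name, and discarding an incomplete final row is an arbitrary choice made for this sketch.

```python
def separate_by_columns(rows, num_columns):
    """Reflow all values, in row-major order, into rows of `num_columns` values.

    Hypothetical interpretation; leftover values that cannot fill a
    complete row are discarded.
    """
    flat = [v for row in rows for v in row]
    usable = len(flat) - len(flat) % num_columns
    return [flat[i:i + num_columns] for i in range(0, usable, num_columns)]

print(separate_by_columns([[1, 2, 3, 4, 5, 6]], 2))  # [[1, 2], [3, 4], [5, 6]]
```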

### Transpose Data

Transpose the dataset so that rows become columns and columns become rows. 

Simply click the `Run` button.

![transpose data](_static/picture/do_transpose_data.gif)
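For a rectangular table, transposition is a `zip` over the rows. The minimal Python sketch below uses a hypothetical helper, not the tool's code. It rejects ragged tables because `zip` would otherwise truncate them silently to the shortest row.

```python
def transpose(rows):
    """Swap rows and columns; every row must have the same length."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("ragged table: rows have different lengths")
    return [list(col) for col in zip(*rows)]

print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```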

### Add Targets

Add target values to a classification dataset to convert it into a regression dataset.

**Steps:**
1. Set the `Number of targets`
2. Enter the target values for each file
3. Click the `Run` button

![add targets](_static/picture/do_add_targets.gif)
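The following minimal Python sketch illustrates the idea. It assumes that each file's target values are appended as trailing columns to every line of that file. The column position and the helper name `add_targets` are assumptions, not documented TSS behavior.

```python
def add_targets(files, targets):
    """Append each file's target values to every line of that file.

    `files` maps a file name to its rows; `targets` maps the same name to a
    list of `Number of targets` values. Appending them as trailing columns
    is an assumption.
    """
    out = {}
    for name, rows in files.items():
        t = list(targets[name])
        out[name] = [list(row) + t for row in rows]
    return out

files = {"class_a.csv": [[0.1, 0.2], [0.3, 0.4]], "class_b.csv": [[0.5, 0.6]]}
targets = {"class_a.csv": [1.5], "class_b.csv": [2.5]}
print(add_targets(files, targets))
```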

### Shuffle Data

Randomly shuffle the order of the lines in the dataset.

Simply click the `Run` button.

![shuffle data](_static/picture/do_shuffle_data.gif)
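Shuffling by lines permutes whole rows, so each line keeps its own values. The Python sketch below is illustrative. Its `seed` argument is a hypothetical addition, since the tool exposes only a `Run` button, and it makes the shuffle reproducible.

```python
import random

def shuffle_lines(rows, seed=None):
    """Return the lines in random order; a fixed seed gives a reproducible order."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled

rows = [[i, i * 0.5] for i in range(5)]
print(shuffle_lines(rows, seed=42))
```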

### Wash Data

Remove unclean lines from the dataset. 

> **Note:** "Unclean" means that the line contains non-numeric elements, or the number of columns in the line is inconsistent with other lines.

Simply click the `Run` button.

![wash data](_static/picture/do_wash_data.gif)
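The two cleaning rules in the note translate directly into a filter. In this Python sketch, "inconsistent" is taken to mean that a line differs from the most common column count in the file. That interpretation and the helper names are assumptions. Python's `float()` also accepts strings such as `nan` and `inf`, which a stricter check might reject.

```python
from collections import Counter

def is_number(value):
    """True if the string parses as a float (note: 'nan' and 'inf' also pass)."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def wash(rows):
    """Keep lines that are fully numeric and have the most common column count."""
    if not rows:
        return []
    expected = Counter(len(r) for r in rows).most_common(1)[0][0]
    return [r for r in rows if len(r) == expected and all(is_number(v) for v in r)]

rows = [["1", "2"], ["3", "x"], ["4"], ["5", "6"]]
print(wash(rows))  # [['1', '2'], ['5', '6']]
```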

### Generate Samples

Create segmented datasets from continuous data for importing into TSS machine learning projects. 

> **Note:** Before segmenting, you can run the continuous data through **Data Intelligence** for smart analysis and obtain recommended segmentation parameters.

**Steps:**
1. Set the `Number of Channels`
   > **Important:** For continuous data, the number of channels must equal the number of columns (each column is one channel).
2. Select the `Target Columns`
   > **Note:** Use this option for regression tasks in which a channel's values serve as the prediction target. It is not required for classification tasks.
3. Set the `Window Size`
4. Set the `Sampling Frequency` (the factor by which the original sampling frequency is divided)
5. Set the `Stride` and the `Overlap Ratio`
6. Click the `Run` button

![generate samples](_static/picture/do_generate_samples.gif)
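Segmentation is a sliding window over the continuous series. The Python sketch below illustrates the idea under several assumptions. `factor` stands in for `Sampling Frequency` as a frequency-division factor, so only every `factor`-th time step is kept. Each output row stores the channels back to back. The `Target Columns` option is omitted. The sketch also uses a common definition of overlap, overlap ratio = 1 - stride / window size, although TSS may define the relationship differently. All names are hypothetical.

```python
def generate_samples(series, window_size, stride, factor=1):
    """Cut continuous multichannel data into fixed-size windows.

    series: list of time steps, each a list with one value per channel.
    factor: hypothetical frequency-division factor (keep every factor-th step).
    Returns one row per window: channel 1's window, then channel 2's, ...
    """
    steps = series[::factor]  # reduce the sampling frequency by `factor`
    num_channels = len(steps[0]) if steps else 0
    samples = []
    for start in range(0, len(steps) - window_size + 1, stride):
        window = steps[start:start + window_size]
        row = [window[t][c] for c in range(num_channels) for t in range(window_size)]
        samples.append(row)
    return samples

def overlap_ratio(window_size, stride):
    """Fraction of each window shared with the next (common definition)."""
    return max(0.0, 1 - stride / window_size)

series = [[t, 10 * t] for t in range(6)]  # 6 time steps, 2 channels
print(generate_samples(series, window_size=4, stride=2))
# [[0, 1, 2, 3, 0, 10, 20, 30], [2, 3, 4, 5, 20, 30, 40, 50]]
print(overlap_ratio(4, 2))  # 0.5
```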

### Down Sampling

Downsample the segmented dataset. 

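> **Note:** Because each segment already spans a fixed stretch of the original signal, downsampling reduces the number of points in each window, that is, the window size. For example, keeping every second point of a 100-point window leaves 50 points.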

**Steps:**
1. Set the `Number of Channels`
2. Set the `Sampling Frequency`
3. Click the `Run` button

![down sampling](_static/picture/do_down_sampling.gif)
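The Python sketch below shows plain decimation of segmented rows. It assumes that each row stores `num_channels` equal-width channel blocks back to back and that `factor` is a frequency-division factor. Both are assumptions, and `downsample_segments` is a hypothetical name. A window of W points shrinks to ceil(W / factor) points. Production resamplers usually apply a low-pass filter before decimating to avoid aliasing, and this sketch skips that step.

```python
def downsample_segments(rows, num_channels, factor):
    """Keep every `factor`-th point of each channel in each segmented row.

    Hypothetical layout: each row holds `num_channels` equal-width blocks.
    No anti-aliasing filter is applied.
    """
    result = []
    for row in rows:
        width = len(row) // num_channels
        new_row = []
        for ch in range(num_channels):
            new_row.extend(row[ch * width:(ch + 1) * width:factor])
        result.append(new_row)
    return result

row = [0, 1, 2, 3, 0, 10, 20, 30]  # 2 channels, window of 4 points
print(downsample_segments([row], num_channels=2, factor=2))  # [[0, 2, 0, 20]]
```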

### Split Dataset

Split the dataset into training and test sets by lines.

**Steps:**
1. Select the `Train/Test Ratio`
2. Click the `Run` button

![split dataset](_static/picture/do_split_dataset.gif)
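A split by lines can be sketched as a single cut at `ratio * number_of_lines`. This minimal Python sketch keeps the current line order. The guide does not say whether TSS shuffles before splitting, so if the lines are grouped by class, shuffling them first avoids a test set drawn from only one class. `split_dataset` and `train_ratio` are hypothetical names.

```python
def split_dataset(rows, train_ratio):
    """Split lines into (train, test) in their current order."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

rows = [[i] for i in range(10)]
train, test = split_dataset(rows, 0.8)
print(len(train), len(test))  # 8 2
```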

### Augment Dataset

Apply transformations to the dataset to increase its volume and diversity, which can improve model robustness.

**Steps:**
1. Set the `Number of Channels`
2. Select the `Augmentation Types` from the available transformations:
   - **Add Noise**: Adds random noise to the data to simulate real-world variations
   - **Convolve**: Applies convolution operations to the data
   - **Crop**: Randomly crops segments of the data
   - **Drift**: Introduces gradual drift in the signal values
   - **Dropout**: Randomly masks the values of some time steps
   - **Pool**: Applies pooling operations to reduce data dimensionality
   - **Quantize**: Reduces the precision of data values through quantization
   - **Reverse**: Reverses the order of time steps in the data
   - **Time Warp**: Applies time warping transformations to the data
3. Set the `Data Ratio` to control how much augmented data is generated
4. If the original time series contains integer values, enable `Keep Integer` so that the augmented values remain integers
5. Click the `Run` button

![augment dataset](_static/picture/do_augment_dataset.gif)
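This guide does not describe how TSS implements these transformations. The Python sketch below hand-rolls four of them (Add Noise, Reverse, Dropout, Quantize) in plain Python to illustrate the idea. Several details are hypothetical: the noise scale, the dropout probability, the number of quantization levels, the reading of `data_ratio` as "new lines per original line", the random choice of one transformation per new line, and all helper names.

```python
import random

def add_noise(series, scale, rng):
    """Add zero-mean Gaussian noise with standard deviation `scale`."""
    return [x + rng.gauss(0.0, scale) for x in series]

def reverse(series):
    """Reverse the order of time steps."""
    return series[::-1]

def dropout(series, p, rng, fill=0.0):
    """Replace each time step with `fill` with probability `p`."""
    return [fill if rng.random() < p else x for x in series]

def quantize(series, levels):
    """Snap values onto `levels` evenly spaced levels between min and max."""
    lo, hi = min(series), max(series)
    if hi == lo or levels < 2:
        return list(series)
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in series]

def augment(rows, data_ratio, keep_integer=False, seed=0):
    """Create about `data_ratio * len(rows)` new lines (hypothetical policy).

    Each new line is a randomly chosen original line passed through one
    randomly chosen transformation.
    """
    rng = random.Random(seed)
    transforms = [
        lambda s: add_noise(s, 0.1, rng),
        reverse,
        lambda s: dropout(s, 0.1, rng),
        lambda s: quantize(s, 8),
    ]
    out = []
    for _ in range(round(len(rows) * data_ratio)):
        new = rng.choice(transforms)(rng.choice(rows))
        if keep_integer:
            new = [round(x) for x in new]  # round() without ndigits returns int
        out.append(new)
    return out

rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(augment(rows, data_ratio=1.0, keep_integer=True))
```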

### Concatenate Files

Merge multiple files vertically (row-wise) or horizontally (column-wise).

**Steps:**
1. Choose the concatenation `Direction`
2. Click the `Run` button

![concatenate files](_static/picture/do_concatenate_files.gif)
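The two directions can be sketched in Python as follows. Vertical concatenation stacks the lines of each file in order. Horizontal concatenation joins the i-th line of every file, so all files must have the same number of lines. The `"vertical"`/`"horizontal"` strings and the helper name are hypothetical.

```python
def concatenate(tables, direction="vertical"):
    """Join tables (lists of rows) row-wise ('vertical') or column-wise ('horizontal')."""
    if not tables:
        return []
    if direction == "vertical":
        return [list(row) for table in tables for row in table]
    if direction == "horizontal":
        if len({len(t) for t in tables}) > 1:
            raise ValueError("horizontal concatenation needs equal line counts")
        return [[v for t in tables for v in t[i]] for i in range(len(tables[0]))]
    raise ValueError("direction must be 'vertical' or 'horizontal'")

a = [[1, 2], [3, 4]]
b = [[5], [6]]
print(concatenate([a, b], "vertical"))    # [[1, 2], [3, 4], [5], [6]]
print(concatenate([a, b], "horizontal"))  # [[1, 2, 5], [3, 4, 6]]
```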

### Extract Classes by Label

Extract specific classes from the dataset based on label values.

> **Note:** This operation applies to tabular data that contains a label column identifying the class or category of each line.

**Steps:**
1. Set the `Index of Label Column` to specify which column contains the class labels
2. Click the `Run` button

![extract classes by label](_static/picture/do_extract_classes_by_label.gif)
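Extracting classes amounts to grouping lines by the value in the label column. The Python sketch below makes the following assumptions: the index is 0-based, the label column is dropped from the extracted rows, and the result is one group per class. The tool may count indices from 1 or keep the label column. `extract_classes` is a hypothetical name.

```python
def extract_classes(rows, label_index):
    """Group lines by their label value (0-based column index).

    Returns {label: rows}, with the label column removed from each row.
    """
    groups = {}
    for row in rows:
        label = row[label_index]
        features = row[:label_index] + row[label_index + 1:]
        groups.setdefault(label, []).append(features)
    return groups

rows = [["a", 1, 2], ["b", 3, 4], ["a", 5, 6]]
print(extract_classes(rows, label_index=0))  # {'a': [[1, 2], [5, 6]], 'b': [[3, 4]]}
```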

## Result

The **Result** section allows you to save the processed files or perform new operations on them. 

**For individual files:**
- Click `Run New Operation` to load the file into the **Dataset** section
- Click `Save As` to save the processed file to the local system

**For multiple files:**
- Click `Run New Operation` to load all files into the **Dataset** section
- Click `Save All` to package the processed files into a zip file and save it

![results](_static/picture/do_results.jpg)

## Conclusion

The **Data Operation** module provides a streamlined workflow for preprocessing and transforming raw tabular data into TSS-compatible signal files. The interface is divided into three key sections:

1. **Dataset**: Imports TXT/CSV files with configurable parsing settings (delimiter, header line)
2. **Operation**: Provides simple, self-contained transformations for different kinds of tabular data, covering cleaning, reshaping, segmentation, resampling, splitting, augmentation, and merging
3. **Result**: Lets you run further operations on the processed files or save them to the local system

Together, these sections help both new and experienced users quickly prepare time series datasets for their TSS projects.