# Dataset
The dataset is used to import user signal data for time series projects, including data validity checking and data visualization.

## Data Import

Import user training data into the project.

### Loading Data Files

In any type of project, click the `+` button to open the file selection dialog.

![Click the load file button](_static/picture/dataset/dataset_load_data_file.jpg)

Select the files to load. The data loader supports the selection of multiple files at a time.

![Select files](_static/picture/dataset/dataset_load_multifiles.jpg)

### Data Preview

After you select the data files, the preview screen appears, with the delimiter selection panel on the left and the data display panel on the right.

![Data preview](_static/picture/dataset/data_import_load.jpg)

**Delimiter Selection:**

The default delimiter is space. If any other delimiter is used, select option 1 in the delimiter selection panel.
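If you are unsure which delimiter a file uses, you can check it before importing. The following is a minimal sketch and not part of the tool: the function name `guess_delimiter` and the heuristic are assumptions made for illustration. It picks the candidate delimiter that splits every non-empty row into the same number of fields (more than one).

```python
from typing import Optional

# Candidate delimiters named in this guide: space, comma, tab, semicolon.
CANDIDATES = [" ", ",", "\t", ";"]


def guess_delimiter(text: str) -> Optional[str]:
    """Return the delimiter that splits all rows into the same number (>1) of fields."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    best, best_cols = None, 1
    for delim in CANDIDATES:
        # A plausible delimiter yields one consistent column count for all rows.
        counts = {len(line.split(delim)) for line in lines}
        if len(counts) == 1:
            cols = counts.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best


sample = "0.1;0.2;0.3\n0.4;0.5;0.6\n"
print(repr(guess_delimiter(sample)))  # ';'
```

A result of `None` means no candidate produced a consistent column count, which usually points to ragged rows or an unexpected delimiter.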

**Multifile Preview:**

If more than one file is loaded, select the file to preview from the file list.

![Preview](_static/picture/dataset/dataset_preview_multifile.jpg)

**Delete Rows:**

To delete rows in the preview panel, select them and click the `Delete` button.

![Delete rows](_static/picture/dataset/data_import_delete_rows.jpg)


**Load Data:**

To load the data files in the preview panel, click the `Load` button. All files are loaded and displayed in the list of imported files.

![Loaded](_static/picture/dataset/dataset_loaded_file_list.jpg)


**Delete Imported Files:**

To delete an imported file, click the button beside its filename.

![Delete the file](_static/picture/dataset/dataset_delete_data_file.jpg)

### Rename the Channel Names

By default, each channel is labeled "Channel-" followed by its index, such as "Channel-1" and "Channel-2".

![Rename channel](_static/picture/dataset/data_rename_channel_default_list.jpg)

You can edit the alias of the channels to create new channel labels (optional).

1. Select a channel in the channel list, then click the edit button.

![Rename channel](_static/picture/dataset/data_rename_channel_select.jpg)

2. Input the new channel name, then click the check button to apply the rename.

![Rename channel](_static/picture/dataset/data_rename_channel_select2.jpg)


### Anomaly Detection

For an Anomaly Detection project, there are two classes of data files that must be imported: Normal and Anomaly.

![Anomaly detection data import](_static/picture/dataset/dataset_ad2.jpg)

> *Note: If Anomaly data cannot be collected, you can import Normal data only.*


**Data File Format:**

One sample per row, containing all channels, with values separated by a delimiter (space, comma, tab, or semicolon).

**Example:** A data file containing m samples with n values × 3 channels (x, y, and z). The channel is the last dimension.

![Anomaly detection data format](_static/picture/dataset/data_import_ad_cls_format.jpg)
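The layout can be illustrated with a short script. This is a minimal sketch under one assumption: "the channel is the last dimension" is read as each sample of shape (n, 3) being flattened time step by time step, so a row reads x1 y1 z1 x2 y2 z2 and so on. The helper names `flatten_sample` and `format_row` are hypothetical, and the signal values are synthetic.

```python
import math
from typing import List

Sample = List[List[float]]  # shape (n, channels); channel is the last dimension


def flatten_sample(sample: Sample) -> List[float]:
    """Flatten an (n, channels) sample time step by time step: x1 y1 z1 x2 y2 z2 ..."""
    return [value for time_step in sample for value in time_step]


def format_row(sample: Sample, delimiter: str = " ") -> str:
    """Render one sample as one text row of the data file."""
    return delimiter.join(f"{v:.4f}" for v in flatten_sample(sample))


# Two synthetic samples (m = 2), each with n = 4 time steps and 3 channels.
n = 4
samples = [
    [[math.sin(t + s), math.cos(t + s), 0.5 * t] for t in range(n)]
    for s in range(2)
]

rows = [format_row(s) for s in samples]
print("\n".join(rows))  # each row holds n * 3 = 12 values
```

Every row must contain the same number of values, a multiple of the channel count.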

### n-Class Classification

For an n-Class Classification project, n (n≥2) classes of data files must be imported. Each class must have at least one data file loaded.

**Data File Format:**

One sample per row, containing all channels, with values separated by a delimiter (space, comma, tab, or semicolon).

**Example:** A data file containing m samples with n values × 3 channels (x, y, and z). The channel is the last dimension. The format is the same as for Anomaly Detection.

![CLS data format](_static/picture/dataset/data_import_ad_cls_format.jpg)

Click the `+` button and load the data files for all classes.

![Load CLS data](_static/picture/dataset/data_load_cls_multi_class.jpg)

By default, the label of each class is a class index. You can edit the alias of the class to create a new class label (optional).

1. Click the edit button.

![Rename class 1](_static/picture/dataset/data_load_cls_data_rename.jpg)

2. Input the alias string and click the check button to apply the rename.

![Rename class 2](_static/picture/dataset/data_load_cls_data_rename2.jpg)

### 1-Class Classification

For a 1-Class Classification project, only one class of data files is imported, which represents the positive class. 

![Load OCC data](_static/picture/dataset/data_load_1cls.jpg)

The file format is the same as Anomaly Detection and n-Class Classification.

### Regression

The prediction targets of a regression project are continuous values. Therefore, you can put all data into one file (or split it into multiple files with no categories).

![Regression load data](_static/picture/dataset/dataset_reg.jpg)

**Data File Format:**

One sample per row, containing all channels, with values separated by a delimiter (space, comma, tab, or semicolon). The first k columns are the target values to predict, where k (k ≥ 1) is the number of targets set when the regression project is created.

**Example:** A data file containing m samples with n values × 3 channels (X, Y, and Z), and k targets.

![Regression data format](_static/picture/dataset/data_import_reg_data_format.jpg)
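To make the column layout concrete, the following minimal sketch splits one row into its k targets and its signal values. It assumes the signal part uses the same channel-last layout as the other project types (X1 Y1 Z1 X2 Y2 Z2 and so on). The function name `split_regression_row` and the sample row are hypothetical.

```python
from typing import List, Tuple


def split_regression_row(
    row: str, k: int, channels: int = 3, delimiter: str = " "
) -> Tuple[List[float], List[List[float]]]:
    """Split one row into k targets and an (n, channels) sample."""
    values = [float(v) for v in row.strip().split(delimiter)]
    targets, signal = values[:k], values[k:]
    # The remaining values must form complete time steps of `channels` values.
    if k < 1 or not signal or len(signal) % channels != 0:
        raise ValueError("row does not match k targets + n x channels values")
    sample = [signal[i:i + channels] for i in range(0, len(signal), channels)]
    return targets, sample


# k = 2 targets followed by n = 2 time steps of (X, Y, Z).
row = "0.8 12.5 0.1 0.2 0.3 0.4 0.5 0.6"
targets, sample = split_regression_row(row, k=2)
print(targets)  # [0.8, 12.5]
print(sample)   # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```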

**Rename Target Labels:**

You can provide an alias for the target name for clarity (optional).

1. Select the target that you want to rename and click the edit button.

![Rename target 1](_static/picture/dataset/dataset_reg_rename_target1.jpg)

2. Input a new target name and click the check button to apply the rename.

![Rename target 2](_static/picture/dataset/dataset_reg_rename_target2.jpg)

## Data Visualization

After the data files have been configured and loaded, the data visualization screen appears with a flexible widget layout. Multiple visualization panels are displayed simultaneously, so you can compare the distribution of the data in the raw, temporal, statistical, and spectral domains.

![Data visualization](_static/picture/dataset/data_vis.jpg)

### Widget Interactions

Each widget supports independent manipulation:
- **Drag**: Reposition the widget by dragging it to a new location within the layout
- **Resize**: Adjust the widget size by dragging its edges or corners
- **Maximize**: Expand the widget to full screen for detailed analysis
- **Close**: Remove the widget from the current layout

Each widget header includes:
- **File/Files dropdown**: Select one or more data files to display
- **Operator dropdown**: Choose the analysis operator (varies by domain)
- **Channel dropdown**: Select one or multiple channels to visualize
- **More options menu (⋮)**: Access more options including:
  - **Expand**: Enlarge the widget for a detailed view
  - **Close**: Remove the widget from the layout

Each widget context (chart) supports interactive features:
- **Zoom**: Use the mouse wheel to zoom in/out on the chart
- **Restore**: Return the chart to its original state
- **Download**: Export the chart as an image file

### Widget Layout Management

The blue menu button ![more_menu](_static/picture/dataset/more_menu.jpg) in the top-right corner of the visualization screen provides layout management options:

**Add Widget:** 

Add a new visualization widget to the layout. You can then compare charts for different files, operators, or channels.

![Data visualization- add widget](_static/picture/dataset/add_widget.jpg)

**Select Widgets:**

Choose which widgets to display, or reset the layout to the default 2×2 grid with all four visualization domains.

![Data visualization select widgets](_static/picture/dataset/select_widgets.jpg)


### Visualization Widgets

The flexible widget layout displays four visualization domains simultaneously in a 2×2 grid by default.

#### Raw Domain Widget

Displays the raw signal data for the selected file and channels.

![Data visualization raw](_static/picture/dataset/data_vis_raw.jpg)

**Basic Operations:**

- Select a file from the file list and display its data.

![Data visualization - select a file](_static/picture/dataset/data_vis_select_a_file.jpg)

- Choose one or more channels from the channel list to visualize.

![Data visualization select channels](_static/picture/dataset/data_vis_select_channels.jpg)

- Use the navigation arrows to browse through samples. For example, 1 / 4000.

![Data visualization - navigate samples](_static/picture/dataset/data_vis_navigate_samples.jpg)

- Click the ![Data visualization file icon](_static/picture/dataset/data_vis_file_icon.jpg) icon to preview the data.

![Data visualization preview data](_static/picture/dataset/data_vis_preview_data.jpg)

#### Temporal Domain Widget

Shows temporal domain analysis for the selected files and channels. The x-axis represents the time interval (lag), and the y-axis shows the correlation values.

![Data visualization temporal](_static/picture/dataset/data_vis_temporal.jpg)

**Basic Operations:**

- Choose one or more files from the file list and display the data.

![Data visualization select files](_static/picture/dataset/data_vis_select_files.jpg)

- Select the operator (ACF or PACF) from the operator dropdown menu.
  - **Autocorrelation Function (ACF)** measures how a signal correlates with a delayed copy of itself at different time lags.
  - **Partial Autocorrelation Function (PACF)** measures the correlation between a time series and its own lagged values while controlling for the values at the intervening lags. It isolates the direct relationship between observations at a given lag and is commonly used to identify the order of an autoregressive model.

![Data visualization select operator](_static/picture/dataset/operator_ACF_PACF.jpg)

- Select a channel from the channel list to visualize.

![Data visualization select a channel](_static/picture/dataset/data_vis_select_a_channel.jpg)
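To reproduce ACF and PACF curves outside the tool, the following minimal sketch uses the standard sample autocorrelation estimator and the Durbin-Levinson recursion. The estimator used by the widget is not documented here, so treat the function names `acf` and `pacf` and the normalization as assumptions. The input must not be constant, because the variance appears in the denominator.

```python
from typing import List


def acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 0..max_lag (requires max_lag < len(x))."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # zero for a constant signal
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(x: List[float], max_lag: int) -> List[float]:
    """Partial autocorrelation for lags 0..max_lag via Durbin-Levinson."""
    r = acf(x, max_lag)
    result = [1.0]
    prev: List[float] = []  # AR coefficients of the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print([round(v, 3) for v in acf(signal, 2)])   # [1.0, 0.4, -0.1]
print([round(v, 3) for v in pacf(signal, 2)])  # [1.0, 0.4, -0.31]
```

At lag 1 the two functions agree by definition. They differ from lag 2 onward, where PACF removes the effect of the shorter lags.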

#### Statistical Domain Widget

Displays statistical analysis for the selected file and channel. The chart shows the statistical values across feature sets.

![Data visualization statistical](_static/picture/dataset/data_vis_statistical.jpg)

**Basic Operations:**

- Choose one or more files from the file list and display the data.

![Data visualization -  select files](_static/picture/dataset/data_vis_select_files.jpg)

- Select the operator from the operator dropdown menu.

![Data visualization select operator](_static/picture/dataset/operator_min_max.jpg)

- Select a channel from the channel list to visualize.
  
![Data visualization -  select a channel](_static/picture/dataset/data_vis_select_a_channel.jpg)

#### Spectral Domain Widget

Shows spectral domain analysis for the selected file and channel. The x-axis represents normalized frequency (0 to 0.5), and the y-axis shows amplitude.

![Data visualization spectral](_static/picture/dataset/data_vis_spectral.jpg)

**Basic Operations:**

- Choose one or more files from the file list and display the data.

![Data visualization select files](_static/picture/dataset/data_vis_select_files.jpg)

- Select the operator from the operator dropdown menu.
  - **Fast Fourier Transform (FFT)** efficiently computes the Discrete Fourier Transform (DFT) and its inverse, reducing complexity from O(N²) to O(N log N).
  - **Cepstrum** is the inverse Fourier transform (IFT) of the logarithm of the estimated signal spectrum.
  - **Short-Time Fourier Transform (STFT)** is an extension of FFT that computes the Fourier transform of short, overlapping segments of a signal over time.

![Data visualization - select operator](_static/picture/dataset/operator_fft.jpg)

- Select a channel from the channel list to visualize.

![Data visualization - select a channel](_static/picture/dataset/data_vis_select_a_channel.jpg)
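The FFT operator and the 0 to 0.5 frequency axis can be sketched as follows. This is a minimal illustration, not the tool's implementation. It assumes the x-axis is normalized frequency in cycles per sample (bin k of an N-point transform maps to k / N) and that the amplitude is the unscaled magnitude of each bin. The function names are hypothetical, and the recursive radix-2 FFT requires a power-of-two length.

```python
import cmath
import math
from typing import List, Tuple


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def amplitude_spectrum(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Return (normalized frequencies, amplitudes) for bins 0..N/2."""
    n = len(signal)
    spectrum = fft([complex(v) for v in signal])
    freqs = [k / n for k in range(n // 2 + 1)]  # 0 to 0.5 cycles per sample
    amps = [abs(spectrum[k]) for k in range(n // 2 + 1)]
    return freqs, amps


# A cosine with 0.25 cycles per sample peaks at normalized frequency 0.25.
signal = [math.cos(2 * math.pi * 0.25 * t) for t in range(8)]
freqs, amps = amplitude_spectrum(signal)
peak = max(range(len(amps)), key=amps.__getitem__)
print(freqs[peak])  # 0.25
```

To convert a normalized frequency to hertz, multiply it by the sampling rate of the signal.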

### Overall Sample Distribution

The overall sample distribution graph shows the distribution of labels for all the data in the project, which allows users to analyze the data balance.

![Distribution](_static/picture/dataset/sample_distribution.jpg)
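The same balance check can be done on a list of labels outside the tool. The sketch below is illustrative only: the label list is made up, and the imbalance ratio (largest class count divided by smallest) is one common summary, not a metric reported by the tool.

```python
from collections import Counter

# Hypothetical per-sample labels gathered from the imported files.
labels = ["Normal"] * 900 + ["Anomaly"] * 100

counts = Counter(labels)
total = sum(counts.values())
shares = {name: count / total for name, count in counts.items()}
imbalance_ratio = max(counts.values()) / min(counts.values())

print(dict(counts))     # {'Normal': 900, 'Anomaly': 100}
print(shares)           # {'Normal': 0.9, 'Anomaly': 0.1}
print(imbalance_ratio)  # 9.0
```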

#### Anomaly Detection

The graph shows the distribution of Normal samples and Anomaly samples.

![Anomaly detection distribution](_static/picture/dataset/data_ad_distribution.jpg)

#### Classification

The graph shows the distribution of all classes.

![CLS distribution](_static/picture/dataset/data_cls_distribution.jpg)

#### Regression

The labels of a regression task are continuous values. Therefore, the label distribution is discretized into intervals and displayed as a histogram, where the x-axis represents the target value and the y-axis represents the number of samples in each interval.

The regression task supports multiple targets. If the dataset contains multiple targets, click the direction arrow button to see the distribution of other targets.

![Regression distribution1](_static/picture/dataset/data_regression_distribution_target1.jpg)

![Regression distribution2](_static/picture/dataset/data_regression_distribution_target2.jpg)
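The discretization behind such a histogram can be sketched in a few lines. This is a minimal illustration under assumptions: equal-width intervals spanning the minimum to the maximum target value, with the bin count chosen by the caller. The binning rule used by the tool is not documented here, and the function name `histogram` is hypothetical.

```python
from typing import List


def histogram(values: List[float], bins: int) -> List[int]:
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant targets
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # put the max in the last bin
        counts[index] += 1
    return counts


targets = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
print(histogram(targets, bins=4))  # [2, 2, 2, 2]
```

For a project with several targets, run the function once per target column.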

## Auto Data Augmentation

The auto data augmentation feature enhances dataset diversity and improves model robustness.
It automatically generates synthetic samples by applying various transformation techniques to the original data.
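This guide does not list the transformations that the feature applies. As a general illustration of the idea, the following sketch shows two augmentations that are common for time series, jittering (additive noise) and magnitude scaling. They are assumptions chosen for the example and are not confirmed to be what the tool uses. The function names and parameter values are hypothetical.

```python
import random
from typing import List


def jitter(sample: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, sigma) for v in sample]


def scale(sample: List[float], spread: float, rng: random.Random) -> List[float]:
    """Multiply the whole sample by one random factor near 1."""
    factor = rng.uniform(1.0 - spread, 1.0 + spread)
    return [v * factor for v in sample]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [0.0, 0.5, 1.0, 0.5, 0.0]
augmented = [jitter(original, 0.05, rng), scale(original, 0.1, rng)]
print(len(augmented), "synthetic samples from 1 original")
```

Both transformations keep the sample length and the label unchanged, which is why they can be added to the training set alongside the original data.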

### Benefits

- **Increased Dataset Size**: Expands limited datasets by generating new samples
- **Improved Model Generalization**: Helps models learn more robust features
- **Class Imbalance Mitigation**: Balances underrepresented classes by generating more samples for them
- **Reduced Overfitting**: Provides more diverse training data, which helps reduce overfitting

### How to Use

1. Click the button to start the auto data augmentation process.

![Data augmentation button](_static/picture/dataset/auto_data_augmentation_button.jpg)


2. The augmentation process begins, and a progress indicator is displayed. If needed, you can cancel the process at any time.


![Data augmentation started](_static/picture/dataset/data_augmentation_started.jpg)

  > *Note: The augmentation process can take some time depending on the dataset size and complexity.*

3. Once the augmentation is complete, the results are displayed in a summary table showing the number of new samples generated.

![Data augmentation completed](_static/picture/dataset/data_augmentation_completed.jpg)

4. The newly generated augmented samples are automatically added to the dataset when you exit the augmentation dialog. The augmented data are available for model training.

![Augmented data](_static/picture/dataset/augmented_data.jpg)