Enabling NVIDIA® GPUs for accelerating deep learning model training
Prerequisites
Before using NVIDIA® GPUs, ensure that you meet the following prerequisites:
System requirements
Ubuntu 22.04 LTS
Hardware requirements
NVIDIA® GPUs that meet the hardware requirements listed in the TensorFlow GPU support documentation.
Step-by-step instructions
Option 1 (Recommended)
Step 1. Install NVIDIA® GPU drivers
Install the NVIDIA® GPU drivers with a minimum version of 525.60.13.
For example, you can download and install driver version 535.274.02 with the following commands:
wget https://download.nvidia.com/XFree86/Linux-x86_64/535.274.02/NVIDIA-Linux-x86_64-535.274.02.run
sudo sh NVIDIA-Linux-x86_64-535.274.02.run
Next, verify the installation using:
nvidia-smi
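Because the guide sets a minimum driver version of 525.60.13, it can help to compare the installed version against that floor instead of reading it by eye. The following is a minimal sketch, assuming GNU sort with the -V (version sort) option, as shipped on Ubuntu 22.04. The helper name version_ge is hypothetical and is not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Minimum driver version stated in this guide.
MIN_DRIVER="525.60.13"

# version_ge A B: succeeds (exit 0) if version A >= version B.
# Hypothetical helper; relies on GNU sort -V for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the driver if nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    # Use the driver version reported for the first GPU.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if [ -z "$installed" ]; then
        echo "nvidia-smi did not report a driver version"
    elif version_ge "$installed" "$MIN_DRIVER"; then
        echo "Driver $installed meets the minimum $MIN_DRIVER"
    else
        echo "Driver $installed is older than the minimum $MIN_DRIVER"
    fi
else
    echo "nvidia-smi not found; the driver may not be installed"
fi
```

The comparison sorts the two versions and checks that the smaller one is the required minimum, which avoids comparing dotted version strings as plain text.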
Step 2. Install CUDA Toolkit 12.5
Time Series Studio requires CUDA Toolkit 12.5. You can install and configure the CUDA Toolkit as follows.
wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda_12.5.0_555.42.02_linux.run
sudo sh cuda_12.5.0_555.42.02_linux.run
Next, add the following lines to your ~/.bashrc file:
export PATH=/usr/local/cuda-12.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH
After you have added them to the ~/.bashrc file, run:
source ~/.bashrc
sudo ldconfig
Step 3. Install cuDNN 9.3
Time Series Studio also requires cuDNN 9.3. You can install cuDNN as follows.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn
sudo apt-get -y install cudnn-cuda-12
After installing the CUDA Toolkit and cuDNN, you can leverage the GPU to train integrated deep learning models in eIQ Time Series Studio. Please note that if either component is installed incorrectly, for example, if the version is incompatible, the GPU will not function properly during training.
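Since an incompatible version of either component prevents GPU training, a quick check of what is actually installed may save debugging time. The sketch below assumes that nvcc --version prints a line of the form "Cuda compilation tools, release 12.5, V12.5.40" and that cuDNN was installed through apt, so it appears in the dpkg package list. The helper name cuda_release is hypothetical.

```shell
#!/usr/bin/env bash
# cuda_release: reads `nvcc --version` output on stdin and prints the
# release number (for example 12.5). Hypothetical helper for illustration.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Check the CUDA Toolkit release reported by the compiler driver.
if command -v nvcc >/dev/null 2>&1; then
    release="$(nvcc --version | cuda_release)"
    if [ "$release" = "12.5" ]; then
        echo "CUDA Toolkit $release found"
    else
        echo "Expected CUDA Toolkit 12.5, found: ${release:-unknown}"
    fi
else
    echo "nvcc not found; check that /usr/local/cuda-12.5/bin is on PATH"
fi

# List installed cuDNN packages from the Debian/Ubuntu package database.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -i cudnn || echo "No cuDNN packages found"
fi
```

If nvcc is not found even though the installer succeeded, the PATH export in ~/.bashrc has most likely not been loaded in the current shell.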
Option 2
Step 1. Install NVIDIA® GPU drivers
Install the NVIDIA® GPU drivers with a minimum version of 525.60.13.
For example, you can download and install driver version 535.274.02 with the following commands:
wget https://download.nvidia.com/XFree86/Linux-x86_64/535.274.02/NVIDIA-Linux-x86_64-535.274.02.run
sudo sh NVIDIA-Linux-x86_64-535.274.02.run
Next, verify the installation using:
nvidia-smi
Step 2. Install Python 3.10.8
If you already have Python 3.10.8 installed, you can skip this step.
If Python 3.10.8 is not installed, install it with a Python version management tool such as pyenv.
Example:
pyenv install 3.10.8
# set Python 3.10.8 globally
pyenv global 3.10.8
Step 3. Create a virtual environment with venv
Navigate to your preferred directory and create a virtual environment named tss using Python 3.10.8:
python -m venv tss
Activate the environment:
source tss/bin/activate
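After activation, it may be worth confirming that the python command resolves to the tss environment and reports version 3.10.8 before installing anything into it. This is a minimal sketch, assuming python --version prints a line such as "Python 3.10.8". The helper name parse_python_version is hypothetical.

```shell
#!/usr/bin/env bash
# Python version this guide expects inside the tss environment.
EXPECTED="3.10.8"

# parse_python_version: reads `python --version` output on stdin
# (for example "Python 3.10.8") and prints only the version number.
parse_python_version() {
    awk '{ print $2; exit }'
}

if command -v python >/dev/null 2>&1; then
    actual="$(python --version 2>&1 | parse_python_version)"
    # With the environment active, this path should end in tss/bin/python.
    echo "Interpreter: $(command -v python)"
    if [ "$actual" = "$EXPECTED" ]; then
        echo "Python $actual matches the expected version"
    else
        echo "Python $actual differs from the expected $EXPECTED"
    fi
else
    echo "python not found on PATH; is the tss environment activated?"
fi
```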
Step 4. Install TensorFlow
Upgrade pip to the latest version:
pip install --upgrade pip
Then, install TensorFlow with CUDA support:
pip install tensorflow[and-cuda]==2.18.1
Step 5. Configure environment variables
After installing TensorFlow, locate the nvidia folder under your virtual environment (e.g., lib/python3.10/site-packages/nvidia).
Add the following lines to your ~/.bashrc file, replacing /your_path/ with the actual path:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$(find /your_path/tss/lib/python3.10/site-packages/nvidia -type f -name "*.so*" -exec dirname {} \; | sort -u | paste -sd: -)"
export XLA_FLAGS="--xla_gpu_cuda_data_dir=/your_path/tss/lib/python3.10/site-packages/nvidia/cuda_nvcc"
Reload the configuration:
source ~/.bashrc
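To make the LD_LIBRARY_PATH line less opaque, the sketch below wraps the same find, sort, and paste pipeline in a function and then asks TensorFlow which GPUs it can see. The helper name nvidia_lib_dirs is hypothetical, and /your_path/ is the same placeholder used above, which you need to replace with the actual path. The GPU query uses tf.config.list_physical_devices and assumes the tss environment is active.

```shell
#!/usr/bin/env bash
# nvidia_lib_dirs DIR: prints a colon-separated list of the unique
# directories under DIR that contain shared libraries (*.so*).
# Hypothetical helper wrapping the same pipeline used in ~/.bashrc.
nvidia_lib_dirs() {
    find "$1" -type f -name "*.so*" -exec dirname {} \; | sort -u | paste -sd: -
}

# Placeholder location; replace /your_path/ with the actual path.
NVIDIA_DIR="/your_path/tss/lib/python3.10/site-packages/nvidia"

if [ -d "$NVIDIA_DIR" ]; then
    echo "Library directories: $(nvidia_lib_dirs "$NVIDIA_DIR")"
fi

# With the tss environment active, ask TensorFlow which GPUs it sees.
# An empty list ([]) means no GPU was detected.
if command -v python >/dev/null 2>&1 && python -c "import tensorflow" >/dev/null 2>&1; then
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
else
    echo "TensorFlow is not importable from the current Python environment"
fi
```

The pipeline finds every shared library file, reduces each to its containing directory, removes duplicates, and joins the result with colons, which is the format LD_LIBRARY_PATH expects.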