Enabling NVIDIA® GPUs for accelerating deep learning model training

Prerequisites

Before using NVIDIA® GPUs, ensure that you meet the following prerequisites:

  • System requirements

    • Ubuntu 22.04 LTS

  • Hardware requirements

    • NVIDIA® GPUs that meet the hardware specifications listed in TensorFlow's GPU support documentation.

Step-by-step instructions

Option 2

  • Step 1. Install NVIDIA® GPU drivers

    Install NVIDIA® GPU driver version 525.60.13 or later.

    For example, you can download and install driver version 535.274.02 with the following commands:

    wget https://download.nvidia.com/XFree86/Linux-x86_64/535.274.02/NVIDIA-Linux-x86_64-535.274.02.run
    
    sudo sh NVIDIA-Linux-x86_64-535.274.02.run
    

    Next, verify the installation using:

    nvidia-smi
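
    To confirm that the installed driver meets the 525.60.13 minimum, a check along the following lines may help. This is a minimal sketch: the helper name version_ge and the MIN_DRIVER variable are assumptions made for illustration, and the comparison relies on GNU sort -V, which ships with Ubuntu.

    ```shell
    # Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
    # Relies on GNU sort -V for version-aware ordering.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    }

    MIN_DRIVER="525.60.13"

    if command -v nvidia-smi >/dev/null 2>&1; then
        # Ask the driver for its version; keep the first line on multi-GPU hosts.
        installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
        if version_ge "$installed" "$MIN_DRIVER"; then
            echo "Driver $installed meets the minimum $MIN_DRIVER"
        else
            echo "Driver $installed is older than $MIN_DRIVER" >&2
        fi
    else
        echo "nvidia-smi not found; the driver may not be installed" >&2
    fi
    ```

    If nvidia-smi is missing, the sketch only prints a hint, so it is safe to run before the driver is installed.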
    
  • Step 2. Install Python 3.10.8

    If you already have Python 3.10.8 installed, you can skip this step.

    If Python 3.10.8 is not installed, use a Python version management tool such as pyenv.

    Example:

    pyenv install 3.10.8
    # set Python 3.10.8 globally
    pyenv global 3.10.8
    
  • Step 3. Create a virtual environment with venv

    Navigate to your preferred directory and create a virtual environment named tss using Python 3.10.8:

    python -m venv tss
    

    Activate the environment:

    source tss/bin/activate
    
  • Step 4. Install TensorFlow

    Upgrade pip to the latest version:

    pip install --upgrade pip
    

    Then, install TensorFlow with CUDA support:

    pip install "tensorflow[and-cuda]==2.18.1"
    
  • Step 5. Configure environment variables

    After installing TensorFlow, locate the nvidia folder under your virtual environment (e.g., lib/python3.10/site-packages/nvidia).

    Add the following lines to your ~/.bashrc file, replacing /your_path/ with the actual path:

    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$(find /your_path/tss/lib/python3.10/site-packages/nvidia -type f -name "*.so*" -exec dirname {} \; | sort -u | paste -sd: -)"
    
    export XLA_FLAGS="--xla_gpu_cuda_data_dir=/your_path/tss/lib/python3.10/site-packages/nvidia/cuda_nvcc"
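
    To inspect what the find pipeline in the LD_LIBRARY_PATH line expands to, the sketch below wraps it in a function and prints one directory per line. It is meant for inspection only and does not need to go into ~/.bashrc. The function name nvidia_lib_dirs and the NVIDIA_DIR variable are assumptions for illustration; /your_path/ is the same placeholder as above.

    ```shell
    # Hypothetical helper: print every directory under $1 that contains a shared
    # library (*.so*), de-duplicated and joined with ":".
    nvidia_lib_dirs() {
        find "$1" -type f -name "*.so*" -exec dirname {} \; | sort -u | paste -sd: -
    }

    # Placeholder path; replace /your_path/ with the actual location of tss.
    NVIDIA_DIR="/your_path/tss/lib/python3.10/site-packages/nvidia"

    if [ -d "$NVIDIA_DIR" ]; then
        # Show the directories one per line for easier reading.
        nvidia_lib_dirs "$NVIDIA_DIR" | tr ':' '\n'
    else
        echo "Directory not found: $NVIDIA_DIR" >&2
    fi
    ```

    Each printed directory is one that the export line appends to LD_LIBRARY_PATH, so an empty result suggests the path is wrong or TensorFlow was installed without its CUDA packages.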
    

    Reload the configuration:

    source ~/.bashrc
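
Once the configuration is reloaded, a quick way to check the whole setup is to ask TensorFlow which GPUs it can see. The following is a minimal sketch, assuming the tss environment is active; the helper name check_tf_gpu is hypothetical, and the VIRTUAL_ENV test relies on the variable that venv sets on activation.

```shell
# Hypothetical helper: list the GPUs visible to TensorFlow in the active environment.
check_tf_gpu() {
    python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# venv exports VIRTUAL_ENV on activation; only run the check inside an environment.
if [ -n "${VIRTUAL_ENV:-}" ]; then
    check_tf_gpu || echo "TensorFlow could not be imported or failed to run" >&2
else
    echo "Activate the environment first: source tss/bin/activate" >&2
fi
```

A non-empty list indicates that TensorFlow detected at least one GPU. An empty list usually points back to the driver installation or the environment variables from Step 5.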