Search before asking

I have searched the YOLOv8 issues and discussions and found no similar questions.
Question

I tried to run YOLOv8 on my GPU, but it's not working.
I use torch to set the device to CUDA, but inference still does not run on my GPU. The model I am using is a YOLOv8 PPE detection model. I want to read frames from a camera over RTSP, run detection quickly, and then send the annotated frames to an RTMP server.
import cv2
from ultralytics import YOLO
import subprocess
import requests
import json
import random
import base64
from PIL import Image
import threading
import torch

torch.cuda.set_device(0)

# Camera stream
path = "rtsp://admin:[email protected]:554/Streaming/Channels/101/"
cap = cv2.VideoCapture(path)

# Load the YOLOv8 model
model = YOLO('best.pt')
classes = {0: 'Hardhat', 1: 'Mask', 2: 'NO-Hardhat', 3: 'NO-Mask', 4: 'NO-Safety Vest', 5: 'Person', 6: 'Safety Cone', 7: 'Safety Vest', 8: 'machinery', 9: 'vehicle'}

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()
    if success:
        # Run YOLOv8 inference on the frame
        results = model(frame)
        annotated_frame = results[0].plot()
        # Saving the image
        cv2.imwrite("test1.jpeg", annotated_frame)
        # Encode the resized annotated frame to base64
        # Display the annotated frame
        cv2.imshow("YOLOv8 Inference", annotated_frame)
        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

def show_frame(frame):
    cv2.imshow("YOLOv8 Inference", frame)

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()
Additional
No response
          👋 Hello @MahaKhh, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.
Install
Pip install the ultralytics package including all requirements in a Python>=3.7 environment with PyTorch>=1.7.
pip install ultralytics
Environments
YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Notebooks with free GPU:   
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide 
Status
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
          @MahaKhh, thank you for reaching out to us! To run YOLOv8 on a GPU, you can try the following:
Import the torch module and set the device to a GPU before loading the model:
import torch
torch.cuda.set_device(0) # Set to your desired GPU number
When creating the YOLO object, specify the device parameter as 'gpu':
model = YOLO('best.pt', device='gpu')
This should enable GPU acceleration for your detection on YOLOv8. Let me know if you have any further questions or issues!
          @shlyahin thank you for reaching out to us!
The device argument is not available in the constructor of the YOLOv8 class. The device parameter was introduced in the YOLOv5 implementation, but it is not supported in YOLOv8.
To run YOLOv8 on a GPU, you need to ensure that your CUDA and cuDNN versions are compatible with your PyTorch installation, and that PyTorch is properly configured to use CUDA. Additionally, you can set the GPU device using torch.cuda.set_device(0) before initializing the YOLOv8 model.
Please let me know if you have any further questions or concerns!
          I can confirm with this code that YOLOv8 does not support the M2 MPS GPU:
"""Try train the YOLO from scratch."""
import torch
from ultralytics import YOLO
device: str = "mps" if torch.backends.mps.is_available() else "cpu"
# Load a model
model = YOLO("yolov8n.yaml")  # build a new model from scratch
model.to(device)
model.train(data="coco.yaml", epochs=5)
metrics = model.val()
      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/5         0G       3.65      5.755      4.303        243        640:   0%|          | 18/7393 [01:46<12:07:36,  5.92s/it]
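One variation the script above does not try is passing the device to train() directly. As far as I know, the Ultralytics trainer takes its device from the `device` argument of `train()` rather than from an earlier `model.to(...)` call, so this may be worth testing before concluding that MPS is unsupported. A minimal sketch under that assumption; `pick_train_device` and `train_from_scratch` are hypothetical helper names, not part of the library:

```python
def pick_train_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to the CPU."""
    if cuda_ok:
        return "cuda:0"
    if mps_ok:
        return "mps"
    return "cpu"


def train_from_scratch(cfg: str = "yolov8n.yaml", data: str = "coco.yaml", epochs: int = 5):
    """Build a model from a YAML config and train it on the selected device."""
    # Imported lazily so pick_train_device() can be used without torch installed.
    import torch
    from ultralytics import YOLO

    device = pick_train_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
    print(f"Training on: {device}")
    model = YOLO(cfg)
    # The device is handed to train() instead of being set with model.to(...).
    return model.train(data=data, epochs=epochs, device=device)
```

Calling `train_from_scratch()` on an M2 machine should then print `Training on: mps` if PyTorch reports the MPS backend as available.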
I have YOLOv8 and an Nvidia GPU. How do I use YOLOv8 with the Nvidia GPU? I cannot find any code or tutorial.

Please give the whole code.
          @azazelazaza to use your Nvidia GPU with YOLOv8, you'll first need to ensure that your PyTorch installation is compatible with your CUDA and cuDNN versions. These are required for GPU acceleration.
If PyTorch is properly set up with CUDA, it should automatically use your GPU when running the YOLOv8 model.
To make sure a specific GPU is used, you can select it with the torch.cuda.set_device(device_number) function, where device_number is the index of your GPU, starting from 0.
You'll need to call this function before initializing your YOLOv8 model. That way, the model will be loaded onto the device specified.
If CUDA out-of-memory or compatibility errors come up during GPU computation, how to handle them depends on the specific error message.
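Since a complete script was requested, here is a minimal sketch of what that could look like for camera inference on an Nvidia GPU. It assumes the `ultralytics`, `torch`, and `opencv-python` packages are installed; `cuda_device_or_cpu` and `run_stream` are hypothetical helper names, and the weights path and stream URL are placeholders to replace with your own:

```python
def cuda_device_or_cpu() -> str:
    """Return "cuda:0" when PyTorch can see an Nvidia GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without torch there is no GPU path to choose
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def run_stream(source: str, weights: str = "best.pt") -> None:
    """Read frames from `source`, run detection, and display annotated frames."""
    # Heavy imports are kept inside the function so the helper above stays light.
    import cv2
    from ultralytics import YOLO

    device = cuda_device_or_cpu()
    print(f"Using device: {device}")
    model = YOLO(weights)
    cap = cv2.VideoCapture(source)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break  # stream ended or the read failed
            # The device is passed per call; verbose=False silences per-frame logs.
            results = model.predict(frame, device=device, verbose=False)
            cv2.imshow("YOLOv8 Inference", results[0].plot())
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

If the printed device is `cpu` on a machine with an Nvidia card, the PyTorch build is the first thing to check, since CPU-only wheels report CUDA as unavailable.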
          👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
          @Loukious if you have confirmed that your M2 MPS GPU is not recognized and you're getting only CPU usage despite following the previous instructions, it's possible that YOLOv8 does not currently support the MPS backend which is designed for Apple silicon (M1, M2 chips).
For Nvidia GPUs with CUDA: when CUDA is available and PyTorch is properly installed, running the model should typically use the CUDA device automatically, without an explicit to(device) call. However, it can be good practice to confirm device placement by setting the device explicitly, like so:
import torch
from ultralytics import YOLO
# Check for CUDA device and set it
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')
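# Load the model, then move it to the device. Two statements are used so that
# `model` stays bound to the YOLO object whatever .to() returns.
model = YOLO('path/to/your/model.pt')
model.to(device)
# Now run inference or training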
If torch.cuda.is_available() returns True, then the .to(device) call should place the model on the CUDA device, making it ready for GPU-accelerated operations.
Please follow the PyTorch and CUDA installation instructions that match your specific GPU and system configuration. If you experience further issues, include the exact error messages or symptoms you're encountering to help with troubleshooting.
As always, please refer to the official Ultralytics documentation and resources provided for the most up-to-date guidance and support.
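To confirm where the weights actually ended up after a to(device) call, one option is to inspect the first parameter of the underlying network. A small sketch; `module_device` is a hypothetical helper, and it assumes the object passed in exposes a PyTorch-style `parameters()` method (for a YOLO object loaded from a .pt file, that would be its `model` attribute):

```python
def module_device(module) -> str:
    """Return the device of a module's first parameter as a string."""
    first_param = next(iter(module.parameters()))
    return str(first_param.device)
```

Used as `print(module_device(model.model))` after `model.to(device)`, this should print something like `cuda:0` when the move succeeded and `cpu` otherwise.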
          @glenn-jocher torch detects my GPU just fine. I have an RTX 4090, but it uses very little GPU memory (0.8 GB) and only around 10% GPU utilization, while my CPU usage ramps up to around 80%. Is that normal?

My code https://pastebin.com/QXDp7vj1
          @dnhuan pretty sure my CUDA and PyTorch installations are fine as they are working fine with my GPU with other libraries.

This is after making the edit you suggested:

Note that I'm running Python on WSL2.
          Hi @Ambarish-Ombrulla,
If you've successfully loaded the model onto the GPU but are having trouble moving camera frames onto the GPU, here's a process you can follow to ensure frames from your camera are correctly transferred to the GPU for inference with YOLOv8:
First, capture the frame from your camera using OpenCV, as you normally would.
Convert the frame into the proper format and dimensionality expected by YOLOv8. This typically involves transforming the frame into a PyTorch tensor, normalizing the pixel values (typically to a range between 0 and 1), and possibly resizing the image to the input size expected by the model.
Before passing the frame to the model for inference, ensure that the tensor is on the same device as the model by using .to(device), where device is either 'cuda' or 'cuda:0' for single-GPU setups.
Perform the inference with the model.
It's not uncommon for the GPU memory usage to be lower than the maximum available, especially if the batch size or input image size is small. However, if your CPU usage is high, that could indicate that data loading or preprocessing steps may be the bottleneck, and they're being processed on the CPU.
You mentioned reshaping according to YOLOv8 input requirements. Ensure that the frame tensor shape is correct and batch dimension is included (even if it's a batch of one).
Please verify that you're performing these steps, and if the issue persists, it might be related to WSL2's specific way of handling GPU utilization. There have been various issues reported with GPU passthrough efficiency under WSL2, which could impact performance.
As a side note, if you are running an RTSP stream or heavy preprocessing, consider profiling which parts of your pipeline might be CPU-bound. Sometimes, using multiprocessing for handling the stream or preprocessing can reduce the computational load on the main thread, which may mitigate the high CPU usage you're experiencing.
Make sure your camera capture and preprocessing pipeline is well-optimized for performance to match your GPU's capabilities, as any bottlenecks here can limit the overall throughput of your system.
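As one way to take stream reading off the main loop, the sketch below keeps only the newest frame on a background thread, so inference never waits on a backlog of stale frames. Note that it uses a thread rather than a separate process, which is a swapped-in simplification of the multiprocessing idea. `LatestFrameReader` is a hypothetical class, and `read_fn` is assumed to behave like `cv2.VideoCapture.read`, returning an `(ok, frame)` pair:

```python
import threading
from typing import Any, Callable, Optional, Tuple


class LatestFrameReader:
    """Read frames on a background thread and keep only the newest one."""

    def __init__(self, read_fn: Callable[[], Tuple[bool, Any]]):
        self._read_fn = read_fn
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "LatestFrameReader":
        self._thread.start()
        return self

    def _run(self) -> None:
        while not self._stopped.is_set():
            ok, frame = self._read_fn()
            if not ok:
                break  # stream ended or the read failed
            with self._lock:
                self._frame = frame  # overwrite, so older frames are dropped
        self._stopped.set()

    def latest(self) -> Optional[Any]:
        """Return the most recent frame, or None if nothing has been read yet."""
        with self._lock:
            return self._frame

    def stop(self) -> None:
        self._stopped.set()
        if self._thread.is_alive():
            self._thread.join(timeout=1.0)
```

With OpenCV this would be started as `reader = LatestFrameReader(cap.read).start()`, and the inference loop would call `reader.latest()` instead of `cap.read()`.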
          @Ambarish-Ombrulla in YOLOv8, as with many computer vision models, input images typically need to conform to a certain size and shape that the network expects. If YOLOv8 expects a 640x640 input and you provide an image of different dimensions, you should resize or pad your images to match this requirement before inference.
This is not an issue with the model, but rather a standard requirement. Resizing with a single scale factor for both dimensions keeps the aspect ratio of the original image intact, while padding then adds pixels to reach the desired size without distorting the visible content.
Here's a simplified process for preparing your images:
Resize the image to the height or width that matches the model's expected input size while maintaining the original aspect ratio.
Pad the resized image to 640x640 with a neutral color (often black) on the sides that do not match the expected dimensions.
In YOLOv8, these preprocessing steps are generally built into the pipeline, so you don't typically need to perform them manually unless you're managing the images outside of the standard loading and inference routines provided by Ultralytics.
If you need to do this manually for some reason, you can use OpenCV or another image processing library in Python to resize and pad the image appropriately. Ensure the image tensor passed to the model is correctly formed with the shape (batch_size, channels, height, width), where height and width should be 640 for your case. The model should then be able to process the image on the GPU without issues related to the image shape.
Keep in mind that if you make these changes to the image before passing it to the model, you should also adjust any subsequent post-processing or bounding box calculations to account for the resizing and padding that you've applied.
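The resize-and-pad arithmetic, and the reverse mapping for bounding boxes, can be sketched in plain Python. This is an illustration of the idea only, not the Ultralytics implementation; `letterbox_params` and `box_to_original` are hypothetical names, and padding is assumed to be split evenly between both sides:

```python
from typing import Tuple


def letterbox_params(height: int, width: int, target: int = 640) -> Tuple[float, int, int, int, int]:
    """Return (scale, new_h, new_w, pad_top, pad_left) for a square target size."""
    scale = min(target / height, target / width)  # one factor keeps the aspect ratio
    new_h = round(height * scale)
    new_w = round(width * scale)
    pad_top = (target - new_h) // 2
    pad_left = (target - new_w) // 2
    return scale, new_h, new_w, pad_top, pad_left


def box_to_original(box, scale: float, pad_top: int, pad_left: int):
    """Map an (x1, y1, x2, y2) box from the padded image back to the source frame."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_left) / scale,
        (y1 - pad_top) / scale,
        (x2 - pad_left) / scale,
        (y2 - pad_top) / scale,
    )
```

For a 1920x1080 frame, this resizes to 640x360 and adds 140 rows of padding above and below; a box covering the whole resized content maps back to the full 1920x1080 frame.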
          I used the method commented by @glenn-jocher as:
torch.cuda.set_device(0)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = YOLO('/your/model/path/', task='detect')

model.to(device=device)
and it worked
          Great to hear that the method worked for you, @Hasnain1997-ai! Setting the device for both your model and any incoming data tensors ensures that all computations happen on the GPU when available, which should provide you with the speed you're looking for in real-time applications.
Remember, whenever you're processing new data (like a frame from a video stream), you also need to move the data to the same device as the model by using .to(device) on the tensor. This way, both your model and data are on the GPU, allowing for efficient handling of the workload.
If you run into more questions or need further assistance as you continue to work with YOLOv8, feel free to consult the Ultralytics documentation or reach out to the community. Keep up the great work! 🚀
from ultralytics import YOLO

model = YOLO('yolov8n.yaml')

results = model.train(data='./config.yaml', epochs=3,device='cuda')
error:
               from  n    params  module                                       arguments
0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
22        [15, 18, 21]  1    897664  ultralytics.nn.modules.head.Detect           [80, [64, 128, 256]]
YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
Ultralytics YOLOv8.0.225 🚀 Python-3.10.12 torch-2.1.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4060, 7937MiB)

engine/trainer: task=detect, mode=train, model=yolov8n.yaml, data=./config.yaml, epochs=3, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=cuda, workers=8, project=None, name=train38, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train38

2023-12-08 15:13:30.316345: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered

2023-12-08 15:13:30.316589: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered

2023-12-08 15:13:30.453756: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Overriding model.yaml nc=80 with nc=3
               from  n    params  module                                       arguments
0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
2                  -1  1      7360  ultralytics.nn.modules.block.C2f             [32, 32, 1, True]
3                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
4                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
5                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
6                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
8                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
12                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
15                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
16                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
18                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
19                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
21                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]

TensorBoard: Start with 'tensorboard --logdir runs/detect/train38', view at http://localhost:6006/

Freezing layer 'model.22.dfl.conv.weight'

AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...

AMP: checks passed ✅

train: Scanning /home/khizar_smr/Khizar_data/KHIZAR FOLDER SMR/Computer_Vision_Work/Face_Recognition/data/labels/train.cache... 356 images, 210 backgrounds, 0 corrupt: 100%|██████████| 356/356 [00:00<?, ?it/s]

val: Scanning /home/khizar_smr/Khizar_data/KHIZAR FOLDER SMR/Computer_Vision_Work/Face_Recognition/data/labels/val.cache... 89 images, 58 backgrounds, 0 corrupt: 100%|██████████| 89/89 [00:00<?, ?it/s]

Plotting labels to runs/detect/train38/labels.jpg...

optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...

optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)

Image sizes 640 train, 640 val

Using 8 dataloader workers

Logging results to runs/detect/train38

Starting training for 3 epochs...

  Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
0%|          | 0/23 [00:00<?, ?it/s]Could not load library libcudnn_cnn_train.so.8. Error: /usr/local/cuda-12.2/lib64/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERKNS1_20NormForwardOperationEmb, version libcudnn_cnn_infer.so.8

[... the same "Could not load library libcudnn_cnn_train.so.8" message repeats 27 more times ...]

0%|          | 0/23 [00:02<?, ?it/s]
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 11
5 model = YOLO('yolov8n.yaml')
7 # Correct the path to your data.yaml file
8 # data_path = r'/home/khizar_smr/Khizar_data/KHIZAR FOLDER SMR/Computer_Vision_Work/datasets/Dataset/Person_Labels/data/data.yaml'
10 # Train the model
---> 11 results = model.train(data='./config.yaml',
12                     #   imgsz=640,  # Image size
13                     #   batch=16,  # Batch size
14                       epochs=3,  # Number of epochs
15                       device='cuda')  # Use GPU if available
17 # Save the trained model
File ~/Khizar_data/KHIZAR FOLDER SMR/Computer_Vision_Work/cv_env/lib/python3.10/site-packages/ultralytics/engine/model.py:338, in Model.train(self, trainer, **kwargs)
336     self.model = self.trainer.model
337 self.trainer.hub_session = self.session  # attach optional HUB session
--> 338 self.trainer.train()
339 # Update model and cfg after training
340 if RANK in (-1, 0):
File ~/Khizar_data/KHIZAR FOLDER SMR/Computer_Vision_Work/cv_env/lib/python3.10/site-packages/ultralytics/engine/trainer.py:190, in BaseTrainer.train(self)
187         ddp_cleanup(self, str(file))
189 else:
257     allow_unreachable=True,
258     accumulate_grad=True,
259 )
RuntimeError: GET was unable to find an engine to execute this computation
          Hello @KhizarHashmi!
The "undefined symbol" error in libcudnn_cnn_train.so.8, one of the shared libraries in NVIDIA's cuDNN, points to a problem with your CUDA/cuDNN installation, most likely a version mismatch between CUDA, cuDNN, and the PyTorch build you are using.
To address this issue, please:
Verify that you have the correct version of cuDNN installed for your version of CUDA.
Ensure that your PyTorch build is compatible with both your CUDA and cuDNN versions.
Review your system's environment variables to ensure paths for CUDA and cuDNN are properly set.
If all versions are compatible and the problem persists, consider reinstalling the CUDA/cuDNN libraries, as corrupted files or improper installations could also lead to such errors. If you need in-depth guidance, refer to the installation documents for CUDA and cuDNN provided by NVIDIA, as well as compatibility notes in the PyTorch documentation.
You may already have checked some of these points, but an error this specific points strongly to a problem in the installed CUDA-related libraries.
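As a starting point for the version checks listed above, here is a minimal sketch that prints what PyTorch itself reports. The function name describe_torch_stack is only illustrative, and the sketch assumes nothing beyond PyTorch's own attributes (torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.is_available()).

```python
import importlib


def describe_torch_stack():
    """Return the versions PyTorch reports for itself, CUDA, and cuDNN."""
    empty = {"torch": None, "cuda_build": None, "cudnn": None, "gpu_visible": False}
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # PyTorch is missing, or one of its shared libraries failed to load.
        return empty
    try:
        cudnn = torch.backends.cudnn.version()  # integer such as 8902, or None
    except RuntimeError:
        cudnn = None  # cuDNN was found but could not be initialised
    return {
        "torch": torch.__version__,        # may carry a "+cu121" or "+cpu" suffix
        "cuda_build": torch.version.cuda,  # CUDA version of the build, None on CPU builds
        "cudnn": cudnn,
        "gpu_visible": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for name, value in describe_torch_stack().items():
        print(f"{name}: {value}")
```

If cuda_build is None, the installed PyTorch is a CPU-only build and no system-level CUDA fix will make the GPU visible to it.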
          Thanks for your response, @glenn-jocher. I have found some version mismatches in my configuration, but I am unsure how to resolve them and need your guidance.
Below is my system configuration:
OS: Ubuntu 20.04
CPU: i5 (10th gen)
GPU: RTX 4060 8GB
NVIDIA Driver: 535.129.03
CUDA: 12.2
cuDNN: 8.9.4
TensorFlow: 2.15.0
PyTorch: 2.1.1+cu121
Python Version: 3.10.12
I think the latest version of PyTorch supports CUDA 12.1, which may be causing the issue, but I cannot find an NVIDIA driver for my GPU that supports CUDA 12.1. Please advise on how to proceed.
          @KhizarHashmi given your configuration, it appears you have the correct PyTorch version for CUDA 12.1. However, the cuDNN version 8.9.4 might not be fully aligned with CUDA 12.2, which could cause errors. It's important to ensure all components of the software stack are compatible.
NVIDIA driver compatibility is typically broader than specific CUDA toolkit versions, so your current driver should support CUDA 12.1. You don't need to match the driver version with the CUDA version exactly; what's essential is that the driver supports the CUDA version you're using.
Regarding TensorFlow, each release is built and tested against specific CUDA and cuDNN versions, so it's best to pair your TensorFlow version with the versions listed for it. Check the tested build configurations in the official TensorFlow documentation for the exact compatibility matrix.
To move forward, you should:
Verify compatibility between CUDA, cuDNN, and PyTorch versions using the official compatibility guides.
If necessary, downgrade/upgrade CUDA or cuDNN to match the versions required by the deep learning libraries you are using.
Keeping everything within the compatibility range should prevent the errors you're encountering and ensure your model trains successfully on your GPU.
          Yes, @KhizarHashmi, downgrading to CUDA 11.8 could be a good solution, as builds of both PyTorch and TensorFlow are available for it. Your NVIDIA driver version 535.129.03 should support CUDA 11.8 without any issues. Make sure to install the version of cuDNN that matches CUDA 11.8 as well. After adjusting your environment, reinstall PyTorch and TensorFlow in versions built for those CUDA and cuDNN versions. This should resolve your compatibility issues.
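Before reinstalling anything, it may help to see which cuDNN copies the dynamic loader can reach. The error above shows libcudnn_cnn_train.so.8 being loaded from /usr/local/cuda-12.2/lib64. One possible cause, stated here as an assumption rather than a diagnosis, is that this system copy on LD_LIBRARY_PATH is being picked up instead of the cuDNN that the PyTorch pip packages typically bring with them. The helper name find_cudnn_copies is hypothetical.

```python
import os
from pathlib import Path


def find_cudnn_copies(search_path):
    """List libcudnn shared objects found in the directories of a PATH-style string."""
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        if not entry or not directory.is_dir():
            continue  # skip empty entries and directories that do not exist
        hits.extend(sorted(str(p) for p in directory.glob("libcudnn*.so*")))
    return hits


if __name__ == "__main__":
    # On Linux the loader consults LD_LIBRARY_PATH before its default locations.
    for lib in find_cudnn_copies(os.environ.get("LD_LIBRARY_PATH", "")):
        print(lib)
```

If this lists a system cuDNN directory, temporarily removing that directory from LD_LIBRARY_PATH for one training run is a cheap way to test whether it is involved.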
          @glenn-jocher So how do I train the model on a GPU with this command?
!yolo segment train data=data-detect.yaml model=yolov8m-seg.pt epochs=50 imgsz=640
I have installed all the requirements, but I get an error saying the command "yolo" was not found.
          @Sarai256 hello!
To train on a GPU from the command line, first activate the Python environment where you installed the prerequisites. A "yolo: command not found" error usually means the ultralytics package is not installed in that environment, or the environment's scripts directory is not on your system's PATH.
Install the package, then run the same training through the Python API, which does not depend on the yolo executable being on PATH:
pip install ultralytics
python -c "from ultralytics import YOLO; YOLO('yolov8m-seg.pt').train(data='data-detect.yaml', epochs=50, imgsz=640, device=0)"
Run this from the directory that contains data-detect.yaml, or pass the full path to the file instead.
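To narrow down why the yolo command is not found, a small check like the following may help. It is a sketch: diagnose_yolo_cli and train_segmentation are illustrative names, and the training function simply mirrors the arguments from the question using the ultralytics Python API (YOLO(...).train(...)).

```python
import importlib.util
import shutil


def diagnose_yolo_cli():
    """Report whether the `yolo` executable and the ultralytics package can be found."""
    return {
        "yolo_on_path": shutil.which("yolo"),  # full path to the executable, or None
        "package_installed": importlib.util.find_spec("ultralytics") is not None,
    }


def train_segmentation(data="data-detect.yaml", weights="yolov8m-seg.pt",
                       epochs=50, imgsz=640, device=0):
    """Python-API counterpart of `yolo segment train ...`; needs `pip install ultralytics`."""
    from ultralytics import YOLO  # imported lazily so the diagnosis runs without the package

    model = YOLO(weights)
    return model.train(data=data, epochs=epochs, imgsz=imgsz, device=device)


if __name__ == "__main__":
    print(diagnose_yolo_cli())
```

If package_installed is True but yolo_on_path is None, the package is present and only the PATH is at fault, so calling train_segmentation() from Python sidesteps the problem.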
          👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
          Here is a link to a Google Colab notebook that works with the free T4 GPU.
Google Colab: https://colab.research.google.com/drive/1SA7JLyWpDb8q2EM4hXlhOhTIVbUenlY-?usp=sharing
You cannot view the live video, but the output video will be saved in .avi format.
          @glenn-jocher Hi, I really need your help. I'm trying to train on my data in YOLOv8 with a GPU using this command:
yolo task=detect mode=train model=yolov8l.pt data=data.yaml epochs=300 batch=16 imgsz=640 device=0
And I'm getting this error:
Ultralytics YOLOv8.1.19 🚀 Python-3.10.11 torch-2.2.1+cpu
Traceback (most recent call last):
  File "C:\Users\SAM\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\SAM\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Cairovision\data\Yolov8\human\Scripts\yolo.exe\__main__.py", line 7, in <module>
  File "D:\Cairovision\data\Yolov8\human\lib\site-packages\ultralytics\cfg\__init__.py", line 568, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "D:\Cairovision\data\Yolov8\human\lib\site-packages\ultralytics\engine\model.py", line 625, in train
    self.trainer = (trainer or self._smart_load("trainer"))(overrides=args, _callbacks=self.callbacks)
  File "D:\Cairovision\data\Yolov8\human\lib\site-packages\ultralytics\engine\trainer.py", line 100, in __init__
    self.device = select_device(self.args.device, self.args.batch)
  File "D:\Cairovision\data\Yolov8\human\lib\site-packages\ultralytics\utils\torch_utils.py", line 126, in select_device
    raise ValueError(
ValueError: Invalid CUDA 'device=0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.
torch.cuda.is_available(): False
torch.cuda.device_count(): 0
os.environ['CUDA_VISIBLE_DEVICES']: None
See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no CUDA devices are seen by torch.
Can you tell me how I can resolve this?
          Hey @josmyk! PyTorch isn't detecting your GPU, which is why you're seeing this error. The first line of your log shows torch-2.2.1+cpu, which is a CPU-only build of PyTorch, so no CUDA device can be selected. 😅
You can confirm this by running the following in your Python environment:
import torch
print(torch.__version__)
print(torch.cuda.is_available())
If torch.cuda.is_available() returns False, uninstall the CPU build and reinstall PyTorch with CUDA support. You can find the appropriate command on the PyTorch installation page by selecting your setup details.
For example, for CUDA 11.8:
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Choose a CUDA version that your GPU and NVIDIA driver support. After reinstalling, try running your training command again. 🚀
Let me know how it goes!
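The build type can often be read straight from the version string, as in the torch-2.2.1+cpu shown in the log above. The sketch below splits off that suffix. Both function names are illustrative, and an empty tag should be treated as inconclusive, since torch.version.cuda is the more direct check.

```python
def torch_build_flavor(version):
    """Return the local build tag of a PyTorch version string, e.g. 'cpu' or 'cu121'."""
    _, _, tag = version.partition("+")
    return tag  # empty string when the version has no "+tag" suffix


def is_cuda_build(version):
    """True when the version string carries a CUDA tag such as '+cu118'."""
    return torch_build_flavor(version).startswith("cu")


if __name__ == "__main__":
    for v in ("2.2.1+cpu", "2.1.1+cu121"):
        print(v, "->", torch_build_flavor(v), "| CUDA build:", is_cuda_build(v))
```

In practice you would pass torch.__version__ to these helpers after importing torch.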
          I was also getting a similar error when trying to run inference on the GPU.
ERROR: "NotImplementedError: Could not run 'torchvision::nms' with arguments from CUDA backend" ...
According to the resources I found, this error occurs when Torch and Torchaudio are installed with CUDA support but Torchvision is not.
I managed to resolve it by:
Running pip uninstall torch torchvision
Going to https://pytorch.org/ and copying the pip installation command (it is specific to your OS and CUDA version).
In my case:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Refs:
Stack Overflow
This issue also seems related:
#5059
Hope this helps someone.
          @dommyrock hey there! 👋 Thanks a lot for sharing your solution! 😊
You're spot-on; the error you encountered is indeed because Torchvision wasn't installed with CUDA support, even though Torch and Torchaudio were. Uninstalling and then installing the correct CUDA version for all three packages is a great approach.
For anyone else facing this issue, just follow the steps mentioned, ensuring you choose the command that matches your OS and CUDA version from the PyTorch official site.
Here's a quick recap:
Uninstall the current PyTorch, Torchvision, and Torchaudio:
pip uninstall torch torchvision torchaudio
Reinstall them with CUDA support (example for CUDA 11.8):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
This should get things running smoothly on your GPU! 🚀
And thanks for the additional resource and issue link; it's always great to have more context. Happy coding!
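After reinstalling, a quick way to confirm the fix is to run NMS on a couple of boxes placed on the GPU. This is a minimal sketch using torchvision.ops.nms. The function name nms_runs_on_gpu is illustrative, and it returns None when the check cannot be performed (for example, when no GPU is visible).

```python
def nms_runs_on_gpu():
    """Return True/False if NMS works on CUDA, or None when it cannot be tested here."""
    try:
        import torch
        from torchvision.ops import nms
    except (ImportError, OSError):
        return None  # torch or torchvision is missing or failed to load
    if not torch.cuda.is_available():
        return None  # no GPU visible to torch, so there is nothing to test
    # Two overlapping boxes in (x1, y1, x2, y2) format, created directly on the GPU.
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]], device="cuda")
    scores = torch.tensor([0.9, 0.8], device="cuda")
    try:
        nms(boxes, scores, 0.5)
    except (NotImplementedError, RuntimeError):
        return False  # torchvision was likely installed without CUDA kernels
    return True


if __name__ == "__main__":
    print("NMS on GPU:", nms_runs_on_gpu())
```

A result of False reproduces the 'torchvision::nms' error in isolation, which suggests Torchvision still needs to be reinstalled from the CUDA index.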