I hit the following error after running inference for several hours on K8S, and even re-initializing the model in the except clause does not help:
2023-07-13 01:27:18.636 | ERROR | yolov8_detection:inference:85 - CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
@darouwan thank you for reaching out with your issue. The CUDA error you're experiencing is typically caused by accessing memory that is already freed or is out of bounds. This could be due to several reasons, such as an issue with a CUDA kernel, insufficient GPU memory, or a problem with tensor sizes during inference.
In your case, it's possible that continuously running inference for several hours is causing an issue with GPU memory management. It might be helpful to monitor the GPU memory usage over time during inference to identify any potential memory leaks.
As suggested in the error message, you could also try launching the CUDA kernels with the environment variable CUDA_LAUNCH_BLOCKING=1. Although this might slow down the computation, it enables synchronous execution and can help identify the operation that's causing the problem. Additionally, you could try compiling with TORCH_USE_CUDA_DSA to enable device-side assertions, as this might provide a clearer idea of what's causing the issue.
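As a rough illustration (assuming a standard Ultralytics install and a local yolov8n.pt checkpoint), the variable can be set before torch is imported so that kernels launch synchronously:

```python
# Minimal sketch: enable synchronous CUDA launches for debugging.
# Set the variable before torch is imported so it takes effect for the CUDA context.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (imported after setting the env var on purpose)
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # assumed weights path
results = model("bus.jpg")   # a failing kernel now raises at the actual call site
```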
Lastly, to help us better assist you, could you provide more details of your setup, such as the exact command you're using to run the inference, the version of YOLOv8 you're using, your GPU specifications, and the version of your CUDA and PyTorch?
Looking forward to resolving this issue for you soon.
@glenn-jocher Thanks. I have set CUDA_LAUNCH_BLOCKING=1 in env variable and continue to monitor it.
Is there any best practice for model serving of yolov8?
Hello @darouwan,
It's great to hear that you're taking the CUDA_LAUNCH_BLOCKING=1 approach for error tracking. This setup should provide a clearer understanding of where the issue might lie.
As for YOLOv8 model serving, a common practice is to establish a REST API using a framework like Flask or FastAPI which will accept image data, perform inference using the model, and then return the results.
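A minimal sketch of such an endpoint might look like the following (the weights path, route name, and response fields are illustrative assumptions, not a reference implementation):

```python
# Sketch of a YOLOv8 detection endpoint with FastAPI; adapt paths and fields to your service.
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image
from ultralytics import YOLO

app = FastAPI()
model = YOLO("yolov8n.pt")  # load once at startup and reuse across requests


@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await file.read()))
    result = model(image)[0]  # single-image inference
    return {
        "boxes": result.boxes.xyxy.tolist(),
        "classes": result.boxes.cls.tolist(),
        "confidences": result.boxes.conf.tolist(),
    }
```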
When preparing your model for serving, it is crucial to conduct proper warm-up and cool-down procedures for your CUDA context. This will help to manage GPU resources effectively and avoid memory errors over long periods of inference.
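One simple way to warm up the CUDA context, sketched here with an assumed 640×640 input size, is to run a dummy inference at startup before serving real traffic:

```python
# Warm-up sketch: the first inference pays the one-time cost of CUDA context
# creation and kernel selection, so do it before real requests arrive.
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                        # assumed weights path
dummy = np.zeros((640, 640, 3), dtype=np.uint8)   # blank frame at the expected input size
_ = model(dummy)
```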
Do consider embedding checks for GPU memory usage within your model-serving pipeline, as this can alert you to any memory leaks that might be creeping in over time.
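For example, a small helper along these lines (an illustrative sketch, not part of the Ultralytics API) can be called periodically from the serving loop:

```python
# Log how much GPU memory PyTorch is holding; a steadily growing "allocated"
# figure between identical requests usually points to a leak.
import torch


def log_gpu_memory(tag: str = "") -> None:
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**2  # MiB held by live tensors
        reserved = torch.cuda.memory_reserved() / 1024**2    # MiB held by the caching allocator
        print(f"[{tag}] allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```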
Also, utilizing efficient batching strategies can help optimize inference time and resource allocation. The choice of batch size might vary depending on the GPU memory. Bigger batches are often faster on a per-image basis, but they require more memory.
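As a sketch (the file names below are hypothetical), several sources can be passed in one call and the batch size tuned against the available GPU memory:

```python
# Multi-image inference sketch: the model returns one Results object per input;
# increase the batch only while GPU memory headroom allows it.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
frames = ["img1.jpg", "img2.jpg", "img3.jpg"]  # hypothetical inputs
for result in model(frames):
    print(len(result.boxes), "detections")
```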
Remember, each deployment environment will have different constraints and requirements, so the strategies might vary from case to case. Monitoring your environment and tweaking accordingly will help ensure you deliver a robust solution.
If you encounter any other issues or need further clarification, don't hesitate to let us know. We're here to help.
I have set CUDA_LAUNCH_BLOCKING=1, but the error message gives no additional detail. I am using FastAPI to provide the model as a RESTful service and nvidia-smi to monitor GPU usage, and everything looks fine. Whether I use batch size 1 or 10, this error keeps happening.
The yolov8 image is built FROM ultralytics/ultralytics:latest@sha256:e10ebfa99c36ade9aa2862f818339a5f86f6c0a93ae94cf909f380d0af492fd8.
The ultralytics version is 8.0.141, the PyTorch version is 2.0.1, the CUDA version is 12.0, and the GPU is a Tesla P4.
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/src/ultralytics/ultralytics/engine/model.py", line 254, in predict
return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
File "/usr/src/ultralytics/ultralytics/engine/predictor.py", line 195, in __call__
return list(self.stream_inference(source, model, *args, **kwargs)) # merge list of Result into one
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/usr/src/ultralytics/ultralytics/engine/predictor.py", line 246, in stream_inference
with profilers[0]:
File "/usr/src/ultralytics/ultralytics/utils/ops.py", line 39, in __enter__
self.start = self.time()
File "/usr/src/ultralytics/ultralytics/utils/ops.py", line 54, in time
torch.cuda.synchronize()
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py", line 688, in synchronize
return torch._C._cuda_synchronize()
RuntimeError: CUDA error: misaligned address
Hello @darouwan,
Thank you for the detailed report and for providing your environment details. Your issue appears to be related to CUDA memory alignment, which often occurs when the GPU tries to access a memory region that was not assigned to it or lies outside its accessible boundary.
The "misaligned address" error may imply that PyTorch is trying to load or store data types whose natural alignment needs are not met. For example, CUDA requires that 2-byte half precision, 4-byte float or int32, 8-byte double or int64 data should be aligned to their size in memory. Any misalignment could result in hardware exceptions.
While it's difficult to pinpoint the exact cause without further information, a few common scenarios could lead to this problem, such as running a PyTorch build that is not fully compatible with the installed CUDA version, or outdated NVIDIA drivers.
From the stack trace you provided, it seems that the error is triggered at the point where torch.cuda.synchronize() is called. This function explicitly waits for all kernels in all streams on a CUDA device to complete. It could be that the synchronization call is made while a memory region is being accessed or modified, which could potentially cause a misaligned address error.
As part of the troubleshooting process, could you please check whether your setup follows the compatibility guidelines between CUDA, PyTorch, and your NVIDIA driver? For instance, PyTorch 2.0.1 could be incompatible with CUDA 12.0, so you might want to consider using a PyTorch build that's explicitly compatible with CUDA 12.0.
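A quick way to check what your environment actually reports (a generic sanity check, not specific to Ultralytics) is:

```python
# Print the PyTorch build's CUDA version and the visible GPU; compare the CUDA
# build version against the driver version reported by nvidia-smi.
import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
```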
Regarding your model serving setup with FastAPI, it's good to hear that GPU usage seems to be under control. In addition to setting CUDA_LAUNCH_BLOCKING=1, you might want to consider setting the TORCH_CUDA_ALLOC_SYNC environment variable to 1. This will force PyTorch to synchronize on every CUDA memory allocation, which helps with debugging memory errors.
For continued tracking, please make sure to keep us updated on your situation as you continue your troubleshooting. We'll be glad to provide any assistance we can to get your model running smoothly.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
I have encountered the same issue as well. What is the final and best solution? Thank you.
Internal Server Error: /api/predict/
Traceback (most recent call last):
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/django/core/handlers/exception.py", line 47, in inner
response = get_response(request)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/django/core/handlers/base.py", line 181, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/django/views/generic/base.py", line 70, in view
return self.dispatch(request, *args, **kwargs)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/rest_framework/views.py", line 509, in dispatch
response = self.handle_exception(exc)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/rest_framework/views.py", line 469, in handle_exception
self.raise_uncaught_exception(exc)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
raise exc
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
File "/home/ceshi001/PycharmProjects/server/yolov8/./yolo/views.py", line 186, in post
results = modelobj(source=img, conf=confidence, device=device_num, )
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/ultralytics/yolo/engine/model.py", line 111, in __call__
return self.predict(source, stream, **kwargs)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/ultralytics/yolo/engine/model.py", line 250, in predict
self.predictor.setup_model(model=self.model, verbose=is_cli)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/ultralytics/yolo/engine/predictor.py", line 295, in setup_model
self.model = AutoBackend(model,
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/ultralytics/nn/autobackend.py", line 93, in __init__
model = weights.to(device)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/ultralytics/nn/tasks.py", line 156, in _apply
self = super()._apply(fn)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/ceshi001/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
@Mrgorlay hello,
Thanks for reaching out and providing such detailed information about your issue. The error message "CUDA error: misaligned address" is typically an indicator that your PyTorch version is not fully compatible with the CUDA version you have installed.
Can you please confirm whether your setup aligns with the compatibility guidelines between PyTorch, CUDA, and the NVIDIA driver? The misalignment issue can sometimes result from using incompatible versions of these components.
From the stack trace, it appears the error is raised when the model.to(device) operation is applied. This operation moves the model to the GPU device for accelerated computation. If there's an issue with your GPU setup (e.g., CUDA version, PyTorch compatibility), the error could be flagged here.
Furthermore, please make sure your GPU has enough memory to hold the model and perform the operations. Frequently, similar issues relate back to trying to allocate more memory than what is available on the device.
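One way to check this before moving the model (a sketch using standard PyTorch calls) is:

```python
# Report free and total device memory so you can confirm the model fits
# before calling model.to(device).
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1024**3:.2f} GiB of {total_bytes / 1024**3:.2f} GiB")
print("device:", torch.cuda.get_device_properties(0).name)
```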
You can debug the CUDA kernel error by setting CUDA_LAUNCH_BLOCKING=1. This environment variable makes CUDA operations run synchronously, which helps in identifying which specific operation is causing the error.
If the issue persists, please provide us with additional details or updates on your situation so we can help you solve the problem more effectively.
@Mrgorlay hello,
I'm glad to hear that updating the ultralytics package helped resolve your issue temporarily. This could be attributed to the updated compatibility with your specific system requirements in the newer version.
Moving forward, ensure you regularly update your packages to keep your environment stable and prevent further unexpected errors. If you do notice the issue reappearing or encounter a different problem, feel free to report it. We are here to help.
Thank you for your contributions to this project and the community. Your feedback plays a vital role in helping us improve YOLOv8!
Best regards.