I’m running Ubuntu 22.04 LTS with an RTX 4080 and wanted to upgrade nvcc to the latest version, because the Ubuntu-provided one is only version 11.5.1 and does not support the compute capability of my video card.

Using NVIDIA's easy-to-follow instructions for upgrading CUDA, I was able to upgrade nvcc, but I broke my video drivers in the process because the Ubuntu-provided and NVIDIA-provided packages conflicted. My entire screen went black while the NVIDIA packages were installing, and when I force-rebooted the system I was met with this:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 cuda-drivers-535 : Depends: nvidia-kernel-common-535 (>= 535.54.03) but it is not going to be installed
 nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (= 535.54.03-0ubuntu1) but it is not going to be installed
 nvidia-driver-535 : Depends: nvidia-compute-utils-535 (= 535.54.03-0ubuntu1) but 535.54.03-0ubuntu0.22.04.1 is to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
(base) vadi@barbar:~$ sudo apt --fix-broken install
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Correcting dependencies... Done
The following package was automatically installed and is no longer required:
  nvidia-firmware-535-535.54.03
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
  nvidia-compute-utils-535 nvidia-kernel-common-535
The following packages will be upgraded:
  nvidia-compute-utils-535 nvidia-kernel-common-535
2 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
3 not fully installed or removed.
Need to get 0 B/38,4 MB of archives.
After this operation, 61,6 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 file:/var/cuda-repo-ubuntu2204-12-2-local  nvidia-kernel-common-535 535.54.03-0ubuntu1 [38,2 MB]
Get:2 file:/var/cuda-repo-ubuntu2204-12-2-local  nvidia-compute-utils-535 535.54.03-0ubuntu1 [285 kB]
(Reading database ... 366284 files and directories currently installed.)
Preparing to unpack .../nvidia-kernel-common-535_535.54.03-0ubuntu1_amd64.deb ...
Unpacking nvidia-kernel-common-535 (535.54.03-0ubuntu1) over (535.54.03-0ubuntu0.22.04.1) ...
dpkg: error processing archive /var/cuda-repo-ubuntu2204-12-2-local/./nvidia-kernel-common-535_535.54.03
-0ubuntu1_amd64.deb (--unpack):
 trying to overwrite '/lib/firmware/nvidia/535.54.03/gsp_ga10x.bin', which is also in package nvidia-fir
mware-535-535.54.03 535.54.03-0ubuntu0.22.04.1
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Preparing to unpack .../nvidia-compute-utils-535_535.54.03-0ubuntu1_amd64.deb ...
Unpacking nvidia-compute-utils-535 (535.54.03-0ubuntu1) over (535.54.03-0ubuntu0.22.04.1) ...
dpkg: error processing archive /var/cuda-repo-ubuntu2204-12-2-local/./nvidia-compute-utils-535_535.54.03
-0ubuntu1_amd64.deb (--unpack):
 trying to overwrite '/usr/bin/nvidia-powerd', which is also in package nvidia-kernel-common-535 535.54.
03-0ubuntu0.22.04.1
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
nvidia-persistenced.service is a disabled or a static unit not running, not starting it.
nvidia-persistenced.service is a disabled or a static unit not running, not starting it.
Errors were encountered while processing:
 /var/cuda-repo-ubuntu2204-12-2-local/./nvidia-kernel-common-535_535.54.03-0ubuntu1_amd64.deb
 /var/cuda-repo-ubuntu2204-12-2-local/./nvidia-compute-utils-535_535.54.03-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
(base) vadi@barbar:~$ 

For anyone else reading this, I fixed this with sudo dpkg --force-all -P nvidia-firmware-535-535.54.03 nvidia-kernel-common-535 nvidia-compute-utils-535 and then sudo apt --fix-broken install.
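
The two dpkg errors in the log above are file conflicts: a package from the local CUDA repository tries to overwrite a file that an already installed package owns. Below is a minimal sketch for listing the owning packages. It assumes the output of apt --fix-broken install has been saved to a file (fix.log is a made-up name) and that each dpkg message sits on one line rather than being wrapped as in the paste above. The helper name conflicting_owners is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read dpkg output on stdin and print, once each, every
# package named in a "which is also in package ..." overwrite error.
conflicting_owners() {
  sed -n 's/.*which is also in package \([^ ]*\) .*/\1/p' | sort -u
}

# Example use (fix.log is an assumed file name, not something apt creates):
#   sudo apt --fix-broken install 2>&1 | tee fix.log
#   conflicting_owners < fix.log
```

With the wrapped lines rejoined, the log above would yield nvidia-firmware-535-535.54.03 and nvidia-kernel-common-535, two of the three packages purged in the fix.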

It would be great if the installation experience were better!

Thank you very much for posting your solution. The same thing happened to me, simply because I started the default Ubuntu software update and the screen suddenly went black.
I don’t know what happened. I had installed the NVIDIA drivers with sudo apt install nvidia-driver-535, although sudo ubuntu-drivers autoinstall is sometimes recommended instead. Maybe that was the problem. Anyway, your answer solved the error. Thank you.

Hey @Vadi, your post deserves much more attention!!!
I faced this problem as well and tried a clean install a couple of times, but it didn’t work!

I was planning to ask on the forum, but I first searched for the supposedly “broken” packages and found this thread. Thanks a lot, your fix works wonderfully. It would be great to understand why this happens and whether anything can be done on NVIDIA's side to prevent it.

I tried this on Ubuntu on Windows and it’s not working:

(venv_p38) viorel@DESKTOP-DL6E5LD:/mnt/d/projects/llama/llama-recipes$ sudo dpkg --force-all -P nvidia-firmware-535-535.54.03 nvidia-kernel-common-535 nvidia-compute-utils-535
dpkg: warning: ignoring request to remove nvidia-firmware-535-535.54.03 which isn't installed
dpkg: warning: ignoring request to remove nvidia-kernel-common-535 which isn't installed
dpkg: warning: ignoring request to remove nvidia-compute-utils-535 which isn't installed
(venv_p38) viorel@DESKTOP-DL6E5LD:/mnt/d/projects/llama/llama-recipes$ sudo apt --fix-broken install
Reading package lists... Done
Building dependency tree
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.
(venv_p38) viorel@DESKTOP-DL6E5LD:/mnt/d/projects/llama/llama-recipes$ sudo apt install nvidia-driver-535
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 nvidia-driver-535 : Depends: libnvidia-gl-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: libnvidia-compute-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: libnvidia-extra-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: libnvidia-decode-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: libnvidia-encode-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: xserver-xorg-video-nvidia-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: libnvidia-cfg1-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Depends: libnvidia-fbc1-535 (= 535.104.05-0ubuntu0.20.04.4) but 535.104.05-0ubuntu0.20.04.1 is to be installed
                     Recommends: libnvidia-compute-535:i386 (= 535.104.05-0ubuntu0.20.04.4) but it is not installable
                     Recommends: libnvidia-decode-535:i386 (= 535.104.05-0ubuntu0.20.04.4) but it is not installable
                     Recommends: libnvidia-encode-535:i386 (= 535.104.05-0ubuntu0.20.04.4) but it is not installable
                     Recommends: libnvidia-fbc1-535:i386 (= 535.104.05-0ubuntu0.20.04.4) but it is not installable
                     Recommends: libnvidia-gl-535:i386 (= 535.104.05-0ubuntu0.20.04.4) but it is not installable
E: Unable to correct problems, you have held broken packages.
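
"Ubuntu on Windows" here presumably means WSL. NVIDIA's guidance for CUDA on WSL is that the GPU driver is installed on the Windows side only and that no Linux display driver should be installed inside the distribution, which would make nvidia-driver-535 the wrong target in this environment. The sketch below checks whether the current shell runs under WSL. The helper name is_wsl is hypothetical, and matching "microsoft" in the kernel version string is a common heuristic, not an official interface.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel version string on stdin looks like
# a WSL kernel. Matching "microsoft" is a widely used heuristic, not an API.
is_wsl() {
  grep -qi microsoft
}

if [ -r /proc/version ]; then
  if is_wsl < /proc/version; then
    echo "WSL detected: keep the GPU driver on the Windows side only"
  else
    echo "Not WSL: a native Linux driver package is expected"
  fi
fi
```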
              

My problem is that, after several rounds of installing and reinstalling, the driver works right after reinstallation but reliably breaks again later. I don't know what causes this; all I did was run training with CUDA-enabled PyTorch and then stop it.

The error is:

/root/miniconda3/envs/vallex/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at /opt/conda/conda-bld/pytorch_1678402411778/work/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
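
One commonly reported trigger for Error 804 on desktop GPUs is a mismatch between the loaded NVIDIA kernel module and the user-space driver libraries, for example after a package upgrade without a reboot. Whether that applies here is an assumption; the warning alone does not confirm it. The sketch below compares the two versions. The function names are hypothetical, and libnvidia-compute-535 is assumed to be the package that ships the user-space CUDA driver library.

```shell
#!/bin/sh
# Hypothetical helper: print the driver version from text in the format of
# /proc/driver/nvidia/version (read on stdin).
kernel_module_version() {
  awk '/^NVRM version:/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
  }'
}

# Hypothetical helper: succeed if the kernel module version ($1) equals the
# package version ($2) once everything from the first "-" onward is dropped.
versions_match() {
  [ "$1" = "${2%%-*}" ]
}

if [ -r /proc/driver/nvidia/version ] && command -v dpkg-query >/dev/null 2>&1; then
  kmod=$(kernel_module_version < /proc/driver/nvidia/version)
  pkg=$(dpkg-query -W -f='${Version}' libnvidia-compute-535 2>/dev/null) || pkg="unknown"
  if versions_match "$kmod" "$pkg"; then
    echo "kernel module and libraries agree: $kmod"
  else
    echo "mismatch: kernel module $kmod, libraries $pkg (a reboot may be needed)"
  fi
fi
```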

While re-installing:

Building dependency tree... Done
Reading state information... Done
cuda is already the newest version (12.2.2-1).
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 libnvidia-gl-535 : Breaks: libnvidia-gl-535:i386 (!= 535.104.12-0ubuntu1) but 535.113.01-0ubuntu0.22.04.1 is to be installed
 libnvidia-gl-535:i386 : Breaks: libnvidia-gl-535 (!= 535.113.01-0ubuntu0.22.04.1) but 535.104.12-0ubuntu1 is to be installed
 nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (>= 535.113.01) but 535.104.12-0ubuntu1 is to be installed
 nvidia-driver-535 : Depends: libnvidia-gl-535 (= 535.113.01-0ubuntu0.22.04.1) but 535.104.12-0ubuntu1 is to be installed
                     Depends: nvidia-kernel-common-535 (>= 535.113.01) but 535.104.12-0ubuntu1 is to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
Vadi:

sudo dpkg --force-all -P nvidia-firmware-535-535.54.03 nvidia-kernel-common-535 nvidia-compute-utils-535

What I did before reinstalling:

sudo apt-get --purge remove nvidia-*
sudo apt-get --purge remove libnvidia-*
sudo dpkg --force-all -P nvidia-firmware-535-535.54.03 nvidia-kernel-common-535 nvidia-compute-utils-535 libnvidia-decode-535 nvidia-driver-535
sudo dpkg --force-all -P nvidia-*
sudo dpkg --force-all -P libnvidia-*
sudo apt autoremove
sudo apt autoclean
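
The reinstall log above mixes two builds of the 535 driver: 535.104.12-0ubuntu1 and 535.113.01-0ubuntu0.22.04.1. In the first post of this thread, the -0ubuntu1 builds came from the local CUDA repository, so the mix most likely reflects two package sources again. The sketch below counts installed NVIDIA packages per version so that such a mix is easy to spot. The helper name nvidia_versions is hypothetical, and it matches any package whose name contains "nvidia".

```shell
#!/bin/sh
# Hypothetical helper: read "package version" pairs on stdin and print
# "<count> <version>" for packages whose name contains "nvidia",
# most common version first.
nvidia_versions() {
  awk '$1 ~ /nvidia/ && $2 != "" { count[$2]++ }
       END { for (v in count) print count[v], v }' | sort -rn
}

if command -v dpkg-query >/dev/null 2>&1; then
  dpkg-query -W -f='${Package} ${Version}\n' | nvidia_versions
fi
```

Helper packages such as nvidia-settings carry their own version numbers, so several lines of output are normal; more than one 535.x version is the sign of a mixed install.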
              

Facing Same Issues with NVIDIA QUADRO GV100 GPU - Request for Support

I hope this message finds you well. I am writing to seek assistance with an issue I am encountering with my NVIDIA QUADRO GV100 GPU.

Upon attempting to utilize the GPU, I encountered the following error message when running sudo nvidia-smi:

Error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Despite attempting various troubleshooting steps, including driver reinstallation and system reboots, the issue persists. As this GPU is crucial for my work, I would greatly appreciate your assistance in resolving this matter promptly.

Below are the relevant details of my system:

  • GPU Model: NVIDIA QUADRO GV100
  • Operating System: Ubuntu 22.04
  • Driver Version: installed by following the steps on this site: https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/

Could you please provide guidance on how to resolve this issue, or on any further steps I should take? If necessary, I am available for remote assistance to facilitate the troubleshooting process.

Thank you for your attention to this matter. I look forward to your prompt response and resolution of the issue.
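
That nvidia-smi message generally means the nvidia kernel module is not loaded. A minimal first check is sketched below, assuming a standard /proc/modules. The helper name module_loaded is hypothetical, and the follow-up commands in the comments are suggestions rather than steps taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if text in /proc/modules format (stdin) lists a
# module named exactly "nvidia".
module_loaded() {
  awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

if [ -r /proc/modules ]; then
  if module_loaded < /proc/modules; then
    echo "nvidia kernel module is loaded"
  else
    echo "nvidia kernel module is NOT loaded"
    # Usual follow-ups (suggestions, not taken from this thread):
    #   dkms status          # was the module built for the running kernel?
    #   mokutil --sb-state   # is Secure Boot blocking an unsigned module?
  fi
fi
```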