Question:
When I run nvidia-smi I get the following message:
Failed to initialize NVML: Driver/library version mismatch
An hour ago I received the same message, uninstalled my CUDA library, and was then able to run nvidia-smi successfully.
After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply:
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
Now I have cuda installed, but I get the mentioned mismatch error.
Some potentially useful information:
Running cat /proc/driver/nvidia/version I get:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 378.13 Tue Feb 7 20:10:06 PST 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
I'm running Ubuntu 16.04.2 LTS.
Kernel release is: 4.4.0-66-generic.
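The kernel-side version in the output above can be extracted with a small helper, which makes it easier to compare against the version of the installed userspace packages. This is only a minimal sketch: the function name nvidia_kmod_version is made up for illustration, and it assumes the NVRM line has the format shown in this question.

```shell
# Hypothetical helper: print the NVIDIA kernel module version found in the
# NVRM line of a file formatted like /proc/driver/nvidia/version.
nvidia_kmod_version() {
    # $1: file to read; defaults to the live /proc entry.
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' \
        "${1:-/proc/driver/nvidia/version}"
}
```

Called without arguments on the machine described here, it would print 378.13.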
Thanks!
Answer 1:
Surprise surprise, rebooting solved the issue (I thought I had already tried that).
The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.
Answer 2:
As @etal said, rebooting can solve this problem, but a procedure that avoids rebooting may also help.
For a Chinese version, see my blog (中文版).
The error message
NVML: Driver/library version mismatch
tells us that the loaded NVIDIA driver kernel module (kmod) has the wrong version, so we should unload it and then load the correct version.
How do we do that?
First, find out which driver modules are loaded:
lsmod | grep nvidia
You may get:
nvidia_uvm 634880 8
nvidia_drm 53248 0
nvidia_modeset 790528 1 nvidia_drm
nvidia 12312576 86 nvidia_modeset,nvidia_uvm
Our final goal is to unload the nvidia module, so we should first unload the modules that depend on it:
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
Then unload nvidia itself:
sudo rmmod nvidia
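The unload order above follows from the last column of lsmod, which lists the modules that use each module. As a minimal sketch (the function name nvidia_unload_order is made up here, and it only considers dependents whose names start with nvidia), the order can be derived from the lsmod output instead of being hard-coded:

```shell
# Hypothetical helper: read `lsmod` output on stdin and print the nvidia*
# modules in an order that is safe for rmmod. A module is printed only
# after every nvidia* module in its "Used by" column has been printed.
nvidia_unload_order() {
    awk '
        $1 ~ /^nvidia/ {
            mods[$1] = 1
            users[$1] = $4          # comma-separated dependents, may be empty
            order[++n] = $1
        }
        END {
            remaining = n
            while (remaining > 0) {
                progressed = 0
                for (i = 1; i <= n; i++) {
                    m = order[i]
                    if (!(m in mods)) continue      # already printed
                    blocked = 0
                    k = split(users[m], u, ",")
                    for (j = 1; j <= k; j++)
                        if (u[j] in mods) blocked = 1
                    if (!blocked) {
                        print m
                        delete mods[m]
                        remaining--
                        progressed = 1
                    }
                }
                if (!progressed) exit 1             # unexpected input
            }
        }
    '
}
```

Running lsmod | nvidia_unload_order on the listing above prints nvidia_uvm, nvidia_drm, nvidia_modeset and nvidia, one per line; each name can then be passed to sudo rmmod in that order.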
Troubleshooting
If you get an error like rmmod: ERROR: Module nvidia is in use, the kernel module is still being used, and you should find the processes that are using it:
sudo lsof /dev/nvidia*
Kill those processes, then continue unloading the kmods.
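To see which processes are involved before killing anything, the PID column of the lsof listing can be reduced to a unique list. A minimal sketch, assuming the default lsof column layout (PID in the second column); the function name pids_from_lsof is made up for illustration:

```shell
# Hypothetical helper: read `lsof` output on stdin and print each PID once.
pids_from_lsof() {
    # Skip the header row; the PID is the second column.
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $2 }' | sort -un
}
```

For example, sudo lsof /dev/nvidia* | pids_from_lsof prints one PID per line, which can be inspected with ps -p before deciding what to stop.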
Test
Confirm that you successfully unloaded those kmods:
lsmod | grep nvidia
You should get nothing. Then confirm that the correct driver can be loaded:
nvidia-smi
You should get the correct output.
Answer 3:
I was having this problem and none of the other remedies worked. The error message was opaque, but checking dmesg was key:
[ 10.118255] NVRM: API mismatch: the client has the version 410.79, but
NVRM: this kernel module has the version 384.130. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
However, I had completely removed the 384 version, along with any remaining nvidia-384* kernel drivers. But even after a reboot I was still getting this message. This suggested that the boot image (initramfs) for my kernel still referenced the 384 module while only 410 was installed. So I regenerated the initramfs for my kernel:
# uname -a # find the kernel it's using
Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# update-initramfs -c -k 4.13.0-43-generic # regenerate the initramfs for that kernel
# reboot
And then it worked.
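The two versions named in the dmesg message above are the quickest way to see which side is stale. A minimal sketch that pulls them out of the kernel log, assuming the message wording shown in this answer; the function name nvrm_mismatch_versions is made up for illustration:

```shell
# Hypothetical helper: read kernel log text on stdin and print the two
# versions named in an "NVRM: API mismatch" message.
nvrm_mismatch_versions() {
    sed -n \
        -e 's/.*the client has the version \([0-9][0-9.]*[0-9]\).*/client=\1/p' \
        -e 's/.*this kernel module has the version \([0-9][0-9.]*[0-9]\).*/kernel_module=\1/p'
}
```

With the log lines quoted in this answer, sudo dmesg | nvrm_mismatch_versions prints client=410.79 and kernel_module=384.130.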
After removing 384, I still had 384 files in:
/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers
I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
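If locate is not available, find can do the same job for the two directories mentioned above. A minimal sketch, assuming the leftover module files are named nvidia*.ko (optionally compressed); the function name find_nvidia_modules is made up for illustration:

```shell
# Hypothetical helper: list NVIDIA kernel module files below a directory.
find_nvidia_modules() {
    # $1: directory to search, e.g. /lib/modules/$(uname -r) or /var/lib/dkms
    find "$1" -type f -name 'nvidia*.ko*' 2>/dev/null | sort
}
```

Each path it prints can be passed to modinfo -F version to see which driver version that file belongs to.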
Answer 4:
The top two answers did not solve my problem. A solution I found on the official NVIDIA forum did.
The error below may be caused by installing two different versions of the driver through different methods, for example installing the NVIDIA driver with apt and also with the official installer.
Failed to initialize NVML: Driver/library version mismatch
To solve this problem, you only need to run one of the following two commands (the first removes the apt packages, the second removes a driver installed by the official installer):
sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
Answer 5:
I got the error failed to initialize NVML: Driver/Library version mismatch from my nvidia-gpu-temperature-indicator, and nvidia-smi failed to print any info. I looked for other versions of the NVIDIA driver installed on my Ubuntu system, but found only nvidia-driver-390. In the end, a reboot solved the problem.
Answer 6:
This also happened to me on Ubuntu 16.04 using the nvidia-348 package (latest nvidia version on Ubuntu 16.04).
However, I could resolve the problem by installing nvidia-390 through the Proprietary GPU Drivers PPA.
So a solution to the described problem on Ubuntu 16.04 is doing this:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-390
Note: This guide assumes a clean Ubuntu install. If you have previous drivers installed, a reboot might be needed to reload all the kernel modules.
Answer 7:
I had the issue too (I'm running Ubuntu 18.04).
What I did:
dpkg -l | grep -i nvidia
Then
sudo apt-get remove --purge nvidia-381
(and every duplicate version; in my case I had 381, 384 and 387)
Then I ran sudo ubuntu-drivers devices to list what's available, and chose:
sudo apt install nvidia-driver-430
After that, nvidia-smi gave the correct output (no need to reboot). But I suppose you can reboot when in doubt.
I also followed this installation to reinstall cuda+cudnn.
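The check for duplicate driver versions in the first step can be made less error-prone by reducing the dpkg listing to the distinct driver series that are actually installed. A minimal sketch, assuming packages named nvidia-NNN or nvidia-driver-NNN as in this answer; the function name installed_driver_series is made up for illustration:

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the distinct
# driver series of installed packages named nvidia-NNN or nvidia-driver-NNN.
installed_driver_series() {
    # Keep only fully installed packages ("ii"), then match the package name.
    awk '$1 == "ii" { print $2 }' \
        | sed -n 's/^nvidia-\(driver-\)\{0,1\}\([0-9][0-9]*\)\(:.*\)\{0,1\}$/\2/p' \
        | sort -un
}
```

If dpkg -l | installed_driver_series prints more than one line, several driver series are installed side by side and the extra ones are candidates for removal.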
Answer 8:
I experienced this problem after a normal kernel update on a CentOS machine. Since all CUDA and nVidia drivers and libraries have been installed via YUM repositories, I managed to solve the issues using the following steps:
sudo yum remove nvidia-driver-*
sudo reboot
sudo yum install nvidia-driver-cuda nvidia-modprobe
sudo modprobe nvidia # or just reboot
This made sure that my kernel and my nVidia driver were consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.
Answer 9:
I reinstalled the NVIDIA driver. Run these commands as root:
systemctl isolate multi-user.target
modprobe -r nvidia-drm
Reinstall the NVIDIA driver: chmod +x NVIDIA-Linux-x86_64-410.57.run, then run the installer.
systemctl start graphical.target
and finally check nvidia-smi
Thanks to:
How To Install Nvidia Drivers and CUDA-10.0 for RTX 2080 Ti GPU on Ubuntu-16.04/18.04
How to unload kernel module 'nvidia-drm'?
Answer 10:
I committed the container to a Docker image, then created a new container from that image, and the problem was gone.
Answer 11:
These answers did not work for me:
https://stackoverflow.com/a/43023000/1179925
https://stackoverflow.com/a/45319156/1179925
https://stackoverflow.com/a/54349675/1179925
dmesg
NVRM: API mismatch: the client has the version 418.67, but
NVRM: this kernel module has the version 430.26. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
Uninstall old driver 418.67 and install new driver 430.26 (download NVIDIA-Linux-x86_64-430.26.run):
sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
chmod +x NVIDIA-Linux-x86_64-430.26.run
sudo ./NVIDIA-Linux-x86_64-430.26.run
[ignore abort]
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 430.26 Tue Jun 4 17:40:52 CDT 2019
GCC version: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
Answer 12:
Reboot.
If the problem still exists:
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
nvidia-smi
For CentOS/RHEL:
cd /boot
mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut -vf initramfs-$(uname -r).img $(uname -r)
then
reboot
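Before rebuilding, it can be worth checking whether the initramfs contains an NVIDIA module at all, since that depends on how the system is configured. A minimal sketch that filters a file listing of the image; the function name nvidia_in_initramfs is made up for illustration, and the listing is assumed to end each line with the file path:

```shell
# Hypothetical helper: read an initramfs file listing on stdin and print the
# NVIDIA kernel module entries it contains (exit status 1 if there are none).
nvidia_in_initramfs() {
    grep -E '(^|/)nvidia[^/]*\.ko(\.[a-z0-9]+)?$'
}
```

On CentOS/RHEL the listing can come from lsinitrd /boot/initramfs-$(uname -r).img, and on Ubuntu from lsinitramfs /boot/initrd.img-$(uname -r); pipe either one into nvidia_in_initramfs.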
Answer 13:
In my case, I had installed the NVIDIA driver and then CUDA. I found it can be fixed by installing only CUDA: https://developer.nvidia.com/cuda-toolkit