I created a Python venv in which I installed TF 2.16.1 following your instructions: pip install tensorflow
When I run python, import tensorflow as tf, and issue tf.config.list_physical_devices('GPU'), I get an empty list [].
I created another Python venv and installed TF 2.16.1, only this time with:
python3 -m pip install tensorflow[and-cuda]
When I run that version, import tensorflow as tf, and issue tf.config.list_physical_devices('GPU'), I also get an empty list.
BTW, I have no problems running TF 2.15.1 with GPUs on my box. Julia also works just fine with GPUs, and so does PyTorch.
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-09 19:15:45.018171: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-09 19:15:50.412646: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.__version__
'2.16.1'
>>> tf.config.list_physical_devices('GPU')
2024-03-09 19:16:28.923792: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-03-09 19:16:29.078379: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
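For anyone triaging the same symptom, a quick way to narrow it down (a diagnostic sketch using standard pip/TF commands, not an official procedure) is to check which NVIDIA wheels pip actually installed and which CUDA/cuDNN versions the TF binary was built against:
pip list | grep -i nvidia    # cuda/cudnn wheels pulled in by tensorflow[and-cuda]
python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"    # CUDA/cuDNN versions TF expects
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
If the build info reports a CUDA/cuDNN pair that the dynamic loader cannot find, you get exactly the "Cannot dlopen some GPU libraries" warning above.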
I have the same problem with Ubuntu 22.04.4 with the following environment:
tensorflow==2.16.1
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
nvcc --version
output:
nvcc: NVIDIA (R) Cuda compiler driver
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
I'm not sure if this is the root cause, but I resolved my own issue, which also surfaced as a "Cannot dlopen some GPU libraries." error when trying to run python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))".
To resolve it, I followed the tested build configurations listed here:
https://www.tensorflow.org/install/source#gpu
and I needed to downgrade my existing installations from cuDNN 9 to 8.9 and from CUDA 12.4 to 12.3.
When you're on an NVIDIA download page like the one for the CUDA Toolkit, don't just download the latest version; find previous versions by hitting "Archive of Previous CUDA Releases".
@JuanVargas can you try moving your existing CUDA installation to a tested build configuration for TF 2.16 by downgrading to CUDA 12.3?
I followed this post to uninstall my existing cuda installation:
https://askubuntu.com/questions/530043/removing-nvidia-cuda-toolkit-and-installing-new-one
@DiegoMont can you try upgrading your cuDNN to 8.9 and CUDA to 12.3?
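Before downgrading, it may help to see what is currently installed (a sketch using standard Debian/CUDA tooling; paths and package names vary by install method):
nvcc --version    # CUDA toolkit version, if the toolkit is on PATH
dpkg -l | grep -i cudnn    # cuDNN packages installed from .deb files
ldconfig -p | grep libcudnn    # which libcudnn the dynamic loader can actually find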
I am having the same issue. Brand new Ubuntu 22.04 WSL2 image. A blank Conda environment with either python 3.12.* or 3.11.* fails to correctly set up tensorflow for GPU use when following the recommended:
pip install tensorflow[and-cuda]
Trying to list the physical devices results in:
2024-03-11 02:00:00.294704: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 02:00:00.709325: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 02:00:01.180225: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2d:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 02:00:01.180445: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
cuDNN 8.9.*
Cuda 12.3
Tensorflow 2.16.1
TensorRT 8.6.1
Is this a new issue caused by the fact that system CUDA apparently no longer needs to be installed separately in WSL2? I certainly didn't install one manually, and yet nvidia-smi is happily reporting CUDA version 12.3. It probably comes down to some env paths not being set correctly, but playing around with $CUDA_PATH and guessing the location within the conda environment has not resolved anything. TensorRT doesn't seem to be picked up either, yet it is definitely installed in the conda environment. PyTorch GPU visibility works as expected.
Hi @JuanVargas ,
For the GPU package you need to ensure the CUDA driver is installed, which can be verified with the nvidia-smi command. Then you need to install the TF-CUDA package with pip install tensorflow[and-cuda], which automatically installs the required CUDA/cuDNN libraries.
I have checked in Colab and was able to detect the GPU. Please refer to the attached gist.
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: tensorflow==2.16.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (2.16.1)
Requirement already satisfied: absl-py>=1.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.1.0)
Requirement already satisfied: astunparse>=1.6.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.6.3)
Requirement already satisfied: flatbuffers>=23.5.26 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (24.3.7)
Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.5.4)
Requirement already satisfied: google-pasta>=0.1.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.2.0)
Requirement already satisfied: h5py>=3.10.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.10.0)
Requirement already satisfied: libclang>=13.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (16.0.6)
Requirement already satisfied: ml-dtypes~=0.3.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.3.2)
Requirement already satisfied: opt-einsum>=2.3.2 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.3.0)
Requirement already satisfied: packaging in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (24.0)
Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (4.25.3)
Requirement already satisfied: requests<3,>=2.21.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.31.0)
Requirement already satisfied: setuptools in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (69.1.1)
Requirement already satisfied: six>=1.12.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.16.0)
Requirement already satisfied: termcolor>=1.1.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.4.0)
Requirement already satisfied: typing-extensions>=3.6.6 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (4.10.0)
Requirement already satisfied: wrapt>=1.11.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.16.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.62.1)
Requirement already satisfied: tensorboard<2.17,>=2.16 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.16.2)
Requirement already satisfied: keras>=3.0.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.5)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.36.0)
Requirement already satisfied: numpy<2.0.0,>=1.23.5 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (1.26.4)
Requirement already satisfied: nvidia-cublas-cu12==12.3.4.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.4.1)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: nvidia-cuda-nvcc-cu12==12.3.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.107)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.3.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.107)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.7.29 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (8.9.7.29)
Requirement already satisfied: nvidia-cufft-cu12==11.0.12.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (11.0.12.1)
Requirement already satisfied: nvidia-curand-cu12==10.3.4.107 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (10.3.4.107)
Requirement already satisfied: nvidia-cusolver-cu12==11.5.4.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (11.5.4.101)
Requirement already satisfied: nvidia-cusparse-cu12==12.2.0.103 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.2.0.103)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (2.19.3)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.3.101 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorflow[and-cuda]==2.16.1) (12.3.101)
Requirement already satisfied: wheel<1.0,>=0.23.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from astunparse>=1.6.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.42.0)
Requirement already satisfied: rich in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (13.7.1)
Requirement already satisfied: namex in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.0.7)
Requirement already satisfied: dm-tree in ./miniconda3/envs/tf/lib/python3.11/site-packages (from keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.1.8)
Requirement already satisfied: charset-normalizer<4,>=2 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from requests<3,>=2.21.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2024.2.2)
Requirement already satisfied: markdown>=2.6.8 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.5.2)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from werkzeug>=1.0.1->tensorboard<2.17,>=2.16->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.1.5)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (2.17.2)
Requirement already satisfied: mdurl~=0.1 in ./miniconda3/envs/tf/lib/python3.11/site-packages (from markdown-it-py>=2.2.0->rich->keras>=3.0.0->tensorflow==2.16.1->tensorflow[and-cuda]==2.16.1) (0.1.2)
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01 Driver Version: 551.76 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti On | 00000000:01:00.0 On | N/A |
| 0% 39C P5 10W / 285W | 4334MiB / 12282MiB | 13% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 41 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
python3
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-03-11 09:36:29.601060: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 09:36:29.921637: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 09:36:30.793353: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> print(tf.config.list_physical_devices('GPU'))
2024-03-11 09:36:33.878560: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 09:36:33.980099: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
nvcc --version
output:
nvcc: NVIDIA (R) Cuda compiler driver
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
Got it to work :) First, go to
https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------
then download the Local Installer for Ubuntu22.04 x86_64 (Deb),
unpack it, and install libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb:
sudo dpkg -i libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb
Selecting previously unselected package libcudnn8.
(Reading database ... 47318 files and directories currently installed.)
Preparing to unpack libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb ...
Unpacking libcudnn8 (8.9.7.29-1+cuda12.2) ...
Setting up libcudnn8 (8.9.7.29-1+cuda12.2) ...
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-03-11 10:27:47.879686: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-11 10:27:47.909157: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 10:27:48.316717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-03-11 10:27:48.664469: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688059: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-03-11 10:27:48.688111: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
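A follow-up thought on this fix (my own suggestion, not part of the original post): if a later apt upgrade pulls cuDNN 9 back in, the package can be held at 8.9.x:
sudo apt-mark hold libcudnn8    # keep apt from replacing the 8.9.x package
sudo apt-mark unhold libcudnn8    # undo later, once TF supports newer cuDNN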
Hi Krzysztof,
I visited the site
https://developer.nvidia.com/rdp/cudnn-archive?source=post_page-----bfbeb77e7c89--------------------------------
where I found an entry listed as "Local Installer for Ubuntu22.04 x86_64 (Deb)", which I downloaded.
Unfortunately what I got is a package named "cudnn-local-repo-ubuntu2204-8.9.7.29_1.0-1_amd64.deb", which is not the same as the name you suggest in your message, which is "libcudnn8_8.9.7.29-1+cuda12.2_amd64.deb".
I assume what you meant is to get the libcudnn8_8.9.7.29*amd64.deb and the cuda12.2_amd64.deb separately and install both.
I have CUDA 12.4. I will not go back to trying to make TF 2.16.1 work with older versions of CUDA (12.2 or 12.3), because sooner or later the TF team will have to produce a version built against the updated version of CUDA. IMHO, rather than us wasting time going back in versions, the TF team should invest time going forward to update TF to the current CUDA version.
Thank you, Juan
It's just that tensorflow can't see the CUDA libraries.
Install tensorflow[and-cuda] and add this to your .bashrc or conda activation script. Adjust the python version in it according to your setup.
NVIDIA_PACKAGE_DIR="$CONDA_PREFIX/lib/python3.12/site-packages/nvidia"
for dir in $NVIDIA_PACKAGE_DIR/*; do
if [ -d "$dir/lib" ]; then
export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
fi
done
You won't need to install CUDA or cuDNN on the system; the CUDA libraries that are installed with pip install tensorflow[and-cuda] are enough.
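After sourcing the updated .bashrc (or re-activating the conda environment), a quick check that the loop took effect (a usage sketch; the expected output assumes a single GPU):
source ~/.bashrc    # or: conda deactivate && conda activate <env>
echo $LD_LIBRARY_PATH    # should now include the .../site-packages/nvidia/*/lib dirs
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# expected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]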
Hi Shayan Shahrokhi,
Thank you for your suggestion (adding the location of the site-packages). I hope you don't mind if I ask:
I saw that python3.12 is listed in your suggestion. Is that the version of Python that you used to test TF 2.16.1 compatibility with CUDA?
Thank you, Juan
Thanks @sh-shahrokhi. I thought it was path related. Modified slightly to make it Python-version independent if you put it in your conda environment activation script ([environment]/etc/activate.d/env_vars.sh):
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
if [ -d "$dir/lib" ]; then
export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
fi
done
This is not a resolution, as this post-install step should not be necessary.
W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
I can't seem to do similar tricks to resolve the TensorRT issue when TensorRT is installed similarly into the conda environment. Any ideas?
I don't actually use TensorRT, but I would check whether the required .so file for it is visible to tensorflow (a sketch of that check follows below). I might need to find the name of the required file in the tensorflow source code.
This doesn't change the fact that a new tensorflow version should be tested by the Google team before release, or the bugs should be fixed. It seems they only care about having a working Docker image, not anything else.
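For what it's worth, a minimal sketch of that check (my own commands, not from this thread; TF-TRT in this TF version looks for the libnvinfer shared libraries):
python3 -c "import tensorrt; print(tensorrt.__file__)"    # is a TensorRT python package present, and where?
ldconfig -p | grep -i nvinfer    # can the system loader find libnvinfer?
find $CONDA_PREFIX -name "libnvinfer*" 2>/dev/null    # or does it only exist inside the conda env?
If the libraries exist inside the environment but are not found, the same LD_LIBRARY_PATH trick can be pointed at their directory.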
I have given up on TensorRT. I guess I won't be using it either.
> This doesn't change the fact that a new tensorflow version should be tested by the Google team before release, or the bugs should be fixed. It seems they only care about having a working Docker image, not anything else.
Agreed. Installing TF has always been hit or miss, and it seems that in the many years since I last used TF, that hasn't changed one bit.
In general, we used to test RC versions before release. For example, we had RC0, RC1 and RC2 for TF 2.9. This gave people and downstream teams enough time to test and report issues.
It seems that 2.16.1 only had an RC0 (for 2.16.0).
The release process is (was?) like this:
1. Cut the release branch (e.g., r2.17).
2. Immediately trigger the release pipeline. This creates a few PRs to update version numbers and release notes, but after this step RC0 should be as close as possible to the version on the master branch at the time the release branch was cut. There should not be any code changes to the release branch at this point (except maybe to cherry-pick fixes from master for hard bugs caused by cutting the branch at the wrong commit).
3. Allow at least a week for downstream teams to test RC0.
4. Land fixes for discovered bugs on master and cherry-pick them to the release branch, after they have already been tested on nightly releases.
5. Trigger the RC1 pipeline. Again, no other code changes should occur now, except to fix bugs discovered during building.
6. Wait a week for downstream teams to test. If there are bugs, repeat the steps above for another RC; otherwise repeat them for the final version.
Overall, this process takes number_of_RCs + 1 weeks, with a possibility of a few more weeks of delay.
However, for the 2.16 release, although the branch was cut on Feb 8th, there was only one RC. Most likely the issues can be solved by a patch release.
Following on from the post by chaudharyachint08, I did the following to automate it in a venv.
I edited bin/activate in the folder of my venv and added these two lines at the end of the file:
export NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
export LD_LIBRARY_PATH=$(echo ${NVIDIA_DIR}/*/lib/ | sed -r 's/\s+/:/g')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then, while editing the same file, I added these two unset lines inside the deactivate function (before the closing curly bracket }):
unset NVIDIA_DIR
unset LD_LIBRARY_PATH
I had tested it by entering the two lines in the terminal and my GPU was detected, so this was just the automation when the venv is activated.
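One caveat worth adding (my own observation, not from the thread): unset LD_LIBRARY_PATH in the deactivate function also discards any value that existed before the venv was activated. A small sketch that saves and restores it instead (the _OLD_ variable name is mine, mirroring how bin/activate already handles PATH):
_OLD_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"    # in bin/activate, before the two export lines
export LD_LIBRARY_PATH="${_OLD_LD_LIBRARY_PATH}"    # in deactivate, instead of unset LD_LIBRARY_PATH
unset _OLD_LD_LIBRARY_PATH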
There should also be instructions for venv users.
On Mon, Apr 8, 2024, 11:48 a.m. Sotiris Gkouzias wrote:
> As I understand the issue, it is clear from the discussion that users with Linux OS and CUDA-enabled GPUs, in order to utilize their GPUs, should manually perform some additional actions (namely: adjust the LD_LIBRARY_PATH environment variable to include the directory where cuDNN is located, and locate a compatible version of ptxas in the site-packages directory of the Python installation, under the CUDA toolkit installation path virtual_environment/lib/python3.XX/site-packages/nvidia/cuda_nvcc/bin, and add this specific path to the environment variables). @SuryanarayanaY should that be officially communicated as part of the procedure to pip install tensorflow[and-cuda] for users with GPUs and Linux OS? Should it be fixed in the next versions of TensorFlow? I rest my case.
> Agree. We used to have instructions for setting the CUDA path in the environment variable LD_LIBRARY_PATH in earlier versions. I think we either need to add these to the documentation or at least add a note in the pip install guide that the user has to set up the path for their own environment. Maybe the same instructions did not work for all environments and hence were discarded. Anyway, adding a note on this in the installation guide should be a must; it can avoid confusion.
> If anyone here is willing to contribute the required notes, please feel free to help. The changes can be proposed at this doc source https://github.com/tensorflow/docs/blob/master/site/en/install/pip.md which will be reflected in this page https://www.tensorflow.org/install/pip.
> I created a respective pull request and it is pending review.
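For reference, a minimal sketch of the ptxas part of that workaround (my own rendering of the steps quoted above; the path assumes the pip-installed nvidia-cuda-nvcc wheel in a python3.11 venv, so adjust both to your setup):
export PATH="$VIRTUAL_ENV/lib/python3.11/site-packages/nvidia/cuda_nvcc/bin:$PATH"
which ptxas    # should now resolve to the copy shipped inside the venv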
I totally agree. Will try to update the pull request accordingly later on.
Updated the respective pull request (pending review) yesterday. The fix was successfully tested today by @weavermech as well.
Added instructions needed to resolve the ptxas issue.
I think the expected solution would be a new release that fixes this issue, so that setting LD_LIBRARY_PATH is not needed, as in 2.15.1. It would be a downgrade for users to have to do such workarounds; it should just work with: pip install tensorflow[and-cuda]
@niko247 undoubtedly true. It is crystal clear that TF 2.16.1 does not work with the simple pip install tensorflow[and-cuda] command to actually utilize CUDA locally, and no guidelines were provided yet to resolve this.
It seems practically impossible for someone owning a PC with a CUDA-enabled GPU to perform deep learning experiments with TensorFlow 2.16.1 and utilize the GPU locally without manually performing some extra steps not included (until today) in the official TensorFlow documentation of the standard installation procedure for Linux users with GPUs, at least as a temporary fix! That's why I submitted the pull request, in good faith and for the sake of all users, as TensorFlow is "An Open Source Machine Learning Framework for Everyone".
Hope that the next patch version of TensorFlow will fix the bug as soon as possible!
> I think the expected solution would be a new release that fixes this issue, so that setting LD_LIBRARY_PATH is not needed, as in 2.15.1; it should just work with: pip install tensorflow[and-cuda]
Thanks! Downgrading to 2.15.1 works well for me too (with tensorflow-probability <= 0.23):
conda create -n tmpenv python=3.11
conda activate tmpenv
pip install tensorflow[and-cuda]==2.15.1
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
pip install tensorflow-probability==0.23.0
Cross-referenced issues:
#65842: It is unclear how a user should configure LD_LIBRARY_PATH after pip install tensorflow[and-cuda]
ublue-os/hwe#230: python tensorflow in nvidia-enabled tumbleweed and fedora distroboxes unable to talk to GPU
works like a charm, thank you. :)
@Sivabooshan congrats! However, note that:
- @JuanVargas, who raised the issue under discussion, has a setup that includes CUDA version 12.4, which is not compatible with TensorFlow 2.16.1. That's why he might need to install TensorFlow in a virtual environment, to avoid downgrading CUDA and polluting the global environment (packages installed globally clutter the main Python installation and can interfere with system processes; virtual environments protect the system-wide environment from this).
- It turns out that when you pip install tensorflow[and-cuda], all required NVIDIA libraries are installed as well. You just need to configure the environment variables manually in order to use them and run TensorFlow on the GPU.
- To date, the officially documented standard TensorFlow installation procedure for Linux users with GPUs does not include the additional steps required to run deep learning experiments with TensorFlow 2.16.1 on a local GPU. That's why I submitted the pull request in good faith and for the sake of all users, as TensorFlow is "An Open Source Machine Learning Framework for Everyone".
Hope that the next patch version of TensorFlow fixes the bug as soon as possible!
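As a quick sanity check (not an official step), you can list the NVIDIA wheels that tensorflow[and-cuda] pulled into the environment:
# Show the CUDA runtime wheels installed alongside TensorFlow
pip list | grep -i nvidia
# typical entries include nvidia-cublas-cu12, nvidia-cudnn-cu12 and nvidia-cuda-nvcc-cu12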
Thanks @sh-shahrokhi. I thought it was path related. Modified slightly to make it Python-version independent if you put it in your conda environment activation script ([environment]/etc/activate.d/env_vars.sh):
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
  if [ -d "$dir/lib" ]; then
    export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
  fi
done
This is not a resolution, as this post-install step should not be necessary.
W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
I can't seem to do similar tricks to resolve the TensorRT issues when installed similarly into the conda environment. Any ideas?
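Not a verified fix, but one possible avenue: NVIDIA also publishes TensorRT wheels on PyPI, and their libraries could in principle be exposed the same way. The package and module names below are assumptions, not something confirmed in this thread:
# Untested sketch: install the TensorRT wheels and expose their libraries.
pip install tensorrt   # assumed to pull in the TensorRT runtime libraries
TRT_DIR=$(dirname $(python -c "import tensorrt_libs; print(tensorrt_libs.__file__)"))   # module name is an assumption
export LD_LIBRARY_PATH="$TRT_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"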
It has been very helpful! However, the path I had to put the script in was [environment]/etc/conda/activate.d/. See this Stack Overflow question.
Hello! I outlined this behavior in a duplicate ticket (#65842). Torch also now installs its CUDA dependencies using the NVIDIA-managed pip packages. However, Torch doesn't appear to require LD_LIBRARY_PATH to be set for the linker, like TF still does. I assume this is because they manually source the libs from the venv. Is this functionality on the roadmap for TF?
Thank you!
I installed TensorFlow following the instructions on https://www.tensorflow.org/install/pip. However, instead of using the default command, I specified the version as 2.15.1. After completing the installation, everything works fine with GPU support.
Through my testing, it seems that TensorFlow version 2.16.1 cannot operate properly on WSL, but it works fine on a Linux host (my Linux host version is Ubuntu 22.04).
Running latest CUDA & cuDNN. 2.15.1 finds them, 2.16 does not. And neither does tf-nightly (2.17.x). How can such a large company make such buggy releases?
Team change, priority change and #63362 (comment)
@tknuutila and @mihaimaruseac I agree with you totally! IMHO, at the very least the TensorFlow team should suggest a temporary solution and update the pip install tensorflow[and-cuda]
guidelines accordingly.
It is surely a pity by all means! And when I read the comment below from Mr. Álvaro Rodríguez, I get really disappointed, because I really love TensorFlow and Keras 3.0:
I have to say that the drop of GPU support for Windows, the lack of documentation and support for C++, the lack of support and documentation for TensorFlow Lite, the lack of TFRecord multi-platform standalone libraries, and so on, is simply a strategy that will kill the library long term, except for very niche projects in large companies.
Other platforms like PyTorch are investing in easy-to-use multi-platform solutions. If (or when) someone actually puts out a solution that is powerful, stable, easy to access, and easy to import and export to other platforms and languages, private enthusiasts, researchers and academics will drop TensorFlow. And don't forget that industries rely on specialists who learned in academia and come from research.
This is a time of AI revolution, where the technology is more popular than ever and is being added to literally everything. In my opinion, TensorFlow is neglecting everything outside Python-Linux, dropping an already lacking support for interoperability, and not investing in accessibility.
I'm saying this as a researcher and professor working in a computer science lab at a university. I write this just after investing almost 100 hours trying to simply build TensorFlow-cc to add some basic capabilities to a research project for the European Union; I failed. The absolute lack of recent information anywhere about TensorFlow-cc, and the responses I have seen to old threads, tell me most people gave up the same way. I'm ready to tell my whole team to abandon TensorFlow and try other solutions.
I have seen others in my lab voice similar concerns and frustrations. Many of our researchers are already moving away from TensorFlow, and soon the whole department will follow.
For context: the Computer Science department is the largest department in my university and serves the most important IT faculty in the Northwest of Spain.
For us, ML is thriving. In addition to the Bachelor's degree in Computer Science, where ML is more than present and which is the most requested in the entire university, we opened a new degree in Data Science and are opening another in Artificial Intelligence. Next year we will be adding new classes and teachers to serve the increasing number of students in two of the three Machine Learning subjects I teach. As far as I know, none of them will learn TensorFlow, none of them are using it in their personal projects, and none of them will use it in their degrees. They will instead be using Julia, Matlab, OpenCV, PyTorch, Scikit-learn and other solutions.
Which leaves me with the industry sector. I also worked as a researcher in a public hospital on a project about diabetes, and in a private research center dedicated to laser and manufacturing. They all used TensorFlow, the same as my laboratory did. I have been told they are all moving away from it, currently opting for a Scikit-learn+OpenCV and PyTorch based approach. The reason is in one case the drop of GPU support for Windows, and in the other a perceived drop in support combined with a lack of interoperability.
The thing is, nobody moves away from a technology they spent years using and learning unless the technology fails them. And once you move away from something because of a problem, if you find a solution somewhere else, you will probably never return.
That is what is happening with TensorFlow. Google has intentionally dropped the ball with support, documentation, accessibility, ease of use, interoperability across languages and interoperability across platforms, so others will rise to the occasion.
Simply put, TensorFlow is becoming the Bing search engine of AI.
Agree completely with the comments regarding the frustrating situation with TF 2.16.x. No wonder Torch is winning the adoption race.
I simply gave up trying to make it work. I have a 4090 that TensorFlow simply doesn't recognize. Torch is much simpler to configure.
EDIT: The fix_gpu.sh solution worked.
@joaomh If you want to use 2.16.1 and you use conda, then the easiest fix, with the env activated, is:
mkdir -p ${CONDA_PREFIX}/etc/conda/activate.d
Create the file ${CONDA_PREFIX}/etc/conda/activate.d/fix_gpu.sh with the content below:
NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))
for dir in $NVIDIA_DIR/*; do
  if [ -d "$dir/lib" ]; then
    export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
  fi
done
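After creating fix_gpu.sh, reactivate the environment so the script actually runs, then verify:
conda deactivate
conda activate <your-env>   # substitute the name of your environment
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"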
Python 3.10 and 3.12 both fail with TF v2.16.1, even when installed with [and-cuda].
Falling back to TF v2.15.1 works fine.
I have the same problem. 4090: it doesn't work with 2.16, but it works with 2.15. No idea why. I am also installing with conda.
@thephet it is a path issue. You must configure the environment variables manually.
You can try the following:
1. Create a fresh conda virtual environment and activate it.
2. Run pip install --upgrade pip.
3. Run pip install tensorflow[and-cuda].
4. Set the environment variables:
Locate the directory of the conda environment by running in the terminal:
echo $CONDA_PREFIX
Enter that directory and create these subdirectories and files:
cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh
Edit ./etc/conda/activate.d/env_vars.sh
as follows:
#!/bin/sh
# Store original LD_LIBRARY_PATH
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
# Get the CUDNN directory
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))
# Set LD_LIBRARY_PATH to include CUDNN directory
export LD_LIBRARY_PATH="$(find ${CUDNN_DIR}/*/lib/ -type d -printf '%p:' | sed 's/:$//')${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"   # strip find's trailing ':' to avoid an empty path entry
# Get the ptxas directory
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))
# Set PATH to include the directory containing ptxas
export PATH="$(find ${PTXAS_DIR}/*/bin/ -type d -printf '%p:' | sed 's/:$//')${PATH:+:${PATH}}"   # strip find's trailing ':' to avoid an empty path entry
Edit ./etc/conda/deactivate.d/env_vars.sh
as follows:
#!/bin/sh
# Restore original LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${ORIGINAL_LD_LIBRARY_PATH}"
# Unset helper variables
unset ORIGINAL_LD_LIBRARY_PATH
unset CUDNN_DIR
unset PTXAS_DIR
Verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
I submitted the respective pull request to update the official TensorFlow installation guide a month ago.
I hope it helps!
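If everything is wired up correctly, the verification command should print the detected device, matching the output seen earlier in this thread:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]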