After closely following the guide on the official System76 website for installing Tensorman, I can't get TensorFlow to recognize my GPU (RTX 2080).
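The test script itself did not survive in this copy of the issue. Judging only from the `Num GPUs Available` line in the output, it probably resembled the following hypothetical sketch. `gpu_report` is an illustrative name and is not taken from the original. The sketch relies on TensorFlow's `tf.config.list_physical_devices`:

```python
# Hypothetical stand-in for test_tf.py: the original script is not shown.
def gpu_report() -> str:
    """Return a one-line summary of the GPUs TensorFlow can see."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this environment"
    # Returns an empty list when no usable GPU is found (e.g. when cuInit fails)
    gpus = tf.config.list_physical_devices("GPU")
    return f"Num GPUs Available: {len(gpus)}"


if __name__ == "__main__":
    print(gpu_report())
```

Run it the same way as the original script, for example `tensorman run --gpu python3 -- ./test_tf.py`. An empty GPU list corresponds to the `Num GPUs Available: 0` result.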
This is the output of `$ tensorman run --gpu python3 -- ./test_tf.py`:
```
"docker" "run" "-u" "1000:1000" "--gpus=all" "-e" "HOME=/project" "-it" "--rm" "-v" "/home/matteo/Desktop/Tirocinio/HouseExpoSLAM/pseudoslam/personal_experiments:/project" "-w" "/project" "tensorflow/tensorflow:latest-gpu" "python3" "./test_tf.py"
2023-03-28 11:21:33.267329: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-28 11:21:34.303425: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-03-28 11:21:34.303451: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: 2ee87ebed972
2023-03-28 11:21:34.303459: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: 2ee87ebed972
2023-03-28 11:21:34.303486: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
2023-03-28 11:21:34.303509: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 525.89.2
Num GPUs Available: 0
```
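The key lines in this log are the failed `cuInit` call (`CUDA_ERROR_NO_DEVICE`) and the `libcuda.so` lookup reporting `NOT_FOUND`. The following minimal sketch reproduces that check without TensorFlow by loading the driver library through `ctypes` and calling `cuInit(0)` directly. It assumes the code is saved as a hypothetical `probe_cuda.py` in the project directory and run with `tensorman run --gpu python3 -- ./probe_cuda.py`:

```python
import ctypes

# Small subset of CUresult codes from the CUDA driver API (cuda.h)
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
}


def probe_cuda(lib_name: str = "libcuda.so.1"):
    """Load the CUDA driver library and call cuInit(0).

    Returns (loaded, result_code): loaded is False if the library could not
    be opened; result_code is the raw CUresult from cuInit, or None.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return False, None
    # CUresult cuInit(unsigned int Flags); Flags must currently be 0
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return True, libcuda.cuInit(0)


if __name__ == "__main__":
    loaded, code = probe_cuda()
    if not loaded:
        print("libcuda.so.1 could not be loaded in this environment")
    else:
        name = CU_RESULT_NAMES.get(code, "see cuda.h for this code")
        print(f"cuInit(0) returned {code} ({name})")
```

If the library loads but `cuInit` returns 100, the driver library is present in the container but no device is exposed to it, which is consistent with the TensorFlow log.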
I've also tried launching the Python script from a shell inside the container (`$ tensorman run --gpu --python3 bash`), but received the same error.
Some information for context:

Output of `$ nvidia-smi`:
```
Tue Mar 28 13:39:08 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 0% 35C P8 16W / 260W | 341MiB / 8192MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2509 G /usr/lib/xorg/Xorg 84MiB |
| 0 N/A N/A 2640 G /usr/bin/gnome-shell 126MiB |
| 0 N/A N/A 4652 G firefox 127MiB |
+-----------------------------------------------------------------------------+
```
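On the host, `nvidia-smi` reports driver 525.89.02, while the container log reports `kernel reported version is: 525.89.2`. The two strings differ only in the zero-padding of the last component. The following small sketch parses the `nvidia-smi` banner line and compares the versions numerically. The function names are illustrative only:

```python
import re


def parse_smi_header(line: str):
    """Extract (driver_version, cuda_version) from the nvidia-smi banner line."""
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if match is None:
        raise ValueError("not an nvidia-smi version banner")
    return match.group(1), match.group(2)


def same_version(a: str, b: str) -> bool:
    """Compare dotted versions numerically, so '525.89.02' equals '525.89.2'."""
    return tuple(int(p) for p in a.split(".")) == tuple(int(p) for p in b.split("."))


if __name__ == "__main__":
    banner = "| NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 |"
    driver, cuda = parse_smi_header(banner)
    print(driver, cuda)
    # Version string taken from the container log ("kernel reported version is")
    print(same_version(driver, "525.89.2"))
```

Because the versions match and `nvidia-smi` works on the host, the host driver itself looks healthy. The problem more likely lies in how the GPU is exposed to the container.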
Output of `$ tensorman run --gpu nvidia-smi`:

```
Failed to initialize NVML: Unknown Error
```
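`nvidia-smi` works on the host, but NVML fails inside the container. One quick check is whether the NVIDIA device nodes are visible inside the container at all. Run it from a container shell such as `tensorman run --gpu --python3 bash`. The following sketch uses only the standard library. `nvidia_device_nodes` is a hypothetical helper:

```python
import glob
import os


def nvidia_device_nodes(dev_dir: str = "/dev"):
    """Return the sorted NVIDIA device nodes (e.g. nvidia0, nvidiactl) under dev_dir."""
    return sorted(glob.glob(os.path.join(dev_dir, "nvidia*")))


if __name__ == "__main__":
    nodes = nvidia_device_nodes()
    if nodes:
        for node in nodes:
            print(node)
    else:
        print("no /dev/nvidia* nodes visible")
```

Nodes such as `/dev/nvidia0` may be listed while NVML still reports `Unknown Error`. In that case, permission to access the devices, rather than their presence, is the more plausible suspect, and in containers that access is governed by cgroups.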
I dual-boot with GRUB as my bootloader. Knowing that this may interfere with the cgroups kernel parameter, I updated its config file to include the necessary option, as suggested in this comment. Here is the output of `$ cat /proc/cmdline`:
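The `/proc/cmdline` output and the linked comment are missing from this copy, so the exact option is not known here. Assume, as a labeled assumption, that it is `systemd.unified_cgroup_hierarchy=0`, an option often suggested for this NVML error that moves systemd off the unified cgroup v2 hierarchy. The following sketch checks whether the running kernel actually received it. On a dual-boot machine, this also confirms that the edited GRUB config is the one used at boot:

```python
import shlex


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(text):
        # Split on the first '=' only, so root=UUID=... keeps "UUID=..." as value
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    # Assumed option name; substitute whatever the referenced comment recommends
    print(params.get("systemd.unified_cgroup_hierarchy", "not set"))
```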
I have no previous experience installing TF/Tensorman or using containers, so I hope I didn't miss any crucial details in the process.