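The problem was initially traced to the GPUs not being mapped into the container. However, after enabling GPU mapping with --gpus=all, running nvidia-smi inside the container prints the expected output: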
nvidia-smi
Tue Sep 14 07:01:11 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51       Driver Version: 450.51       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:04:00.0 Off |                  N/A |
| 27%   26C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 107...  Off  | 00000000:08:00.0 Off |                  N/A |
| 27%   26C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 107...  Off  | 00000000:86:00.0 Off |                  N/A |
| 27%   24C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 107...  Off  | 00000000:8A:00.0 Off |                  N/A |
| 28%   26C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Loading a GPU model nevertheless fails with:
RuntimeError: assertion `locator.device >= 0 && locator.device < nr_gpu' failed at ../../../../../../src/core/impl/comp_node/cuda/comp_node.cpp:831: static mgb::CompNode::Impl* mgb::CudaCompNode::load_cuda(const mgb::CompNode::Locator&, const mgb::CompNode::Locator&)
extra message: request gpu0 out of valid range [0, 0)
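
The valid range [0, 0) in the message means the framework counted zero usable GPUs, even though nvidia-smi lists four. One way to narrow this down is to ask the CUDA driver library directly how many devices it sees from inside the container. The script below is only a rough diagnostic sketch, not the framework's own detection logic. It assumes a Linux container where the driver library is named libcuda.so.1; the function names probe_cuda_driver and collect_report are made up for illustration.

```python
import ctypes
import glob
import os


def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Return (status, device_count) as reported by the CUDA driver API.

    lib_name is an assumption: libcuda.so.1 is the usual name on Linux.
    """
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        # The driver library is not visible inside this environment.
        return "library not found", 0

    # cuInit must succeed before any other driver API call.
    rc = cuda.cuInit(0)
    if rc != 0:
        return f"cuInit failed with code {rc}", 0

    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return f"cuDeviceGetCount failed with code {rc}", 0
    return "ok", count.value


def collect_report():
    """Gather the container-side facts that commonly matter for GPU access."""
    status, count = probe_cuda_driver()
    return {
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": os.environ.get("NVIDIA_DRIVER_CAPABILITIES"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "cuda_driver_status": status,
        "cuda_device_count": count,
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

If this reports zero devices or a cuInit failure while nvidia-smi still lists all four cards, the device mapping itself is probably fine and the problem more likely sits with the CUDA driver library available in the container.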