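Classification and detection training in Models fails when multiprocessing is used:
- For classification, setting the dataloader's worker count to 0 avoids the error, but data loading then becomes a bottleneck.
- For detection, the entire training is multi-process, so the only workaround is to set ngpu = 1 and train on a single GPU.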
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/root/megengine/Models/official/vision/detection/tools/train.py", line 85, in worker
    train_loader = iter(loader["train"])
  File "/usr/local/lib/python3.7/site-packages/megengine/data/dataloader.py", line 122, in __iter__
    return _ParallelDataLoaderIter(self)
  File "/usr/local/lib/python3.7/site-packages/megengine/data/dataloader.py", line 216, in __init__
    worker.start()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle weakref objects
Exception ignored in: <function _ParallelDataLoaderIter.__del__ at 0x7f1411cf39d8>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/megengine/data/dataloader.py", line 544, in __del__
    if self.__initialized:
AttributeError: '_ParallelDataLoaderIter' object has no attribute '_ParallelDataLoaderIter__initialized'
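The trace goes through popen_spawn_posix.py and reduction.dump, so the worker process is being started with the spawn method, which pickles the process object before launching it. The TypeError suggests that something reachable from that object holds a weak reference. The following is a minimal standard-library sketch of that failure mode, not MegEngine code and not a confirmed diagnosis; `Target`, `try_pickle`, `without_weakrefs` and the `state` dict are hypothetical names used only for illustration.

```python
import pickle
import weakref


class Target:
    """Hypothetical object that a weak reference points to."""


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


def without_weakrefs(state):
    """Return a copy of a dict with weak-reference values dropped."""
    return {k: v for k, v in state.items()
            if not isinstance(v, weakref.ReferenceType)}


target = Target()

# Assumed stand-in for state that would be sent to a spawned worker.
state = {"name": "train", "target_ref": weakref.ref(target)}

print("with weakref:   ", try_pickle(state))
print("without weakref:", try_pickle(without_weakrefs(state)))
```

If this reading is right, the fix on the library side would be to keep weak references out of whatever is handed to the worker process, or to rebuild them inside the worker after it starts.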
Environment:
Ubuntu 16.04
Python 3.7
CUDA 10.0
cuDNN 7.6