ShufflenetV2 flower recognition model quantization: error when pretraining the model in normal mode

Development environment:

  • MegEngine 1.2.0
  • Python 3.7
  • Jupyter Notebook

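I am following Lecture 6, "Advanced Deployment: Inference-Side Optimization", from https://www.bilibili.com/video/BV1Zf4y1Q7aZ?p=2. At step 2 of QAT quantization (pretrain the model in normal mode and save a network checkpoint at each epoch), the run fails with an error. The command I executed: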

!python3 train.py -a shufflenet_v2_x0_5 -d /home/megstudio/workspace/dataset/flowers --mode normal --save ./result/model

The error message is as follows:

07 12:12:22 preparing dataset..
Traceback (most recent call last):
  File "train.py", line 599, in <module>
    main()
  File "train.py", line 364, in main
    train_proc(world_size, args)
  File "train.py", line 509, in worker
    top1.update(100 * acc1.numpy()[0], n)
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

Try removing the [0] after numpy() on line 509. For reference, see https://megengine.org.cn/api/latest/zh/api/megengine.functional.html?highlight=topk_accuracy#megengine.functional.utils.topk_accuracy
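
To illustrate why the [0] fails, here is a minimal sketch that uses plain NumPy as a stand-in for acc1.numpy(). It assumes the accuracy comes back as a 0-dimensional array, which is what the IndexError above suggests; the name acc1_np and the value 0.25 are made up for the example, and the float() conversion is an optional extra rather than something the training script requires.

```python
import numpy as np

# Stand-in for acc1.numpy(): a 0-dimensional array holding one accuracy value.
acc1_np = np.array(0.25, dtype=np.float32)

# Indexing a 0-dimensional array with [0] raises the IndexError from the traceback.
try:
    acc1_np[0]
    index_error = None
except IndexError as exc:
    index_error = str(exc)

# Dropping the [0] works; float() additionally turns the value into a plain Python float.
top1_value = 100 * float(acc1_np)

print(index_error)
print(top1_value)
```

Calling acc1_np.item() would be another way to get a plain Python number out of the array.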

Also, this train.py does not look like the latest training script from the model hub. You could try upgrading the model hub.

I have made the changes you suggested:

image
image

After changing the two places above, I got this error:

image

I then changed the part that raised the error as follows:

image

That produced this error:

--- Logging error ---
Traceback (most recent call last):
  File "/home/megstudio/.miniconda/envs/xuan/lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "/home/megstudio/.miniconda/envs/xuan/lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "/home/megstudio/.miniconda/envs/xuan/lib/python3.7/site-packages/megengine/logger.py", line 98, in format
    formatted = super(MegEngineLogFormatter, self).format(record)
  File "/home/megstudio/.miniconda/envs/xuan/lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "/home/megstudio/.miniconda/envs/xuan/lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
  File "train.py", line 599, in __str__
    return fmtstr.format(**self.__dict__)
TypeError: unsupported format string passed to numpy.ndarray.__format__
Call stack:
  File "train.py", line 603, in <module>
    main()
  File "train.py", line 368, in main
    train_proc(world_size, args)
  File "train.py", line 527, in worker
    total_time,
Message: 'TRAIN e%d %06d %f %s %s %s %s'
Arguments: (0, 0, 0.0625, <__main__.AverageMeter object at 0x7f5a4cf21358>, <__main__.AverageMeter object at 0x7f5a4cf21390>, <__main__.AverageMeter object at 0x7f5a4cf213c8>, <__main__.AverageMeter object at 0x7f5a4cf21400>)

Those are the problems I ran into. Could you take a look at where things are going wrong?

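n = image.shape
I take it you want an integer batch_size here, but image.shape does not appear to be a scalar. Print it and check. If n is a scalar, then update probably does not need the np.array wrapper either.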
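
A small sketch of what is probably going on, again with NumPy standing in for the MegEngine tensor (the batch dimensions are hypothetical). It assumes the shape is a sequence of dimension sizes, so any value computed from it becomes an array with one entry per dimension, and formatting such an array with a spec like ":.3f" raises the TypeError shown in the log above.

```python
import numpy as np

# Hypothetical image batch: 4 images, 3 channels, 224 x 224 pixels.
image = np.zeros((4, 3, 224, 224), dtype=np.float32)

n = image.shape
print(n)  # a tuple of four sizes, not a single integer

# Wrapping the tuple in np.array yields a 1-dimensional array, and so does
# anything computed from it.
values = 50.0 * np.array(n) / np.array(n)

# A float format spec cannot be applied to a multi-element array.
try:
    "{:.3f}".format(values)
    format_error = None
except TypeError as exc:
    format_error = str(exc)

print(format_error)
```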

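How should I fix this step, then? I have not found a suitable approach.
I deleted the [0] at every place the arrows point to:
image
That gives the following error:
image
How should I resolve this?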

If print(n) shows that n is not a scalar, try n = image.shape[0].
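
Putting the two fixes together, a rough sketch could look like the following. The AverageMeter here is hypothetical: only the fmtstr.format(**self.__dict__) line is taken from the traceback, and the rest is an assumption about how such a meter is usually written, so the real class in train.py may differ. NumPy arrays stand in for the MegEngine tensors.

```python
import numpy as np


class AverageMeter:
    """Hypothetical meter; only __str__ mirrors the line visible in the traceback."""

    def __init__(self, name, fmt=":.3f"):
        self.name = name
        self.fmt = fmt
        self.val = 0.0
        self.avg = 0.0
        self.sum = 0.0
        self.count = 0

    def update(self, val, n=1):
        # Store plain Python floats so that __str__ can apply the format spec.
        self.val = float(val)
        self.sum += float(val) * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = "{name} {val" + self.fmt + "} ({avg" + self.fmt + "})"
        return fmtstr.format(**self.__dict__)


# Stand-ins for one training step: an image batch and a 0-dimensional accuracy.
image = np.zeros((8, 3, 224, 224), dtype=np.float32)
acc1 = np.array(0.5, dtype=np.float32)

n = image.shape[0]          # integer batch size, not the whole shape
top1 = AverageMeter("Acc@1")
top1.update(100 * acc1, n)  # no [0] after the accuracy value

meter_text = str(top1)
print(meter_text)
```

With an integer n and float values inside the meter, the logging call that formats the meters with %s should no longer hit the ndarray formatting error.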

OK, thank you!