RuntimeError: expect quantize dtype

Development environment:

MegEngine 1.2.0
Python 3.7.9
CentOS 8
I was quantizing my model following the documentation at https://megengine.org.cn/doc/advanced/quantization.html#id5 (Docs → Advanced Topics → Quantization). At the last step, "convert the model to a quantized model and dump it for later deployment", the following error was raised:

Traceback (most recent call last):
  File "inference_MCNN_quantize.py", line 61, in <module>
    et_dmap = infer_func(img)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/jit/tracing.py", line 644, in __call__
    outputs = self.__wrapped__(*args, **kwargs)
  File "inference_MCNN_quantize.py", line 48, in infer_func
    et_dmap = mcnn(img)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/module.py", line 113, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/housiyue/myMegEngine/MCNN_quantize/model_MCNN_quantize.py", line 78, in forward
    x1 = self.branch1(x)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/module.py", line 113, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/sequential.py", line 96, in forward
    inp = layer(inp)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/quantized/module.py", line 23, in __call__
    return super().__call__(*inputs, **kwargs)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/module.py", line 113, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/quantized/conv.py", line 108, in forward
    return self.calc_conv_quantized(inp, nonlinear_mode="RELU")
  File "/opt/Anaconda3/envs/MegEngine/lib/python3.7/site-packages/megengine/module/quantized/conv.py", line 57, in calc_conv_quantized
    w_scale = dtype.get_scale(self.weight.dtype)
RuntimeError: expect quantize dtype

Could you tell me what might be causing this error?

Running a quantized model requires a quantized dtype. You can try using methods such as qint8 from core.dtype to obtain a quantized dtype, or use the quantization.quantize.quantize(Module) function to convert the network from a QAT model into a quantized model.
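For reference, a minimal sketch of that conversion flow, assuming the megengine.quantization API described in the 1.2 docs (the qconfig name here is an assumption; substitute whatever your training script actually uses):

```python
import megengine.quantization as Q

# net: the float32 network, already rewritten with Fused Modules.
# Note: in some versions these functions convert in place by default
# and also return the converted module.
qat_net = Q.quantize_qat(net, qconfig=Q.ema_fakequant_qconfig)  # float -> QAT

# ... QAT fine-tuning happens here ...

quantized_net = Q.quantize(qat_net)  # QAT -> Quantized: weights become qint8
```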

I suspect the weights were quietly overwritten after quantize was called :rofl:

Hello, in theory, after quantizing a model from float32 to int8, the quantized weight file should be about 1/4 the size of the original (a 1:4 ratio), but for this model the ratio comes out around 1:2. What could cause this? (The image below shows the weight file sizes before and after quantization.)
[image: weight file sizes before and after quantization]

Looking at the weights alone, the relationship should indeed be 1:4, so you need to confirm whether all of the weights in your network are actually int8. You can load the checkpoint and check whether the keys in the dict match expectations (Module.state_dict() by default gives you a dict of numpy data).
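For example, something like this (a sketch; quantized_net here stands for your model after the final quantize() step):

```python
# state_dict() returns an OrderedDict of numpy arrays by default,
# so the dtype of every saved parameter can be checked directly.
for name, value in quantized_net.state_dict().items():
    print(name, value.dtype, value.shape)

# After a successful conversion the conv weights should show up as int8,
# not float32; anything still float32 points at a module that was not
# converted (or at weights overwritten after quantize).
```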

Hello, I printed the weights of the network after quantization and found they are still float32, so the quantization did not succeed. What could be causing this? I quantized the model in four steps: (1) modify the network structure to use Fused Modules; (2) pre-train the model in normal mode, saving a network checkpoint at every epoch; (3) fine-tune the model in QAT mode; (4) convert the model to a quantized model and dump it for deployment. I also followed the quantization tutorial under Docs → Advanced Topics on the official site. The model I am quantizing is a counting model (not a classification model), and the network contains multi-column convolutions. Is any special handling needed when quantizing it? (For example, are the rules for where to place the QuantStub and DequantStub conversion interfaces any different?)

Hmm, before you dump the quantized model, could you print out the dtype of the weights and take a look?

Here are some of the weights after fine-tuning, right before the dump:

layer: model param: OrderedDict([('branch1.0.bias', array([[[[-2.8180127e-04]],

        [[-1.4039382e-05]],...
 [[ 1.3546302e-04]],

        [[-5.9968247e-06]]]], dtype=float32)), ('branch1.0.weight', array([[[[ 1.26116043e-02,  3.40525550e-03,  1.07913306e-02, ...,
           9.79223289e-03,  1.07813142e-02,  2.03321334e-02],
         [ 1.29145570e-02,  1.41455326e-02,  5.27650397e-03, ...,
           5.86509705e-03,  1.22995209e-02,  6.93913875e-03],
  ...,
         [ 8.66396818e-03, -4.96567134e-03, -6.07786747e-03, ...,
           9.63109545e-03, -5.97792538e-03, -2.88965832e-03],
         [ 5.57629392e-04,  8.24190490e-03,  1.17016374e-03, ...,
          -8.97644740e-03,  1.15242386e-02, -1.01703452e-02],
         [ 2.73346319e-03,  6.39193552e-03, -3.78952827e-03, ...,
          -2.41790363e-03,  4.95738117e-03,  9.51081631e-04]]]],
      dtype=float32)), ('branch1.0.act_observer.max_val', array(90.979866, dtype=float32)), ('branch1.0.act_observer.min_val', array(0., dtype=float32)), ('branch1.0.act_observer.momentum', array(0.9, dtype=float32)), ('branch1.0.weight_observer.max_val', array(0.036286, dtype=float32)), ('branch1.0.weight_observer.min_val', array(-0.04819379, dtype=float32)), ('branch1.2.bias', array([[[[-2.5219109e-04]],

        [[ 1.5335076e-04]],...
 [[-1.1871278e-04]],

        [[-2.4559462e-04]]]], dtype=float32)), ('branch1.2.weight', array([[[[-2.18509883e-03, -1.28723327e-02,  5.16967895e-03, ...,
          -5.16568264e-03, -1.25880130e-02,  1.15498202e-03],
         [ 5.12751099e-03, -9.89115145e-03, -2.21782159e-02, ...
...,
         [-8.75598565e-03, -1.07884007e-02,  3.22100270e-04, ...,
          -9.99546144e-03,  5.15036890e-03,  8.65229219e-03],
         [-8.10293667e-03,  4.64938115e-03,  5.81022725e-03, ...,
           8.19846522e-03,  6.83111418e-03, -6.10522402e-04],
         [-2.90398137e-03, -1.94538590e-02, -1.88999549e-02, ...,
           7.95712043e-03,  1.09496794e-03,  3.74141993e-04]]]],
      dtype=float32)), ('branch1.2.act_observer.max_val', array(12.260272, dtype=float32)), ('branch1.2.act_observer.min_val', array(0., dtype=float32)), ('branch1.2.act_observer.momentum', array(0.9, dtype=float32)), ('branch1.2.weight_observer.max_val', array(0.0432395, dtype=float32)), ('branch1.2.weight_observer.min_val', array(-0.03958283, dtype=float32)), ('branch1.4.bias', array([[[[-0.00104612]]

And here are some of the weights after running quantize() on the model:

OrderedDict([('branch1.0.bias', array([[[[-2.8180127e-04]],

        [[-1.4039382e-05]]...

           [[-3.3417422e-05]],

        [[ 1.3546302e-04]],

        [[-5.9968247e-06]]]], dtype=float32)), ('branch1.0.weight', array([[[[ 1.26116043e-02,  3.40525550e-03,  1.07913306e-02, ...,
       
         [ 8.66396818e-03, -4.96567134e-03, -6.07786747e-03, ...,
           9.63109545e-03, -5.97792538e-03, -2.88965832e-03],
         [ 5.57629392e-04,  8.24190490e-03,  1.17016374e-03, ...,
          -8.97644740e-03,  1.15242386e-02, -1.01703452e-02],
         [ 2.73346319e-03,  6.39193552e-03, -3.78952827e-03, ...,
          -2.41790363e-03,  4.95738117e-03,  9.51081631e-04]]]],
      dtype=float32)), ('branch1.2.bias', array([[[[-2.5219109e-04]],

        [[ 1.5335076e-04]]...

        [[-1.1871278e-04]],

        [[-2.4559462e-04]]]], dtype=float32)), ('branch1.2.weight', array([[[[-2.18509883e-03, -1.28723327e-02,  5.16967895e-03, ...,
          -5.16568264e-03, -1.25880130e-02,  1.15498202e-03],
         [ 5.12751099e-03, -9.89115145e-03, -2.21782159e-02, ...,
        
         [-8.75598565e-03, -1.07884007e-02,  3.22100270e-04, ...,
          -9.99546144e-03,  5.15036890e-03,  8.65229219e-03],
         [-8.10293667e-03,  4.64938115e-03,  5.81022725e-03, ...,
           8.19846522e-03,  6.83111418e-03, -6.10522402e-04],
         [-2.90398137e-03, -1.94538590e-02, -1.88999549e-02, ...,
           7.95712043e-03,  1.09496794e-03,  3.74141993e-04]]]],
      dtype=float32)), ('branch1.4.bias', array([[[[-0.00104612]]...

These post-quantize weights are clearly wrong: look, the dtype is still float32. You need to confirm, first, that the Module actually became a QATModule after quantize_qat, and then that it became a QuantizedModule after quantize. You can pick a ConvBn and check it with isinstanceof.

Excuse me, do you mean isinstanceof() or isinstance()? I looked it up and Python has no isinstanceof(); would it need to be custom-defined? I don't quite understand how to check whether a module was converted successfully; could you give an example?

isinstance

For example, say your original network is net, the QAT network is qat_net = quantize_qat(net, ...), and the Quantized network is quantized_net = quantize(qat_net, ...).

Suppose net has a ConvBn called net.convbn0; then check isinstance(qat_net.convbn0, QAT.ConvBn) and isinstance(quantized_net.convbn0, Quantized.ConvBn).
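Spelled out as runnable code (a sketch: the import paths follow MegEngine 1.2's module layout, and ConvBn2d stands in for the ConvBn shorthand above; adjust to the actual fused class used in your network):

```python
import megengine.module as Float
import megengine.module.qat as QAT
import megengine.module.quantized as Quantized

print(isinstance(net.convbn0, Float.ConvBn2d))                # original float network
print(isinstance(qat_net.convbn0, QAT.ConvBn2d))              # expect True after quantize_qat
print(isinstance(quantized_net.convbn0, Quantized.ConvBn2d))  # expect True after quantize
```

If the second check fails, quantize_qat never converted that module (e.g. it was not built from a Fused Module); if the third fails, quantize did not run on the QAT network you think it did, which would explain the float32 weights above.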