MegEngine v1.4.0-rc1 Release Notes

Release

huahua404-MegEngine 2021-04-23 03:53 #1

Highlights

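Adds a feature that reduces GPU memory usage in dynamic graph mode through recomputation. With 2 additional lines of code, you can train a model 3 times as large with the same amount of GPU memory.
Early-access install: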
pip3 install megengine==1.4.0rc1 -f https://megengine.org.cn/whl/mge.html

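Bug Fixes

General components

Fix MatMul still searching for algorithm parameters after no-profiling-on-shape-change is set.
Fix training becoming progressively slower because of the const tensor cache.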

CUDA

Fix the order in which MegEngine destroys CUDA and cuDNN.
Fix a CUTLASS GEMM crash by adding a block size limit.
Fix the TensorRT runtime opr profiling feature.

Python API

Fix side effects caused by the optimizer's state_dict.
Fix the gopt level in trace.

Quantization

Fix the forward of Quantized.Concat.
Fix zero scale in easy quant.

Tools

Fix node display in TensorBoard.
Fix automatic naming when a Module expands its structure.
Fix the FLOPs calculation for group conv in module stats.

New Features

General components

dnn enables logging by default, prints error messages, and provides an interface for users to set the log level.

ARM

dot support is enabled by default on ARM, while remaining compatible with machines that do not support the 8.2 instruction set.

CUDA

Allow the CUDA compnode to obtain memory-related information directly.

Python API

Add GPU memory optimization through recomputation in dynamic graph mode.
Add the AdamW optimizer.
Add an array method to varnode.

Tools

Add a repr method to NetworkNode.
Add an optimize-for-inference interface to opgraph.
Add summary output to module_stats and net_visualize.
Add receptive_field statistics to NetworkNode.
Make log_path an optional parameter of network_visualize.

Improvements

Tools

Improve the automatic naming rules for operators.

Others

General components

Refactor the CPU CompNode so that default_cpu does not support record.
Replace the algo reproducible attribute with algo attributes.

Python API

Move nvof to vision while keeping the original usage compatible.

Highlights

Dynamic Tensor Rematerialization [Kirisame et al., 2021] is implemented in MegEngine. With two more lines of code, you can train a model twice as large given the same memory budget.
Try it out with:
pip3 install megengine==1.4.0rc1 -f https://megengine.org.cn/whl/mge.html
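
As a rough illustration of the idea behind rematerialization (this is not MegEngine's implementation or API), the sketch below keeps at most `budget` intermediate results of a chain of functions and recomputes evicted ones on demand. It evicts the least recently used entry, which is a simplification: DTR as described by Kirisame et al. ranks candidates with a heuristic that combines compute cost, memory size and staleness. The class `RematCache` and all of its members are hypothetical names chosen for this example.

```python
class RematCache:
    """Toy rematerialization over a chain a[i] = fns[i - 1](a[i - 1])."""

    def __init__(self, fns, x0, budget):
        self.fns = fns        # fns[i] maps activation i to activation i + 1
        self.x0 = x0          # the input is always kept
        self.budget = budget  # maximum number of cached activations
        self.cache = {}       # index -> value; dict order tracks recency
        self.seen = set()     # indices that have been computed at least once
        self.recomputes = 0   # how many times an evicted value was rebuilt

    def get(self, i):
        if i == 0:
            return self.x0
        if i in self.cache:
            # Cache hit: move the entry to the most recently used position.
            self.cache[i] = self.cache.pop(i)
            return self.cache[i]
        # Cache miss: rebuild from the parent, which may itself be rebuilt.
        value = self.fns[i - 1](self.get(i - 1))
        if i in self.seen:
            self.recomputes += 1
        self.seen.add(i)
        self.cache[i] = value
        while len(self.cache) > self.budget:
            # Evict the least recently used activation.
            self.cache.pop(next(iter(self.cache)))
        return value


fns = [lambda t: t + 1, lambda t: t * 2, lambda t: t - 3]
tight = RematCache(fns, x0=5, budget=1)  # room for one activation only
roomy = RematCache(fns, x0=5, budget=3)  # room for every activation

# Access order mimics a forward pass followed by a backward pass.
results_tight = [tight.get(i) for i in (3, 2, 1)]
results_roomy = [roomy.get(i) for i in (3, 2, 1)]
print(results_tight == results_roomy, tight.recomputes, roomy.recomputes)
```

Both caches return the same values; the tight one trades extra computation for a smaller footprint, which is the trade-off the highlight refers to.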

Bug Fixes

General components

Fix a bug where the MatMul opr was still tuned after no-profiling-on-shape-change was set.
Fix the const tensor cache, which made training progressively slower.

CUDA

Fix the destruction order of cuDNN and CUDA in MegEngine.
Fix the CUTLASS GEMM crash by limiting the block size.
Fix TensorRT runtime opr profiling.

Python API

Fix side effects of the optimizer's state_dict.
Fix gopt level in trace.

Quantization

Fix quantized concat forward.
Fix the zero scale bug in easy quant.
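
The notes do not say how the zero scale fix works. For context, the sketch below shows why a zero scale is a problem in symmetric int8 quantization and one common guard against it: an all-zero tensor yields a scale of 0, and dividing by it fails, so the scale is clamped to a small positive value. The function names, `eps` and the int8 range are assumptions made for this example, not MegEngine's easy quant code.

```python
def symmetric_scale(values, qmax=127, eps=1e-8):
    """Scale for symmetric quantization, clamped so it is never zero."""
    max_abs = max((abs(v) for v in values), default=0.0)
    return max(max_abs / qmax, eps)


def quantize(values, scale, qmin=-128, qmax=127):
    """Round to the nearest integer step and clip to the int8 range."""
    return [min(max(round(v / scale), qmin), qmax) for v in values]


zeros = [0.0, 0.0, 0.0]
zero_scale = symmetric_scale(zeros)      # would be 0.0 without the clamp
zero_codes = quantize(zeros, zero_scale)

data = [1.27, -1.27, 0.0]
data_codes = quantize(data, symmetric_scale(data))
print(zero_scale, zero_codes, data_codes)
```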

Tools

Fix the node display bug in TensorBoard.
Fix the auto-naming bug when a Module's structure is expanded.
Fix the FLOPs calculation for group conv in module stats and remove the model status change.
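
To make the group conv FLOPs item concrete, here is a minimal sketch of the usual counting rule, assuming the count is in multiply-accumulates and ignoring bias. In a grouped convolution each output channel only sees `in_channels / groups` input channels, so the cost drops by a factor of `groups` compared with a dense convolution. The function name and signature are hypothetical and are not taken from module_stats.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, output_size, groups=1):
    """Multiply-accumulate count of a 2D convolution (bias not included)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    kh, kw = kernel_size
    oh, ow = output_size
    # Each output element needs (in_channels / groups) * kh * kw products.
    return oh * ow * out_channels * (in_channels // groups) * kh * kw


dense = conv2d_macs(64, 128, (3, 3), (56, 56))
grouped = conv2d_macs(64, 128, (3, 3), (56, 56), groups=32)
depthwise = conv2d_macs(64, 64, (3, 3), (56, 56), groups=64)
print(dense, grouped, depthwise)
```

Whether a tool reports this number directly or doubles it (one multiply plus one add) is a matter of convention.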

New Features

General components

DNN enables logging by default, prints error information, and provides an interface for users to set the log level.

ARM

dot is enabled by default on ARM and remains compatible with machines that do not support the 8.2 instruction set.

CUDA

Enable the CUDA compnode to obtain memory-related information directly.

Python API

Add dynamic tensor rematerialization.
Add AdamW optimizer.
Add array method for varnode.
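
For readers unfamiliar with AdamW, the sketch below shows a single update step on plain Python lists. What distinguishes it from Adam with L2 regularization is that the weight decay is applied to the parameter directly rather than being added to the gradient. The function name, argument names and default values are assumptions for illustration and do not describe the signature of MegEngine's optimizer.

```python
import math


def adamw_step(params, grads, m, v, step,
               lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """One AdamW update; m and v are updated in place, step starts at 1."""
    beta1, beta2 = betas
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m[i] / (1 - beta1 ** step)
        v_hat = v[i] / (1 - beta2 ** step)
        # Decoupled weight decay: shrink the parameter itself.
        p = p - lr * weight_decay * p
        # Adam step using the bias-corrected moments.
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
    return new_params


# With a zero gradient only the decay term acts on the parameter.
decayed = adamw_step([1.0], [0.0], m=[0.0], v=[0.0], step=1)
# With decay disabled the first step moves by roughly lr.
stepped = adamw_step([1.0], [1.0], m=[0.0], v=[0.0], step=1, weight_decay=0.0)
print(decayed, stepped)
```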

Tools

Add repr for NetworkNode.
Add optimize-for-inference interface for opgraph.
Add summary print for module_stats and network_visualize.
Add support of receptive_field stats for NetworkNode.
Make network_visualize's log_path an optional parameter.
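
As background for the receptive_field statistic, the sketch below computes the receptive field of a plain sequential stack of convolution or pooling layers with the standard recurrence: each layer widens the field by `(kernel - 1)` times the accumulated stride. It ignores dilation and branching, and the function name and input format are hypothetical rather than taken from NetworkNode.

```python
def receptive_field(layers):
    """Return (receptive field, accumulated stride) for (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth measured in input pixels
        jump *= stride             # distance between neighbouring outputs
    return rf, jump


two_3x3 = receptive_field([(3, 1), (3, 1)])
stem = receptive_field([(7, 2), (3, 2)])
print(two_3x3, stem)
```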

Improvements

Tools

Improve the auto-naming rules for ops.

Others

General components

Refactor the CPU compnode so that default_cpu does not support record.
Replace the algo reproducible attribute with algo attributes.

Python API

Move nvof to vision, keeping compatibility with the old usage.