New Features:
- A Tensor computation engine with built-in automatic differentiation, supporting both Imperative and Tracing modes.
- Various Tensor-based operators.
- Unified code for both Imperative and Tracing modes via the @jit.trace decorator.
- Module-based APIs to build neural networks, with save/load methods for parameters.
- Common mathematical functions in megengine.functional package.
- Common neural network modules in the megengine.module package.
- High performance operators for X86 and CUDA:
  - Support for instruction sets such as SSE4 and VNNI.
  - Support for a wide range of NVIDIA GPUs.
- Automatic kernel selection by profiling.
- A DataLoader for loading and preprocessing model data.
- A hub module for pulling online models and pre-trained models.
- trace.dump() for model serialization, along with sample C++ code for loading a model and running inference.
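The unified Imperative/Tracing workflow above can be illustrated with a minimal sketch. This is not MegEngine's implementation — the toy `trace` decorator and `Node` proxy below are hypothetical stand-ins that only mimic the idea: the same user-written function body either runs eagerly or is recorded once and replayed as a static program.

```python
class Node:
    """Hypothetical proxy value that records arithmetic ops during tracing."""
    def __init__(self, ops=None):
        self.ops = ops if ops is not None else []
    def __mul__(self, c):
        return Node(self.ops + [("mul", c)])
    def __add__(self, c):
        return Node(self.ops + [("add", c)])

def trace(fn):
    """Toy trace decorator: record fn's ops once, then replay on later calls."""
    program = []  # the recorded "static graph"

    def wrapper(x):
        if not program:
            # Tracing pass: run fn on a proxy to capture its operations.
            program.extend(fn(Node()).ops)
        # Replay the recorded program (imperative mode would just call fn(x)).
        out = x
        for op, c in program:
            out = out * c if op == "mul" else out + c
        return out
    return wrapper

@trace
def model(x):
    # Identical user code in both modes.
    return x * 2 + 1

print(model(3))   # 7 (first call traces, then replays)
print(model(10))  # 21 (replays the recorded program)
```

The point of the sketch is only that decorating a function changes *how* it executes, not how it is written — which is what makes code in the two modes highly consistent.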
Experimental Features:
Known Issues:
- Memory usage and performance need to be optimized for Imperative mode.
  - Memory consumption can be high in Imperative mode.
  - Dynamically created Tensors in Imperative mode cannot be automatically released for now; users need to reuse Tensors via the set_value() method to avoid continuously growing memory usage.
  - Some operators, such as PyTorchModule and megengine.Function, allocate increasingly more memory.
- Tracing related issues.
  - Traced functions cannot be further differentiated; backward() is not supported after tracing.
  - Multiple nested jit.trace and jit.sideeffect calls may result in undefined behavior.
- Performance issues.
  - step() in the Adam optimizer is slow.
- Random initialization for parameters is slow.
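The set_value() workaround mentioned above amounts to "reuse instead of reallocate": keep one long-lived Tensor and overwrite its contents in place rather than creating a fresh Tensor every iteration. A generic sketch of that pattern in plain Python — the `Buffer` class is hypothetical and only stands in for a Tensor whose memory cannot be auto-released:

```python
class Buffer:
    """Hypothetical stand-in for a Tensor with an in-place set_value() method."""
    def __init__(self, data):
        self.data = list(data)

    def set_value(self, data):
        # Overwrite contents in place; no new allocation is made.
        self.data[:] = data

buf = Buffer([0.0] * 4)
for step in range(3):
    new_values = [float(step)] * 4
    # Reuse the existing buffer, instead of buf = Buffer(new_values),
    # which would leave the old allocation unreclaimed.
    buf.set_value(new_values)

print(buf.data)  # [2.0, 2.0, 2.0, 2.0]
```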
Next Steps:
- Further improving memory consumption and performance in Imperative mode.
- Supporting more devices, e.g., ARM.
- Adding more operators.
- Improving documentation, CI, and build tooling.