New Features:
- A Tensor computation engine with built-in automatic differentiation, supporting both Imperative and Tracing modes.
- Various Tensor-based operators.
- Unified code for both Imperative and Tracing modes via the @jit.trace decorator.
- Module-based APIs to build neural networks, with save/load methods for parameters.
- Common mathematical functions in megengine.functional package.
- Common neural network modules in the megengine.module package.
- High performance operators for X86 and CUDA:
  - Support for instruction sets such as SSE4 and VNNI.
  - Support for a wide range of NVIDIA GPUs.
- Automatic kernel selection by profiling.
- A DataLoader for loading and preprocessing model data.
- A hub module for pulling online models and pre-trained models.
- trace.dump() for model serialization, along with sample C++ code for loading a model and running inference.
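The unified Imperative/Tracing workflow above can be illustrated with a minimal sketch. This is not MegEngine's implementation — the toy `trace` decorator and `Node` proxy below are hypothetical stand-ins that only mimic the idea: the same user-written function body either runs eagerly or is recorded once and replayed as a static program.

```python
class Node:
    """Hypothetical proxy value that records arithmetic ops during tracing."""
    def __init__(self, ops=None):
        self.ops = ops if ops is not None else []
    def __mul__(self, c):
        return Node(self.ops + [("mul", c)])
    def __add__(self, c):
        return Node(self.ops + [("add", c)])

def trace(fn):
    """Toy trace decorator: record fn's ops once, then replay on later calls."""
    program = []  # the recorded "static graph"

    def wrapper(x):
        if not program:
            # Tracing pass: run fn on a proxy to capture its operations.
            program.extend(fn(Node()).ops)
        # Replay the recorded program (imperative mode would just call fn(x)).
        out = x
        for op, c in program:
            out = out * c if op == "mul" else out + c
        return out
    return wrapper

@trace
def model(x):
    # Identical user code in both modes.
    return x * 2 + 1

print(model(3))   # 7 (first call traces, then replays)
print(model(10))  # 21 (replays the recorded program)
```

The point of the sketch is only that decorating a function changes *how* it executes, not how it is written — which is what makes code in the two modes highly consistent.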
Experimental Features:
Known Issues:
- Memory usage and performance need to be optimized for Imperative mode.
  - Memory consumption can be high in Imperative mode.
  - Dynamically created Tensors in Imperative mode cannot be automatically released for now; users need to reuse Tensors via the set_value() method to avoid continuously growing memory usage.
  - Some operators, such as PyTorchModule and megengine.Function, allocate increasingly more memory.
- Tracing related issues.
  - Traced functions cannot be further differentiated; backward() is not supported after tracing.
  - Multiple nested jit.trace and jit.sideeffect calls may result in undefined behavior.
- Performance issues.
  - step() in the Adam optimizer is slow.
- Random initialization for parameters is slow.
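The set_value() workaround mentioned above amounts to "reuse instead of reallocate": keep one long-lived Tensor and overwrite its contents in place rather than creating a fresh Tensor every iteration. A generic sketch of that pattern in plain Python — the `Buffer` class is hypothetical and only stands in for a Tensor whose memory cannot be auto-released:

```python
class Buffer:
    """Hypothetical stand-in for a Tensor with an in-place set_value() method."""
    def __init__(self, data):
        self.data = list(data)

    def set_value(self, data):
        # Overwrite contents in place; no new allocation is made.
        self.data[:] = data

buf = Buffer([0.0] * 4)
for step in range(3):
    new_values = [float(step)] * 4
    # Reuse the existing buffer, instead of buf = Buffer(new_values),
    # which would leave the old allocation unreclaimed.
    buf.set_value(new_values)

print(buf.data)  # [2.0, 2.0, 2.0, 2.0]
```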
Next Steps:
- Further improving memory consumption and performance in Imperative mode.
- Supporting more devices, e.g., ARM.
- Adding more operators.
- Improving documentation, CI, and build tooling.