Deepcompile: Fix bugs when applying deepcompile to VLA-like models (#7569)
**Describe the bug**
When applying deepcompile to the OpenVLA model (which is composed of two
vision transformers and a llama-7B), I ran into the following issues:
a. Not all parameters are trained, which leads to compile-time
exceptions as well as incorrect invocation of `endBackward()`.
b. `release_param()` can be passed a tuple rather than the tensor it
expects.
c. A use-before-define error in `fast_free_schedule()`.
This PR fixes all of those issues. Patches 1 and 2 resolve (a), patch 3
resolves (b), and patch 4 resolves (c).
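For issue (b), the fix amounts to normalizing the argument before use. The sketch below is illustrative only (the function name `release_param` is taken from the PR, but the body and the `params` parameter are hypothetical, not DeepSpeed's actual code):

```python
def release_param(params):
    """Hypothetical sketch of the issue (b) fix: callers may pass either a
    single parameter or a tuple of parameters, so normalize to a tuple
    before iterating instead of assuming a single tensor."""
    if not isinstance(params, tuple):
        params = (params,)
    released = []
    for p in params:
        released.append(p)  # stand-in for the real release logic
    return released
```

With this guard, both `release_param(w)` and `release_param((w1, w2))` take the same code path.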
**To Reproduce**
Use this script:
https://gist.github.com/eternalNight/3c2cf8c703f1e9e7742d3b7f9e1edae3
1. `deepspeed --num_gpus=N openvla-like.py -c`
---------
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>