Deepcompile: Fix bugs when applying deepcompile to VLA-like models (#7569)
**Describe the bug**
When applying deepcompile to the OpenVLA model (which is composed of two
vision transformers and a llama-7B), I ran into the following issues:
a. Not all parameters are trained, which leads to compile-time
exceptions as well as incorrect invocation of `endBackward()`.
b. `release_param()` can be passed a tuple rather than the tensor it
expects.
c. A use-before-define error in `fast_free_schedule()`.
This PR fixes all of those issues. Patches 1 and 2 resolve (a), patch 3
resolves (b), and patch 4 resolves (c).
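For issue (b), the fix amounts to normalizing the argument before use. The sketch below is illustrative only (the function name `release_param` is taken from the PR, but the body and the `params` parameter are hypothetical, not DeepSpeed's actual code):

```python
def release_param(params):
    """Hypothetical sketch of the issue (b) fix: callers may pass either a
    single parameter or a tuple of parameters, so normalize to a tuple
    before iterating instead of assuming a single tensor."""
    if not isinstance(params, tuple):
        params = (params,)
    released = []
    for p in params:
        released.append(p)  # stand-in for the real release logic
    return released
```

With this guard, both `release_param(w)` and `release_param((w1, w2))` take the same code path.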
**To Reproduce**
Use this script:
https://gist.github.com/eternalNight/3c2cf8c703f1e9e7742d3b7f9e1edae3
1. `deepspeed --num_gpus=N openvla-like.py -c`
---------
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>