Reduce Memory Cost in Flux Training (#9829)
* Improve NPU performance
* [bugfix] Fix freeing of NPU memory
* Reduce memory cost of the Flux training process
---------
Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>