Refine logic and error handling based on feedback

- Added a RuntimeError to prevent direct calls to loss.backward() when ZenFlow is enabled, ensuring proper management of the backward pass.
- Moved the loss scale block.

Signed-off-by: Tingfeng Lan <erc8gx@virginia.edu>