[spmd compile api] run gm_transforms before running the first iteration (#98788)
Summary: The non-transformed graph module contains a functionalized optimizer which, in a memory-constrained environment, needs to be defunctionalized (via an fx transformation or by lowering to Inductor) before the first iteration runs. Otherwise, OOM may occur.
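
To illustrate why a functionalized optimizer step can raise peak memory, here is a minimal sketch in plain Python. Lists stand in for tensors, and every name in it (`sgd_step_functional`, `sgd_step_inplace`, `live_buffers`) is hypothetical and not part of the SPMD compile API or of this change. It only shows the general idea: a functional update allocates new parameter buffers while the old ones are still alive, whereas an in-place update reuses them.

```python
# Illustrative sketch only: plain Python lists stand in for tensors.
# None of these names come from the SPMD compile API.


def sgd_step_functional(params, grads, lr):
    """Functionalized update: returns new buffers, leaves the inputs untouched."""
    return [
        [p - lr * g for p, g in zip(param, grad)]
        for param, grad in zip(params, grads)
    ]


def sgd_step_inplace(params, grads, lr):
    """Defunctionalized update: writes results back into the existing buffers."""
    for param, grad in zip(params, grads):
        for i, g in enumerate(grad):
            param[i] -= lr * g
    return params


def live_buffers(*param_sets):
    """Count distinct parameter buffers alive across the given sets."""
    return len({id(buf) for params in param_sets for buf in params})


params = [[1.0, 2.0], [3.0]]
grads = [[0.5, 0.5], [1.0]]

# Functional step: old and new parameter buffers are alive at the same time.
new_params = sgd_step_functional(params, grads, lr=0.1)
functional_live = live_buffers(params, new_params)

# In-place step: the same buffers are reused, so nothing extra is held.
same_params = sgd_step_inplace(params, grads, lr=0.1)
inplace_live = live_buffers(params, same_params)
```

In this toy setting the functional step holds twice as many parameter buffers as the in-place step. That is the kind of overhead the summary above says must be removed before the first iteration in a memory-constrained environment.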
Test Plan: Manually tested.
Reviewed By: mrshenli
Differential Revision: D44843942
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98788
Approved by: https://github.com/mrshenli