LazyGraphModule: improve the fix for the FakeTensorMode mismatch issue (#119311)
The previous fix (https://github.com/pytorch/pytorch/pull/118981) misses some corner cases. It works when both LazyGraphModule and compiled-autograd are enabled, but it fails with a FakeTensorMode mismatch error again when LazyGraphModule, CompiledAutograd, and DynamicShape are all enabled. Disabling any one of the three avoids the issue.
Enabling DynamicShape breaks the previous fix because, if there are symints saved for backward, we call the bw_compiler here before running the backward pass: https://github.com/pytorch/pytorch/blob/73f0fdea5b845a09d849404b06383c329c2c5a8a/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py#L382
The bw_compiler may cause an extra GraphModule recompilation on the bw_module, which makes its forward method become the lazy one again. The fix is simply to delay applying the previous fix until after the potential extra call of the bw_compiler.
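The ordering problem can be illustrated with a toy model. This is only a sketch: `ToyLazyModule`, `realize_forward`, `toy_bw_compiler`, `old_order`, and `new_order` are hypothetical names made up for illustration, not the real LazyGraphModule or AOTAutograd code. It assumes that any recompilation reinstalls the lazy forward and that the previous fix amounts to realizing the forward eagerly.

```python
class ToyLazyModule:
    """Hypothetical stand-in for a lazily recompiled graph module."""

    def __init__(self):
        # The forward starts out as a lazy stub.
        self.forward_is_lazy = True

    def recompile(self):
        # Assumption: any recompilation reinstalls the lazy stub.
        self.forward_is_lazy = True

    def realize_forward(self):
        # Stand-in for the previous fix: make the forward non-lazy.
        self.forward_is_lazy = False


def toy_bw_compiler(module):
    # Stand-in for a backward compiler that triggers a recompilation.
    module.recompile()
    return module


def old_order(module, has_saved_symints):
    # Previous behavior: apply the fix first, then maybe compile early.
    module.realize_forward()
    if has_saved_symints:
        toy_bw_compiler(module)  # undoes the fix
    return module.forward_is_lazy


def new_order(module, has_saved_symints):
    # This change: apply the fix after the potential early compile.
    if has_saved_symints:
        toy_bw_compiler(module)
    module.realize_forward()
    return module.forward_is_lazy
```

In this toy model, `old_order` leaves the forward lazy only when symints are saved for backward, which mirrors why the issue shows up only with DynamicShape enabled, while `new_order` leaves it realized in both cases.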
Repro on hf_Whisper:
```
CUDA_VISIBLE_DEVICES=1 time benchmarks/dynamo/torchbench.py -dcuda --training --backend=inductor --disable-cudagraphs --accuracy --only hf_Whisper --repeat 1 --compiled-autograd --dynamic-batch-only
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119311
Approved by: https://github.com/xmfan, https://github.com/jansel