make nanogpt work with both compiled autograd and _LazyGraphModule (#118981)
@xmfan and @fegin reported that `_LazyGraphModule` (https://github.com/pytorch/pytorch/pull/117911) makes nanogpt training fail with compiled autograd.
We have a repro:
```
python benchmarks/dynamo/torchbench.py --training --backend=inductor --disable-cudagraphs --accuracy --only nanogpt --repeat 1 --compiled-autograd
```
but we have not yet found a way to trigger the issue with a toy model.
The error message for the failure is https://gist.github.com/shunting314/6402a6388b3539956090b6bc098952fb . In compile_fx we call `detect_fake_mode`. This function looks for an active FakeTensorMode in both the TracingContext and the example inputs. The error is triggered because these two sources yield different FakeTensorModes.
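For intuition, the consistency check behaves roughly like the minimal sketch below. This is not the real implementation of `detect_fake_mode`; the helper parameters and the stand-in `FakeTensorMode` class are hypothetical, used only to show why two disagreeing sources trip an assertion:
```python
from typing import Optional


class FakeTensorMode:  # stand-in for torch._subclasses.FakeTensorMode
    pass


def detect_fake_mode_sketch(
    tracing_context_mode: Optional[FakeTensorMode],
    input_modes: list[FakeTensorMode],
) -> Optional[FakeTensorMode]:
    # Gather candidate modes from both sources: the active
    # TracingContext and the example inputs.
    candidates: list[FakeTensorMode] = []
    if tracing_context_mode is not None:
        candidates.append(tracing_context_mode)
    candidates.extend(input_modes)
    if not candidates:
        return None
    # All sources must agree on a single FakeTensorMode; a mismatch
    # is exactly the failure seen in the nanogpt repro.
    first = candidates[0]
    for mode in candidates[1:]:
        assert mode is first, "conflicting FakeTensorModes detected"
    return first
```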
Although I don't know the root cause of the FakeTensorMode discrepancy above, the fix here is to force _LazyGraphModule recompilation when compiled autograd is enabled. Most of the time this does not hurt compilation time, because when compiled autograd is enabled we call the graph module in the backward pass anyway: https://github.com/pytorch/pytorch/blob/855d5f144efc1db50316b9fcad1e62bf37caed10/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py#L705
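A minimal sketch of the shape of the fix, assuming `_LazyGraphModule` exposes a way to run its deferred `recompile()` eagerly (the `real_recompile` call and the wrapper function below are assumptions for illustration, not the exact code from this PR):
```python
import torch


def prepare_gm_for_compiled_autograd(
    gm: torch.fx.GraphModule,
    compiled_autograd_enabled: bool,
) -> torch.fx.GraphModule:
    # _LazyGraphModule defers recompile() until the module is actually
    # called. Under compiled autograd, force it eagerly so later tracing
    # does not observe a stale, not-yet-recompiled module.
    from torch.fx._lazy_graph_module import _LazyGraphModule  # internal API

    if compiled_autograd_enabled and isinstance(gm, _LazyGraphModule):
        gm.real_recompile()  # assumed helper that runs the deferred recompile
    return gm
```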
Let me know if there is a better fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118981
Approved by: https://github.com/jansel