Graph-break on FSDP in dynamo (#87420)
Why we want to graph-break FSDP
- FSDP has communication ops during forward and backward which we currently can't trace into the graph but also want to ensure are overlapped with compute
- dynamo has issues tracing into or capturing a call to fsdp module without a break (see below)
How we graph-break on FSDP
- marking FSDP.forward code as skip means the code frames will graph-break; but in this case all of torch.* is listed in skipfiles.py anyway, so this is taken care of
- disallowing the FSDP module prevents dynamo trying to record a 'call_module(FSDPmodule)' node into a graph, which happens earlier than the graphbreak that would be caused by skip, and causes additional issues: dynamo deepcopies modules before call-module handling, and FSDP module isn't trivially deep-copyable
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87420
Approved by: https://github.com/aazzolini