[FSDP] `summon_full_params()` in computation stream (#86836)
This runs the `summon_full_params()` all-gathers in the computation stream, which should reduce memory usage. In particular, FSDP can then use caching allocator blocks from the computation stream for those all-gathers, which should help avoid over-allocating blocks to the unshard stream.
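
The memory argument can be illustrated with a toy model. This is a minimal sketch and not PyTorch's actual allocator: it assumes only that freed blocks are cached per stream and are reused by later allocations on that same stream. `ToyCachingAllocator`, `reserved_after_summon`, the stream names, and the block size of 100 are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class ToyCachingAllocator:
    """Toy model: freed blocks are cached per stream and reused only by that stream."""

    def __init__(self):
        # Stream name -> sizes of cached (free) blocks owned by that stream.
        self.free_blocks = defaultdict(list)
        # Total size requested from the device so far (never returned in this toy).
        self.reserved = 0

    def malloc(self, size, stream):
        pool = self.free_blocks[stream]
        # Reuse the smallest cached block on this stream that is large enough.
        candidates = [block for block in pool if block >= size]
        if candidates:
            block = min(candidates)
            pool.remove(block)
            return block
        # No suitable cached block on this stream: reserve new memory.
        self.reserved += size
        return size

    def free(self, block, stream):
        # The block goes back to the cache of the stream that allocated it.
        self.free_blocks[stream].append(block)


def reserved_after_summon(allgather_stream):
    """Return total reserved size when the all-gather allocates on the given stream."""
    alloc = ToyCachingAllocator()
    # Earlier computation leaves a cached block of size 100 on the computation stream.
    block = alloc.malloc(100, "compute")
    alloc.free(block, "compute")
    # The all-gather needs an output buffer of size 100.
    alloc.malloc(100, allgather_stream)
    return alloc.reserved


print(reserved_after_summon("unshard"))  # 200: cached compute block cannot be reused
print(reserved_after_summon("compute"))  # 100: cached compute block is reused
```

In this toy model, allocating the all-gather buffer on a separate stream reserves a second block even though an equally sized block sits idle in the computation stream's cache, which is the over-allocation this change aims to avoid.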
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86836
Approved by: https://github.com/rohan-varma