SemanticDiff

pytorch
4cf6d117 - [FSDP2] Used `ReduceOp.AVG` if fp32 reduce-scatter (#120919)


Commit

202 days ago

[FSDP2] Used `ReduceOp.AVG` if fp32 reduce-scatter (#120919)

This PR uses `ncclAvg` op (via `ReduceOp.AVG`) if doing fp32 reduce-scatter. This allows the division by world size to happen in the reduce-scatter kernel itself, which seems to save extra memory read/write for dividing. This yields ~1.5% speedup on the Llama-7B workload (and makes per-parameter FSDP faster than flat-parameter FSDP 😅 ).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120919
Approved by: https://github.com/yifuwang, https://github.com/wanchaol
ghstack dependencies: #120238, #120910
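The equivalence the commit message relies on can be illustrated with a small pure-Python simulation. This is only a sketch of the semantics, not the FSDP2 code touched by the PR: the helpers `reduce_scatter` and `sum_then_divide` are hypothetical names, and plain lists stand in for per-rank gradient tensors. It shows that averaging inside the reduction gives the same shards as a sum reduce-scatter followed by a separate division pass, which is the extra read/write the commit says it avoids.

```python
def reduce_scatter(rank_inputs, op="sum"):
    """Simulate reduce-scatter over a list of per-rank inputs (hypothetical helper).

    rank_inputs[r] is the full-length input held by rank r.
    Returns a list where entry r is the reduced shard that rank r receives.
    """
    world_size = len(rank_inputs)
    numel = len(rank_inputs[0])
    if numel % world_size != 0:
        raise ValueError("input length must be divisible by world size")
    shard_len = numel // world_size

    # Element-wise reduction across ranks.
    reduced = [sum(values) for values in zip(*rank_inputs)]
    if op == "avg":
        # Division folded into the reduction, analogous to what an AVG op does.
        reduced = [value / world_size for value in reduced]
    elif op != "sum":
        raise ValueError(f"unsupported op: {op}")

    # Scatter: rank r keeps only its contiguous shard.
    return [reduced[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def sum_then_divide(rank_inputs):
    """Sum reduce-scatter followed by a separate division pass over each shard."""
    world_size = len(rank_inputs)
    shards = reduce_scatter(rank_inputs, op="sum")
    # Extra pass: every shard element is read and written once more.
    return [[value / world_size for value in shard] for shard in shards]


# Two simulated ranks, each holding a 4-element fp32-like gradient.
grads = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 6.0, 5.0, 8.0],
]
fused = reduce_scatter(grads, op="avg")
unfused = sum_then_divide(grads)
print(fused)    # [[2.0, 4.0], [4.0, 6.0]]
print(unfused)  # [[2.0, 4.0], [4.0, 6.0]]
```

In real PyTorch code the analogous choice is passing `op=torch.distributed.ReduceOp.AVG` instead of `ReduceOp.SUM` to a reduce-scatter collective such as `torch.distributed.reduce_scatter_tensor`. The simulation says nothing about the speedup itself, which depends on the NCCL kernel and the workload.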

Author

awgu


Committer

pytorchmergebot


