SemanticDiff

pytorch
1a28ebff - [TP] Introduce Sequence Parallel Style for LayerNorm/RMSNorm/Dropout (#121295)


Commit

197 days ago

[TP] Introduce Sequence Parallel Style for LayerNorm/RMSNorm/Dropout (#121295)

As titled, this PR introduces a dedicated `ParallelStyle` to shard the nn.LayerNorm/nn.Dropout/RMSNorm layers. We were mainly using manual distribute_module calls before when sharding the RMSNorm layer, but I think we should have a dedicated TP API to easily shard those layers, instead of users manually using DTensors.

I call this SequenceParallel, which might bring some confusion since we technically "deprecated" a SequenceParallel style months ago. But this time the SequenceParallel style is significantly different from the previous one (which used to shard two consecutive Linear layers). I believe getting the name right is the first priority, instead of worrying about the issue of reusing the old name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121295
Approved by: https://github.com/awgu, https://github.com/tianyu-l
ghstack dependencies: #121294
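For readers unfamiliar with the idea, the following is a minimal, framework-free sketch of why a layer such as LayerNorm can be sharded along the sequence dimension: it normalizes each token over its hidden dimension only, so each rank can process its own slice of the sequence without communication. This is not the code from the PR. The helpers `layer_norm`, `shard_sequence`, and `sequence_parallel_layer_norm` are hypothetical names chosen for illustration, the "ranks" are simulated in one process, and the affine weight and bias are omitted.

```python
import math


def layer_norm(tokens, eps=1e-5):
    """Normalize each token (row) over its hidden dimension; no affine parameters."""
    out = []
    for row in tokens:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out


def shard_sequence(tokens, world_size):
    """Split the sequence dimension into contiguous chunks, one per simulated rank."""
    chunk = math.ceil(len(tokens) / world_size)
    return [tokens[i * chunk:(i + 1) * chunk] for i in range(world_size)]


def sequence_parallel_layer_norm(tokens, world_size):
    """Apply layer_norm to each sequence shard independently, then concatenate.

    Assumption: this only mimics the data layout of sequence sharding; a real
    implementation would keep the shards on separate devices.
    """
    shards = shard_sequence(tokens, world_size)
    local_outputs = [layer_norm(shard) for shard in shards]
    return [row for shard in local_outputs for row in shard]


# A toy "sequence" of 6 tokens with hidden size 4.
sequence = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -1.0, 2.5, 0.0],
    [10.0, 10.5, 9.5, 11.0],
    [3.0, 1.0, 4.0, 1.0],
    [2.0, 7.0, 1.0, 8.0],
    [-2.0, 0.0, 2.0, 6.0],
]

full = layer_norm(sequence)
sharded = sequence_parallel_layer_norm(sequence, world_size=2)
print(full == sharded)
```

Because every token is normalized independently, the sharded result matches the unsharded one row for row. The same reasoning applies to RMSNorm and, apart from random number handling, to Dropout, which is presumably why the commit groups these three layers under one style.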

Author

wanchaol

Committer

pytorchmergebot

