[Feature] support sequence parallelism using compilation pass #16155
add reduce scatter op and register all gather
9f4dd679
replace all reduce with reduce scatter and all gather
09caae61
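The commit above relies on the identity that an all-reduce equals a reduce-scatter followed by an all-gather. Between the two collectives, each tensor-parallel rank holds only its own slice of the tokens. That is what lets per-token work such as RMSNorm run on a shard instead of on the full sequence. The commit list does not show the pass's code, so the sketch below is only a minimal, single-process simulation under stated assumptions. Each "rank" is a Python list, each element stands for one token's activation along the sequence dimension, and the helpers `all_reduce`, `reduce_scatter` and `all_gather` are hypothetical stand-ins, not vLLM or `torch.distributed` functions.

```python
from typing import List

# Each inner list is one rank's partial result for the same tensor; element i
# stands for token i along the sequence dimension (a simplification: real
# activations are [num_tokens, hidden_size] tensors).
Shards = List[List[float]]


def _elementwise_sum(shards: Shards) -> List[float]:
    # Sum the partial results of all ranks, token by token.
    return [sum(vals) for vals in zip(*shards)]


def all_reduce(shards: Shards) -> Shards:
    """Every rank ends up with the full, summed tensor."""
    total = _elementwise_sum(shards)
    return [list(total) for _ in shards]


def reduce_scatter(shards: Shards) -> Shards:
    """Rank r ends up with only the r-th contiguous chunk of the summed tensor."""
    world_size = len(shards)
    num_tokens = len(shards[0])
    if num_tokens % world_size != 0:
        raise ValueError("token count must be divisible by world size")
    chunk = num_tokens // world_size
    total = _elementwise_sum(shards)
    return [total[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_gather(chunks: Shards) -> Shards:
    """Every rank ends up with the concatenation of all ranks' chunks."""
    full = [x for chunk in chunks for x in chunk]
    return [list(full) for _ in chunks]


if __name__ == "__main__":
    partials = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    scattered = reduce_scatter(partials)
    print(scattered)  # [[11.0, 22.0], [33.0, 44.0]]
    # Reduce-scatter followed by all-gather reproduces the all-reduce result.
    assert all_gather(scattered) == all_reduce(partials)
```

In this toy version the split happens between the two calls. A compilation pass can place token-local operations there so they run on half the tokens per rank when there are two ranks.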
cascade812 marked this pull request as draft 1 year ago
match first embedding
84f4360c
update embedding replace pattern
165216d8
compile graph only for specific shapes
4318d655
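The commit message does not say which shapes qualify. One plausible constraint, and it is an assumption here, is that reduce-scatter along the token dimension needs the token count to split evenly across the tensor-parallel ranks. The hypothetical helpers `sp_applicable` and `sizes_to_compile_with_sp` below filter candidate sizes on that basis. They are not names from vLLM.

```python
from typing import Iterable, List


def sp_applicable(num_tokens: int, tp_size: int) -> bool:
    # Hypothetical check (assumption): reduce-scatter splits the token
    # dimension into tp_size equal chunks, so the count must divide evenly.
    return num_tokens % tp_size == 0


def sizes_to_compile_with_sp(candidate_sizes: Iterable[int], tp_size: int) -> List[int]:
    # Keep only the token counts for which the rewritten graph would be valid.
    # Assumption: other sizes would keep the original all-reduce graph.
    return [n for n in candidate_sizes if sp_applicable(n, tp_size)]


if __name__ == "__main__":
    print(sizes_to_compile_with_sp([1, 2, 4, 6, 7, 8], tp_size=2))  # [2, 4, 6, 8]
```

Restricting the rewrite to known shapes keeps the pattern replacement valid without adding padding logic for sizes that do not divide evenly.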
clean code
abd29534
cascade812 marked this pull request as ready for review 1 year ago
add test and rename
ca7fcb15
address comments
ffb2e24f
update
46951107
pass in dtype and device
662e6988
Merge branch 'main' into sp_pass
f60a8712
enable rms_norm automatically if enable_sequence_parallelism=True
9a72e10c
add test for sp pass
552857c8
fix failed tests
629e9426
fix failed tests
1a608655
fix failed tests
0736045f
address comments
534af36d
minor fix
c16a1974
update test
82527a12
test FixFunctionalizationPass with SequenceParallelismPass
5b12ce50
remove redundant code
230ee3ce
Merge remote-tracking branch 'origin' into sp_pass
57d684d8
remove the singleton pattern to support two LLM instances.
8dc0422f
nit
b251ad52
vllm-bot merged commit 690fe019 into main 1 year ago