llama.cpp
fd6ae4ca - Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)

Commit

46 days ago

Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)

* Fix delayed AllReduce on Gemma-4 MoE

  Skip forward past nodes that don't consume the current one, and allow a chain of MULs.

* Check for all sources before skipping nodes

* Address review comments
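The scan described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: `Op`, `Node`, `consumes`, and `delayed_allreduce_point` are hypothetical names invented here, and the graph is modeled as a flat list of nodes whose sources are indices into that list. It also assumes that each MUL in the chain multiplies by a value that is identical on every device, so that the reduction can be moved past it, and it ignores the case where the partial result has more than one consumer.

```cpp
#include <vector>

// Hypothetical stand-ins for graph nodes; these are NOT the ggml types.
enum class Op { MatMul, Mul, Add, Other };

struct Node {
    Op op;
    std::vector<int> srcs; // indices of this node's source nodes
};

// True if any source of `n` is the node at index `idx`.
// All sources are checked, not only the first one.
static bool consumes(const Node & n, int idx) {
    for (int s : n.srcs) {
        if (s == idx) {
            return true;
        }
    }
    return false;
}

// Starting at `start` (a node holding a partial result), walk forward:
//  - nodes that do not consume the current node are skipped,
//  - a MUL that consumes the current node extends the chain,
//  - any other consumer ends the scan.
// Returns the index of the last node in the chain, i.e. the point
// after which the AllReduce would be placed in this sketch.
int delayed_allreduce_point(const std::vector<Node> & graph, int start) {
    int cur = start;
    for (int i = start + 1; i < (int) graph.size(); ++i) {
        if (!consumes(graph[i], cur)) {
            continue; // unrelated node, skip past it
        }
        if (graph[i].op != Op::Mul) {
            break;    // first non-MUL consumer: stop delaying
        }
        cur = i;      // MUL consumer: keep following the chain
    }
    return cur;
}
```

For a graph of the form MatMul, unrelated node, MUL, MUL, ADD, where each MUL consumes the previous result, the function skips the unrelated node and follows both MULs, so the reduction point lands on the second MUL rather than on the MatMul.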

References

#22129 - Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE

Author

gaugarg-nv

Parents

fb19f94c
