onnxruntime
9c24416d -
1. Add an index to every collective op so that the ops run in the right order instead of hanging.
2. Make the Megatron transformer run only once, so that collective ops are inserted with indices that increase in topological order.
3. Fix ZeRO + Megatron runs on 128 GPUs: when no weight gradient needs to be all-reduced, the grad_norm inputs are empty, so we pass zero.
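
For illustration only, here is a minimal Python sketch of the ideas behind items 1 and 3. It assumes a toy graph given as plain node and edge lists and is not onnxruntime's actual implementation; the names `index_collectives_in_topo_order` and `global_grad_norm` are hypothetical. The first function walks the graph in topological order (Kahn's algorithm) and assigns each collective op an increasing index, so every rank that builds the same graph issues its collectives in the same sequence. The second returns 0.0 when there are no gradients to reduce, mirroring the "pass zero" fix.

```python
import math
from collections import deque


def index_collectives_in_topo_order(nodes, edges, collective_ops):
    """Hypothetical helper: return {node: index} for collective ops,
    with indices increasing in topological order (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Seed the queue in the given node order so the result is deterministic
    # and identical on every rank that builds the same graph.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = {}
    next_index = 0
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        if node in collective_ops:
            order[node] = next_index
            next_index += 1
        for succ in successors[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)

    if visited != len(nodes):
        raise ValueError("graph has a cycle; no topological order exists")
    return order


def global_grad_norm(grads):
    """Hypothetical helper: L2 norm over all gradient values.
    Returns 0.0 when there are no gradients to reduce (empty input)."""
    if not grads:
        return 0.0
    return math.sqrt(sum(g * g for grad in grads for g in grad))


if __name__ == "__main__":
    nodes = ["matmul0", "allreduce_a", "matmul1", "allreduce_b"]
    edges = [("matmul0", "allreduce_a"), ("allreduce_a", "matmul1"),
             ("matmul1", "allreduce_b")]
    print(index_collectives_in_topo_order(nodes, edges, {"allreduce_a", "allreduce_b"}))
    print(global_grad_norm([]))
    print(global_grad_norm([[3.0], [4.0]]))
```

Seeding the queue in input order keeps the indices deterministic; a mismatch in collective order across ranks is a common cause of exactly the kind of hang item 1 describes.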

Committed 5 years ago