DeepSpeed
f15cccfa - [AutoTP] Make AutoTP work when num_heads not divisible by number of workers (#4011)

[AutoTP] Make AutoTP work when num_heads not divisible by number of workers (#4011)

* allow number of heads not divisible by number of ranks
* get num_heads from model config, more robust
* simplify logic where num_head itself is sharded
* name tweaks
* make code more robust where num_attention_heads may not be defined in model_config
* support num_key_value_heads < num_attention_heads which is used by llama2
* add test for 5 ranks
* change odd rank # to 3 to avoid test skip
* add get_shard_size function
* modify sharding mechanism according to latest auto TP
* fix accuracy issue
* fix format
* skip tests with fusedqkv
* remove skip of fusedqkv tests
* skip test fusedqkv with odd number of ranks
* support model with n_heads in model_config
* fix TestInjectionPolicy::test[fp32-t5]
* fix uneven_heads on some fusedqkv types (#12)
* odd support fusedqkv
* fix format and clear text
* better fix when activation size cannot be divided by number of heads
* move tp_shard.py under module_inject
* Add get_num_kv_heads in tp_shard.py
* Refine according to comments
* remove old comment
* fix bug in getting num_kv_heads
* support uneven sharding of lm_head tensor parallel

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: mzl <mingzhi.liu@intel.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
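The core idea of this change is to drop the requirement that `num_heads % world_size == 0` and instead distribute attention heads unevenly across tensor-parallel ranks. Below is a minimal sketch of that distribution scheme: the names `get_shard_size` and `get_shard_size_list` echo the helpers mentioned in the commit (in `module_inject/tp_shard.py`), but the signatures and bodies here are illustrative assumptions, not the actual DeepSpeed implementation.

```python
# Sketch of uneven head sharding: the first (num_heads % mp_size) ranks each
# take one extra head, so all ranks together cover exactly num_heads heads.
# Assumed names/signatures -- not the real DeepSpeed API.

def get_shard_size(total_size, mp_size, rank, num_heads):
    """Return the slice of `total_size` owned by `rank`.

    Assumes total_size (e.g. hidden size) divides evenly by num_heads,
    so each head owns total_size // num_heads activation elements.
    """
    assert total_size % num_heads == 0, "activation size must divide evenly by heads"
    head_dim = total_size // num_heads
    base_heads, remainder = divmod(num_heads, mp_size)
    my_heads = base_heads + (1 if rank < remainder else 0)
    return my_heads * head_dim


def get_shard_size_list(total_size, mp_size, num_heads):
    """Per-rank shard sizes; they always sum back to total_size."""
    return [get_shard_size(total_size, mp_size, r, num_heads) for r in range(mp_size)]


if __name__ == "__main__":
    # Example: 32 attention heads, hidden size 4096, 5 tensor-parallel ranks
    # (mirrors the "add test for 5 ranks" case in the commit).
    sizes = get_shard_size_list(4096, 5, 32)
    print(sizes)              # [896, 896, 768, 768, 768] -> 7, 7, 6, 6, 6 heads
    assert sum(sizes) == 4096
```

With such per-rank shard sizes, the same uneven split can be applied to q/k/v projections, `num_key_value_heads` for grouped-query models like llama2, and the `lm_head` tensor-parallel slice referenced in the commit message.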