fix num_kv_heads sharding in uneven autoTP for Falcon-40b (#4712)
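Falcon-40b fails under uneven autoTP sharding because its config exposes the
KV-head count as 'num_kv_heads', which is not in the kv_head_names list.
Add 'num_kv_heads' to kv_head_names so the count is found.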
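For context, here is a minimal sketch of why the missing name matters. It assumes the
lookup probes config attributes in list order and that uneven sharding hands the first
(num_kv_heads % tp_size) ranks one extra head. `find_num_kv_heads`, `uneven_head_split`,
the list order, and the list entries other than 'num_kv_heads' are hypothetical, not
DeepSpeed's actual code.

```python
from types import SimpleNamespace
from typing import List, Optional

# Config attribute names probed, in order, for the KV-head count.
# Only 'num_kv_heads' comes from this change; the rest and the order are assumptions.
kv_head_names = ["num_kv_heads", "num_key_value_heads", "num_attention_heads", "n_heads"]


def find_num_kv_heads(model_config) -> Optional[int]:
    """Return the first non-None KV-head count found on the config (hypothetical helper)."""
    for name in kv_head_names:
        value = getattr(model_config, name, None)
        if value is not None:
            return value
    return None


def uneven_head_split(num_kv_heads: int, tp_size: int) -> List[int]:
    """Heads per rank; the first (num_kv_heads % tp_size) ranks get one extra head."""
    base, extra = divmod(num_kv_heads, tp_size)
    return [base + (1 if rank < extra else 0) for rank in range(tp_size)]


# Falcon-40b-style config: 8 KV heads, 128 query heads (values assumed for illustration).
falcon_cfg = SimpleNamespace(num_kv_heads=8, num_attention_heads=128)
heads = find_num_kv_heads(falcon_cfg)
print(heads, uneven_head_split(heads, 3))  # 8 [3, 3, 2]
```

Without 'num_kv_heads' in the list, this lookup would fall through to
'num_attention_heads' and shard 128 heads instead of 8, which is the kind of mismatch
that breaks uneven splits.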
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>