Fix num_kv_heads sharding in autoTP for the new in-repo Falcon-40B (#4654)
Update autoTP sharding to be compatible with the `num_kv_heads` of the latest Falcon-40B, as of
https://huggingface.co/tiiuae/falcon-40b/commit/4a70170c215b36a3cce4b4253f6d0612bb7d4146
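
A minimal sketch of the idea, not the actual autoTP code: under tensor parallelism each rank holds only a slice of the attention heads, so per-module head counts such as `num_kv_heads` have to be divided by the parallel degree. The helper name `shard_head_counts`, its parameters, and the head counts used below are hypothetical and chosen only for illustration.

```python
from types import SimpleNamespace


def shard_head_counts(module, tp_size, attr_names=("num_heads", "num_kv_heads")):
    """Divide head-count attributes on `module` by the tensor-parallel degree.

    Hypothetical helper for illustration; it is not part of DeepSpeed.
    """
    for name in attr_names:
        # Skip attributes this module does not define.
        if not hasattr(module, name):
            continue
        value = getattr(module, name)
        # Heads cannot be split unevenly across ranks.
        if value % tp_size != 0:
            raise ValueError(f"{name}={value} is not divisible by tp_size={tp_size}")
        setattr(module, name, value // tp_size)
    return module


# Stand-in for an attention module; the values are illustrative.
attn = SimpleNamespace(num_heads=128, num_kv_heads=8)
shard_head_counts(attn, tp_size=4)
```

If an attribute such as `num_kv_heads` is left out of the list, that count stays at its unsharded value while the weights are split, which is the kind of mismatch this change addresses.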

Without this fix, autoTP produces an error message like:

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>