Add num_head param to native multihead attention to avoid the bug where dim_per_head is always 64 (#72375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72375
The current native MHA implementation does not take num_head as a parameter, so dim_per_head ends up fixed at 64 regardless of the actual head count. This change adds num_head so dim_per_head can be derived from the embedding dimension.
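
A minimal sketch of the intended relationship, assuming the op receives the embedding dimension alongside the new num_head parameter (the function name and signature here are illustrative, not the actual native op interface):

```python
# Illustrative only: with num_head passed in, the head dimension is
# derived from the embedding dimension instead of being assumed to be 64.
def dim_per_head(embed_dim: int, num_head: int) -> int:
    assert embed_dim % num_head == 0, "embed_dim must be divisible by num_head"
    return embed_dim // num_head

# Example: embed_dim=512 with 16 heads should use dim_per_head == 32,
# not the hardcoded 64 (which is only correct for 8 heads at this width).
assert dim_per_head(512, 16) == 32
assert dim_per_head(512, 8) == 64
```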
Test Plan: In the following diff
Reviewed By: swolchok
Differential Revision: D33972168
fbshipit-source-id: 6b31bd6a516354d781e6dd5eea347a31d6cea272
(cherry picked from commit 3d0706628d47339e8279b0a46b26dc740c967050)