fix falcon-40b accuracy issue (#4895)
This [PR](https://github.com/microsoft/DeepSpeed/pull/4721) added the
"DecoderLayer": "glmtype" mapping, which causes the Falcon model to select
the "glmtype" fused_qkv_type. The Falcon model (including FalconDecoderLayer)
needs to select "bloomtype" explicitly.
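
A minimal sketch of the failure mode, assuming the fused QKV type is picked by substring-matching the layer class name. The tables and function names below are hypothetical and are not DeepSpeed's actual code:

```python
# Illustrative tables only; names and entries are assumptions for this sketch.
GENERIC_TYPES = {"DecoderLayer": "glmtype", "BloomBlock": "bloomtype"}
# Explicit entries that must win over the generic substring matches.
EXPLICIT_TYPES = {"FalconDecoderLayer": "bloomtype"}


def lookup(layer_name, table):
    """Return the type of the first key that is a substring of layer_name, else None."""
    for key, fused_type in table.items():
        if key in layer_name:
            return fused_type
    return None


def select_fused_qkv_type(layer_name):
    """Check explicit entries first, then fall back to the generic table."""
    return lookup(layer_name, EXPLICIT_TYPES) or lookup(layer_name, GENERIC_TYPES)


# The generic "DecoderLayer" key also matches "FalconDecoderLayer".
print(lookup("FalconDecoderLayer", GENERIC_TYPES))
# With the explicit entry checked first, Falcon gets its own type.
print(select_fused_qkv_type("FalconDecoderLayer"))
```

Under these assumptions, the generic lookup alone resolves Falcon's layer to "glmtype", while the explicit entry resolves it to "bloomtype" and leaves other "DecoderLayer" classes unaffected.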
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>