Fix inference API for FP32 and non-masking GPT-based models (#1204)
* Fix inference API for FP32 and non-masking GPT-based models
* Use a dummy tensor if input_mask is None
* Fix input_mask handling
* Minor fix
* Pass input_mask to the compute_attn function for checking
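The None-mask fallback described above can be sketched roughly as follows. This is an illustrative, simplified stand-in, not the actual kernel code: `compute_attn`, `get_attention_mask`, and the plain-list "tensors" are hypothetical, and the real implementation operates on framework tensors.

```python
# Hypothetical sketch of the dummy-mask fallback: if the caller provides
# no input_mask (e.g. a non-masking GPT-style model), substitute an
# all-zeros additive mask so downstream code can assume a mask exists.
def get_attention_mask(input_mask, seq_len):
    if input_mask is None:
        # Dummy "tensor" (all positions attended, no bias added).
        return [0.0] * seq_len
    return input_mask

def compute_attn(scores, input_mask):
    # The mask is passed into the attention function so it can be
    # checked here, rather than assumed to be present upstream.
    mask = get_attention_mask(input_mask, len(scores))
    return [s + m for s, m in zip(scores, mask)]
```

With a `None` mask the scores pass through unchanged; with a real additive mask, masked positions receive a large negative bias before the softmax.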