Add `scale_attn_by_inverse_layer_idx` feature (#2486)
* Add scale_attn_by_inverse_layer_idx feature
* Fix layer_id bug
* Fix scaling value
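
For context, a minimal sketch of what the flag is generally understood to do, modeled on the Hugging Face GPT-2 config option of the same name: the pre-softmax attention scores are additionally divided by `layer_idx + 1`. This is not the code changed in this commit. The helper names (`attention_scale`, `attention_weights`) are hypothetical, and a 0-based `layer_idx` is an assumption.

```python
import math


def attention_scale(head_dim, layer_idx, scale_attn_by_inverse_layer_idx=False):
    """Multiplier applied to raw Q.K^T scores before softmax (illustrative)."""
    # Standard scaled dot-product attention factor.
    scale = 1.0 / math.sqrt(head_dim)
    if scale_attn_by_inverse_layer_idx:
        # Assumption: layer_idx is 0-based, so the first layer divides by 1.
        scale /= float(layer_idx + 1)
    return scale


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, layer_idx, scale_attn_by_inverse_layer_idx=False):
    """Attention weights of one query vector over a list of key vectors."""
    scale = attention_scale(len(query), layer_idx, scale_attn_by_inverse_layer_idx)
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)
```

With the flag on, deeper layers get smaller pre-softmax scores and therefore flatter attention distributions. With the flag off, the scale does not depend on the layer index.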
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>