Add a use_parallel_residual argument to control how the residual is computed (#18695)
* Add a gpt_j_residual argument to control how the residual is computed
* Put duplicate code outside of the if block
* Rename the "gpt_j_residual" parameter to "use_parallel_residual" and set its default value to True
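
For reviewers unfamiliar with the two residual styles, the following is a minimal sketch of what a flag like this typically selects. It assumes a pre-layernorm transformer block; `block_forward`, `attn`, `mlp`, `ln1`, and `ln2` are hypothetical names used for illustration and are not taken from the code in this change.

```python
def block_forward(x, attn, mlp, ln1, ln2, use_parallel_residual=True):
    """Illustrative residual wiring for one transformer block (hypothetical helper)."""
    # The attention branch is the same in both modes, so it sits outside the if block.
    attn_out = attn(ln1(x))
    if use_parallel_residual:
        # Parallel (GPT-J style): both branches read the same input x.
        # x = x + attn(ln1(x)) + mlp(ln2(x))
        return x + attn_out + mlp(ln2(x))
    # Sequential: the MLP branch reads the output of the attention residual.
    # h = x + attn(ln1(x)); x = h + mlp(ln2(h))
    h = x + attn_out
    return h + mlp(ln2(h))


# Toy scalar stand-ins so the difference is visible without a tensor library.
identity = lambda v: v
double = lambda v: 2 * v
square = lambda v: v * v

parallel = block_forward(3, double, square, identity, identity, use_parallel_residual=True)
sequential = block_forward(3, double, square, identity, identity, use_parallel_residual=False)
```

With these toy functions the two modes give different results for the same input, which is why the flag has to match the setting a checkpoint was trained with.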