[`Gemma`] final fixes to the modeling (#29729)
* gelu_pytorch_tanh
* Force config.hidden_act to be approx gelu
* Gemma bug fixes
* force_use_exact_gelu
* Update configuration_gemma.py
* Update modeling_gemma.py
* update
* update for simpler handling
* nit
* nit
* fixup
* update
* also update the jax modeling!
* add `"gelu_pytorch_tanh": partial(nn.gelu, approximate=True),`
* fixup
* fix order
* act vs act_fn
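For reference, here is a minimal pure-Python sketch of the two GELU variants these bullets distinguish. It is an illustration, not code from this commit. The first is the exact erf-based GELU. The second is the tanh approximation that `gelu_pytorch_tanh` and `partial(nn.gelu, approximate=True)` refer to. The `ACTIVATIONS` dict is hypothetical. It only mimics the idea of resolving a `config.hidden_act` string to a function, and it is not the transformers `ACT2FN` table.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF written via erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU (the "approximate" / pytorch-tanh variant)
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


# Hypothetical lookup: maps an activation name (as a config string) to a function
ACTIVATIONS = {
    "gelu": gelu_exact,
    "gelu_pytorch_tanh": gelu_tanh,
}

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        exact = ACTIVATIONS["gelu"](x)
        approx = ACTIVATIONS["gelu_pytorch_tanh"](x)
        print(f"x={x:+.1f} exact={exact:.6f} tanh={approx:.6f} diff={abs(exact - approx):.2e}")
```

The two variants agree closely but not bit-for-bit. That is why a checkpoint trained with one should be run with the same one rather than silently swapped.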
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>