Fix GA loss bugs and add unit test (#35121)
* fix GA bugs and add unit test
* narrow down model loss unit test diff gap
* format code to make ruff happy
* send num_items_in_batch argument to decoder
* fix GA loss bug in BertLMHeadModel
* use TinyStories-33M to narrow down diff gap
* fotmat code
* missing .config
* avoid add extra args
---------
Co-authored-by: kangsheng <kangsheng@meituan.com>