Add speecht5 batch generation and fix wrong attention mask when padding (#25943)
* fix speecht5 wrong attention mask when padding
* enable batch generation and add parameter attention_mask
* fix doc
* fix format
* batch postnet inputs, return batched lengths, and consistent to old api
* fix format
* fix format
* fix the format
* fix doc-builder error
* add test, cross attention and docstring
* optimize code based on reviews
* docbuild
* refine
* not skip slow test
* add consistent dropout for batching
* loose atol
* add another test regarding to the consistency of vocoder
* fix format
* refactor
* add return_concrete_lengths as parameter for consistency w/wo batching
* fix review issues
* fix cross_attention issue