[Whisper] Fix word-level timestamps with bs>1 or num_beams>1 (#28114)
* fix frames
* use smaller chunk length
* correct beam search + tentative stride
* fix Whisper word timestamps in batched generation
* add test for batched generation with return_token_timestamps
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* clean up a test
* make style + correct typo
* write clearer comments
* explain test in comment
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>