Fix word-level timestamps broken for short-form audio (#30325)
* force chunk_length_s in AutomaticSpeechRecognitionPipeline
* compute num_frames even when stride is None
* add slow tests
* fix test
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_automatic_speech_recognition.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add input validation
* fixup
* small fix
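
For context on the `num_frames` item above, here is a minimal sketch of the idea, not the pipeline's actual code. It assumes 16 kHz audio and a hop length of 160 samples per feature frame; `num_frames_for` and `resolve_num_frames` are hypothetical helper names, and the `(chunk_len, stride_left, stride_right)` layout of `stride` is an assumption made for illustration.

```python
# Illustrative sketch only: helper names and the stride layout are assumptions,
# not the code in automatic_speech_recognition.py.
from typing import Optional, Tuple

SAMPLING_RATE = 16_000  # Hz (assumed)
HOP_LENGTH = 160        # samples per feature frame (assumed)


def num_frames_for(num_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Number of feature frames covered by real (unpadded) audio."""
    return num_samples // hop_length


def resolve_num_frames(
    num_samples: int,
    stride: Optional[Tuple[int, int, int]] = None,
    hop_length: int = HOP_LENGTH,
) -> int:
    """Return a frame count whether or not the input was chunked.

    stride is an assumed (chunk_len, stride_left, stride_right) tuple in
    samples; it is None when the audio was passed through without chunking.
    """
    if stride is not None:
        chunk_len, _, _ = stride
        return num_frames_for(chunk_len, hop_length)
    # Short-form path: no stride, so fall back to the raw sample count.
    return num_frames_for(num_samples, hop_length)


# A 5-second clip with no chunking still yields a frame count.
five_seconds = 5 * SAMPLING_RATE
print(resolve_num_frames(five_seconds))
```

The point of the sketch is that the frame count is available on the unchunked path too, so any downstream word-level alignment is bounded by the real audio length rather than by padding.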
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>