Add numpy alternative to FE using torchaudio (#26339)
* add audio_utils usage in the FE of SpeechToText
* clean unnecessary parameters of AudioSpectrogramTransformer FE
* add audio_utils usage in AST
* add serialization tests and function to FEs
* make style
* remove use_torchaudio and move to_dict to FE
* test audio_utils usage
* make style and fix import (remove torchaudio dependency import)
* fix torch dependency for jax and tensor tests
* fix typo
* clean tests with suggestions
* add lines to test if is_speech_available is False