Transformers 4.48 (#2158)
* test
* testing tensor cache
* fix logger
* condition cache class usage
* update opset for beit and data2vec vision and skip flattened/fused pkv (e.g. gpt bigcode)
* style
* fix args patcher
* fix modernbert testing
* adapt to the new generation length returned by whisper
* fix is_causal in transformers
* fix modernbert failures
* style
* traceable cache
* use pkv index
* add version guard and clean up other model patcher version guards (see the sketch after this list)
* patch sdpa attention in optimum for now
* remove modernbert condition
* style
* fix MistralModelPatcher
* correctly patch gpt2 in vision encoder decoder
* patch sdpa attention forward everywhere
* fix gpt2 cross attention in seq2seq as well
* move traceable cache to its own file to simplify the model patcher
* Apply suggestions from code review
* style
* fix
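The "version guard" commit above refers to gating a model patch on the installed transformers version. A minimal sketch of such a guard, assuming a hypothetical `is_transformers_version` helper defined locally; only `packaging.version` and `transformers.__version__` are real APIs here, and the actual patch is left as a placeholder rather than optimum's implementation:

```python
# Sketch of a version guard for conditionally applying a model patch.
from packaging import version

import transformers


def is_transformers_version(operator: str, reference: str) -> bool:
    """Compare the installed transformers version against a reference version."""
    installed = version.parse(transformers.__version__)
    ref = version.parse(reference)
    return {
        ">=": installed >= ref,
        "<": installed < ref,
    }[operator]


# Example: only apply a patch on transformers >= 4.48, where the cache and
# attention interfaces changed.
if is_transformers_version(">=", "4.48.0"):
    pass  # the actual patch (hypothetical) would be applied here
```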