Fix batched inference/generation, position_ids creation, falcon alibi, gpt_bigcode multi-query, and more (#2326)
* test left-padded batched inference
* demonstrate batched text generation failure
* fix remote code
* fix
* fix position_ids generation inside ORTModelForCausalLM class (see the position_ids sketch after this list)
* it works until transformers 4.52 -_-
* now run with latest transformers
* boolean 4D mask is actually not supported by the torch onnx exporter
* only test generation with batched inputs, since the logits are a bit off because transformers uses a boolean mask
* boolean-mask-safe softmax for batched inference (see the sketch after this list)
* style
* use old typing
* don't do unnecessary patching
* try to avoid spamming the hub for an image
* update min transformers version
* better and direct torch patching
* more batched generation special cases
* style
* initialize the PIL image instead of downloading it
* use a random PIL image (see the sketch after this list)
* test different versions of transformers in fast tests
* fix
* revert diffusers changes for now
* mask the padded kv cache positions as well (see the sketch after this list)
* fix masking for old bloom
* use a constant image to avoid image loading errors
* style
* run diffusers tests serially to avoid the runner dying
* fix
* cleanup and some comments
* fix and test falcon alibi (see the slopes sketch after this list)
* style
* fix, support and test multi_query=False as well
* only apply masked testing for transformers versions prior to 4.39
* Update optimum/onnxruntime/modeling_decoder.py
* use the text decoder position_ids onnx config but test that it stays in sync with the list
* fix opt
* style
* fix sdpa without overriding the torch onnx exporter
* use inplace op ;-;
* fix st test
* patch directly in the onnx graph because the patch needs to happen after softmax
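The position_ids fix follows the usual recipe for left-padded batches: positions are recovered from the attention mask rather than assumed contiguous. The sketch below is a minimal illustration of that recipe (the helper name `make_position_ids` is hypothetical), not the exact code inside `ORTModelForCausalLM`.

```python
import torch

def make_position_ids(attention_mask: torch.Tensor) -> torch.Tensor:
    # Count real (non-padding) tokens with a cumulative sum, then shift by
    # one so the first real token in each row gets position 0.
    position_ids = attention_mask.long().cumsum(-1) - 1
    # Give padding slots a harmless dummy position so downstream gathers
    # stay in range; they are masked out of attention anyway.
    position_ids.masked_fill_(attention_mask == 0, 1)
    return position_ids

mask = torch.tensor([[0, 0, 1, 1, 1],
                     [1, 1, 1, 1, 1]])
print(make_position_ids(mask))
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```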
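The boolean-mask-safe softmax item refers to the fact that a row whose scores are all masked softmaxes to NaN (0/0), which corrupts batched inference with padding. Below is a minimal sketch of the general technique, assuming a hypothetical helper name `masked_safe_softmax` and a mask that is True where attention is allowed; the actual fix in the PR is applied at the exporter/ONNX level.

```python
import torch

def masked_safe_softmax(scores: torch.Tensor, bool_mask: torch.Tensor) -> torch.Tensor:
    # Disallowed positions get -inf so they contribute zero probability.
    scores = scores.masked_fill(~bool_mask, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    # A fully masked row is exp(-inf) / sum(exp(-inf)) = 0/0 = NaN; return
    # zeros for such rows instead of propagating NaNs.
    return torch.nan_to_num(probs, nan=0.0)
```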
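For the random PIL image item, the idea is simply to build a test image in memory instead of downloading one from the hub, which avoids network flakiness and rate limiting. A small sketch, assuming a hypothetical helper name and an arbitrary 224x224 size:

```python
import numpy as np
from PIL import Image

def random_test_image(height: int = 224, width: int = 224) -> Image.Image:
    # Random uint8 RGB pixels; Image.fromarray infers mode "RGB" from the
    # (H, W, 3) uint8 array.
    pixels = np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)
    return Image.fromarray(pixels)
```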
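For the padded kv cache item, one common way to keep stale values at left-padded slots from leaking into attention is to zero the cache entries at padding positions. The sketch below assumes a `(batch, num_heads, seq_len, head_dim)` cache layout and a hypothetical helper name; it illustrates the idea, not necessarily the PR's exact implementation.

```python
import torch

def mask_kv_cache(key: torch.Tensor, value: torch.Tensor, attention_mask: torch.Tensor):
    # attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    # Broadcast it over the head and head_dim axes and zero the padded entries.
    keep = attention_mask[:, None, :, None].to(key.dtype)
    return key * keep, value * keep
```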
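For the falcon alibi item, the per-head slopes follow the standard ALiBi formulation used by Bloom/Falcon-style models: a geometric sequence for the largest power-of-two head count, plus interleaved extra slopes when the head count is not a power of two. The helper name below is hypothetical, and the full fix also concerns how the bias is built from the attention mask for batched, left-padded inputs.

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence of slopes for the closest power-of-two head count.
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_power_of_2) - 3)))
    powers = torch.arange(1, closest_power_of_2 + 1, dtype=torch.float32)
    slopes = torch.pow(base, powers)
    if closest_power_of_2 != num_heads:
        # Extra heads get slopes interleaved between the existing ones.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_power_of_2) - 3)))
        num_remaining = num_heads - closest_power_of_2
        extra_powers = torch.arange(1, 2 * num_remaining + 1, 2, dtype=torch.float32)
        slopes = torch.cat([slopes, torch.pow(extra_base, extra_powers)])
    return slopes

print(alibi_slopes(8))  # 1/2, 1/4, ..., 1/256 for 8 heads
```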