Update ort CIs (slow, gpu, train) (#2024)
* update ort CIs
* fix train ci
* fix gpu ci
* gpus all
* devel
* enable trt
* fix
* fix
* fix
* test
* rename
* change instance
* test
* use available
* update
* shorter labels as well
* add onnxruntime-training
* fix onnxruntime package checking
* fix typo
* fix typo
* remove torch version
* fix trainer
* fixed trt ep by using trt docker image (the only way to make sure everything works)
* latest trt version
* remove pkv speedup timing since it was never used
* trust remote code for training datasets
* remove rocm from diffusers tests
* move ort training tests to onnxruntime-training
* fix ort training
* fix
* style
* always assert closeness and not equality
* fixed perceiver
* fixed missing position ids when attn mask is given
* remove num_labels from output shapes as it's not a dynamic axis
* raise error on missing mandatory inputs
* added atol and rtol as part of the ORTModelTestMixin class
* fix segformer image segmentation
* style
* fix vision encoder io binding
* hot fix io binding, remove its dependency on the order of inputs and make sure it's actually being tested
* fix
* typo
* unify io binding api with non io binding
* force evaluated shape to int
* mark pix2struct io binding tests
* force contiguity in forward pass
* fixed cryptic contiguity problems
* fix some
* fix vision2seq modeling and testing
* Update setup.py
* update import utils
* Update optimum/onnxruntime/modeling_ort.py
* fix vision encoder decoder io binding
* enable bigbird and bigbird pegasus and separate timm slow tests to untangle them
* use bigger machine for slow tests
* lower atol and rtol for image classification logits
* fix
* large
* enable more Longformer and MCTCT
* enable commented models in export as well
* uncomment timm slow models, big bird optimization and marian pkv comparison
* fix whisper/speech_to_text test and make convolution deterministic
* pin torch for ort training
* ctc and speech models also use convolution, so they have to be deterministic
* revert vision2seq atol