Introduce ORTSessionMixin and enable general io binding (works for diffusers as well) (#2234)
* refactor the ORT session mixin and, as a result, enable simple integrated io binding for diffusers
* style
* distribute onnxruntime tests
* no need to clean disk for fast tests
* move diffusion tests to diffusion
* fix
* test
* providers
* fix
* fix and get rid of io binding helpers
* get rid of the models folder in onnxruntime
* style
* comments
* remove _from_transformers
* fix trust remote code
* fix local model test (use_cache should only be passed to causal LM)
* reuse input buffers as output buffers in diffusion models
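Reusing input buffers as output buffers follows the general pattern of binding one preallocated array for both roles; a minimal NumPy sketch of the idea (function and variable names are illustrative, not the actual optimum code):

```python
import numpy as np

# Preallocate one buffer and reuse it across "inference" steps,
# writing each step's output back into the input buffer in place.
def run_steps(latents: np.ndarray, steps: int) -> np.ndarray:
    for _ in range(steps):
        # out=latents binds the input buffer as the output buffer,
        # avoiding a fresh allocation per step.
        np.multiply(latents, 0.5, out=latents)
    return latents

buf = np.ones(4, dtype=np.float32)
result = run_steps(buf, 2)
print(result)          # updated in place
print(result is buf)   # True: same buffer, no new allocation
```

In a diffusion loop this avoids allocating a new latents tensor per denoising step.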
* fix sess_option
* override library for sentence transformers feature extraction
* fix no cross attention
* remove lib and default to transformers
* better
* flaky decoder
* add saving session utils
* test
* more refactoring (separating encoder/decoder parts from the parent model)
* remove print
* fix
* _infer_onnx_filename as private method
* fix model name
* fix more model paths
* fixes
* Update optimum/onnxruntime/base.py
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* added main export library guards, and restricted when to force eager
* style
* review suggestions and added a warning when properties are not consistent across sessions
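The consistency warning can be sketched as a parent mixin that compares a property across its child sessions and warns on disagreement; a hypothetical illustration (class and attribute names are placeholders, not the optimum implementation):

```python
import warnings

class ParentMixin:
    # Holds several child "sessions" and exposes shared properties.
    def __init__(self, parts):
        self.parts = parts

    @property
    def providers(self):
        # Warn when the property disagrees across child sessions.
        values = {tuple(p["providers"]) for p in self.parts}
        if len(values) > 1:
            warnings.warn("providers are not consistent across sessions")
        return next(iter(values))

parent = ParentMixin([
    {"providers": ["CPUExecutionProvider"]},
    {"providers": ["CUDAExecutionProvider"]},
])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = parent.providers
print(len(caught))  # 1
```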
* ORTParentMixin
* deprecate instantiating ORTModel, ORTModelForCausalLM and ORTModelForConditionalGeneration with positional arguments.
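The deprecation follows the usual keyword-only constructor pattern; a generic sketch (the class and argument names are placeholders, not the actual ORTModel signature):

```python
import warnings

class Model:
    def __init__(self, *args, session=None, config=None):
        if args:
            # Accept legacy positional arguments for now, but warn.
            warnings.warn(
                "Passing positional arguments is deprecated; "
                "use keyword arguments instead.",
                FutureWarning,
            )
            session, *rest = args
            if rest:
                config = rest[0]
        self.session = session
        self.config = config

# Keyword instantiation: no warning.
m = Model(session="sess", config="cfg")

# Positional instantiation: emits a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Model("sess", "cfg")
print(len(caught))  # 1
```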
* keyword arguments
* diffusion
* known output buffers
* style
* typo
* id
* fix
* more typos
* Update optimum/onnxruntime/modeling_diffusion.py
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
* remove task argument from _export
* deprecate and fix
* fix
* style
* slim later
* misc fixes and extensions in testing
* allow passing export arguments to the diffusion pipeline (e.g. export on a CUDA device, with a specific dtype, etc.)
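Forwarding export arguments through the pipeline amounts to plain `**kwargs` passing; a hypothetical sketch (function names and parameters are illustrative, not the optimum API):

```python
def export_model(model_id, device="cpu", dtype="float32"):
    # Stand-in for the actual ONNX export; just records what it received.
    return {"model_id": model_id, "device": device, "dtype": dtype}

def from_pretrained(model_id, export=False, **export_kwargs):
    # Extra keyword arguments are forwarded to the exporter, so callers
    # can export on a specific device or with a specific dtype.
    if export:
        return export_model(model_id, **export_kwargs)
    return {"model_id": model_id}

pipe = from_pretrained("my-model", export=True, device="cuda", dtype="float16")
print(pipe)  # {'model_id': 'my-model', 'device': 'cuda', 'dtype': 'float16'}
```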
---------
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>