Fix bloom KV cache usage in ORTForCausalLM (#1152)
* fix bloom past_key_values (pkv) usage when num_beams > 1
* Update optimum/onnxruntime/modeling_decoder.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* remove transformers import
---------
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>