optimum
2eab7abc - Fix bloom KV cache usage in ORTForCausalLM (#1152)

Commit
2 years ago
Fix bloom KV cache usage in ORTForCausalLM (#1152)

* fix bloom pkv usage with num_beams>1
* Update optimum/onnxruntime/modeling_decoder.py
  Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Update optimum/onnxruntime/modeling_decoder.py
  Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Update optimum/onnxruntime/modeling_decoder.py
  Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* remove transformers import

---------

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>