transformers
f842abac - Fix tensor indexing crash in serve generate_response KV cache continuation (#44735)

Commit
61 days ago
The `generate_response` method indexes `inputs` as a dict (`inputs["input_ids"]`), but `inputs` is already the raw `input_ids` tensor at that point. This causes a TypeError on the second request in a conversation session, when KV cache reuse is attempted.

Use `inputs.shape[-1]` instead, matching `generate_response_non_streaming`.

Fixes #44734

Co-authored-by: easonysliu <easonysliu@tencent.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
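The failure mode described in this commit can be sketched without the serving code. The snippet below is an illustration only, not the patched method: `FakeTensor` is a hypothetical stand-in for a tensor (it exposes `shape` and rejects string keys with a `TypeError`, mirroring the error the message reports), and `sequence_length` is a hypothetical helper that reads the length from `shape[-1]`, as the fix does.

```python
from collections.abc import Mapping


class FakeTensor:
    """Hypothetical stand-in for a tensor: has `shape`, rejects string keys."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __getitem__(self, key):
        if isinstance(key, str):
            # Assumed to mirror the TypeError reported in the commit message.
            raise TypeError("tensor indices must be integers or slices, not str")
        raise NotImplementedError("element access is not needed for this sketch")


def sequence_length(inputs):
    """Hypothetical helper: length of the last dimension of `input_ids`.

    Accepts either a dict-like batch or the raw `input_ids` tensor.
    """
    if isinstance(inputs, Mapping):
        inputs = inputs["input_ids"]
    return inputs.shape[-1]


raw_input_ids = FakeTensor((1, 7))  # batch of 1, 7 tokens

# Dict-style indexing on a raw tensor fails, as in the reported crash.
try:
    raw_input_ids["input_ids"]
    dict_indexing_failed = False
except TypeError:
    dict_indexing_failed = True

# Reading the shape directly works for the raw tensor.
raw_length = sequence_length(raw_input_ids)
dict_length = sequence_length({"input_ids": raw_input_ids})
```

The actual change is narrower than this helper: it replaces the dict lookup with `inputs.shape[-1]`, since `inputs` is always the raw tensor at that point in `generate_response`.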