transformers
f842abac - Fix tensor indexing crash in serve generate_response KV cache continuation (#44735)

Commit

61 days ago

Fix tensor indexing crash in serve generate_response KV cache continuation (#44735)

The `generate_response` method indexes `inputs` as a dict (`inputs["input_ids"]`) but `inputs` is already the raw `input_ids` tensor at that point. This causes a TypeError on the second request in a conversation session when KV cache reuse is attempted.

Use `inputs.shape[-1]` instead, matching `generate_response_non_streaming`.

Fixes #44734

Co-authored-by: easonysliu <easonysliu@tencent.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
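The failure mode described in the commit message can be illustrated in miniature. This is only a sketch, not the actual `transformers` code: `FakeTensor`, `prompt_length_buggy`, and `prompt_length_fixed` are hypothetical names introduced here, and `FakeTensor` is a pure-Python stand-in that mimics just two things, a `.shape` attribute and rejection of string keys with a `TypeError` (the error type the commit message reports).

```python
class FakeTensor:
    """Hypothetical stand-in for a 2-D tensor of token ids (not a real torch.Tensor)."""

    def __init__(self, rows):
        self.rows = rows
        # (batch_size, sequence_length), as for a batched input_ids tensor
        self.shape = (len(rows), len(rows[0]) if rows else 0)

    def __getitem__(self, key):
        # Assumption: string keys are rejected with a TypeError
        if isinstance(key, str):
            raise TypeError(f"tensor cannot be indexed with a string key: {key!r}")
        return self.rows[key]


def prompt_length_buggy(inputs):
    # Treats `inputs` as a dict of tensors, which fails when it is already the tensor
    return inputs["input_ids"].shape[-1]


def prompt_length_fixed(inputs):
    # Reads the sequence length directly from the tensor's last dimension
    return inputs.shape[-1]


# `inputs` is already the raw token-id tensor, as in the scenario the commit describes
inputs = FakeTensor([[101, 2023, 2003, 102]])

try:
    prompt_length_buggy(inputs)
    buggy_error = None
except TypeError as exc:
    buggy_error = exc

fixed_length = prompt_length_fixed(inputs)
```

With the stand-in above, the dict-style lookup raises while the `.shape[-1]` lookup returns the number of tokens in the prompt, which is the value the commit message says the fix now reads.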

References

#44735 - Fix tensor indexing crash in serve generate_response KV cache continuation

Author

mango766

Parents

39f751a5
