vllm
d48f4d6d - perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739)

Commit
217 days ago
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739)

Signed-off-by: Andrew Sansom <andrew@protopia.ai>