llama.cpp
212f4521 - context : use n_embd_out for pooled embedding extraction (#20840)

Commit

18 days ago

context : use n_embd_out for pooled embedding extraction (#20840)

The MEAN/CLS/LAST pooling paths in encode() and decode() used n_embd_inp() (16384 for qwen3vl with deepstack) to read from the pooled embedding tensor, which only has n_embd_out() (4096) floats per sequence. This caused a tensor read out of bounds assertion.

Fixes embedding mode for Qwen3-VL-Embedding models.
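
The size mismatch described above can be illustrated with a minimal, self-contained sketch. This is not the llama.cpp code: `pooled_tensor` and `read_pooled_row` are hypothetical names invented for illustration, and the only values taken from the commit message are the two widths (16384 and 4096). The sketch assumes the pooled tensor is laid out as one contiguous row of `n_embd_out` floats per sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the pooled embedding tensor:
// n_seq rows, each holding n_embd_out floats (assumed contiguous layout).
struct pooled_tensor {
    std::vector<float> data;
    size_t n_seq;
    size_t n_embd_out;
};

// Hypothetical helper: copy one sequence's pooled embedding into `out`,
// reading `n_floats` values per row. Returns false instead of reading
// past the end of the tensor, which is where the real code asserted.
inline bool read_pooled_row(const pooled_tensor & t, size_t seq_idx,
                            size_t n_floats, std::vector<float> & out) {
    assert(t.data.size() == t.n_seq * t.n_embd_out);

    const size_t offset = seq_idx * n_floats;
    if (offset + n_floats > t.data.size()) {
        return false; // the requested row does not fit in the tensor
    }
    out.assign(t.data.begin() + offset, t.data.begin() + offset + n_floats);
    return true;
}
```

With two sequences and `n_embd_out = 4096`, the tensor holds 8192 floats. Reading 4096 floats per row succeeds for both sequences, while reading 16384 floats per row (the input width) already overruns the tensor for the first sequence. Using the output width for the row size is the change the commit message describes.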

References

#20840 - context : use n_embd_out for pooled embedding extraction

Author

extfs

Parents

568aec82
