llama.cpp
80ea089d - llama : allow pooled embeddings on any model (#7477)

Commit
1 year ago
llama : allow pooled embeddings on any model (#7477)

* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples
* find result_norm/result_embd tensors properly; update output allocation logic
* only use embd output for pooling_type NONE
* get rid of old causal_attn accessor
* take out attention_type; add in llama_set_embeddings
* bypass logits when doing non-NONE pooling
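For readers unfamiliar with the term, the sketch below illustrates what "pooling" means here: collapsing one embedding vector per token into a single vector per sequence. It is a standalone illustration, not code from this commit. `PoolingKind` and `pool_embeddings` are hypothetical names and do not correspond to the llama.cpp API. Last-token pooling is the strategy the commit message names. Mean and first-token pooling are included only for contrast, on the assumption that they are the usual alternatives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not part of llama.cpp.
enum class PoolingKind { Mean, First, Last };

// token_embd holds n_tokens rows of n_embd floats in row-major order.
// Returns one pooled vector of length n_embd for the whole sequence.
std::vector<float> pool_embeddings(const std::vector<float> & token_embd,
                                   std::size_t n_tokens,
                                   std::size_t n_embd,
                                   PoolingKind kind) {
    assert(n_tokens > 0);
    assert(token_embd.size() == n_tokens * n_embd);

    std::vector<float> out(n_embd, 0.0f);

    switch (kind) {
        case PoolingKind::Mean:
            // Average each dimension over all tokens.
            for (std::size_t t = 0; t < n_tokens; ++t) {
                for (std::size_t i = 0; i < n_embd; ++i) {
                    out[i] += token_embd[t * n_embd + i];
                }
            }
            for (float & v : out) {
                v /= static_cast<float>(n_tokens);
            }
            break;
        case PoolingKind::First:
            // Use the first token's vector as the sequence embedding.
            for (std::size_t i = 0; i < n_embd; ++i) {
                out[i] = token_embd[i];
            }
            break;
        case PoolingKind::Last:
            // Use the final token's vector, which in a causal model
            // has attended to every earlier token.
            for (std::size_t i = 0; i < n_embd; ++i) {
                out[i] = token_embd[(n_tokens - 1) * n_embd + i];
            }
            break;
    }
    return out;
}
```

With no pooling (pooling_type NONE in the commit message), the per-token vectors would be returned as they are rather than reduced to one.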