llama.cpp
llama : custom attention mask + parallel decoding + no context swaps
#3228
Merged