Fix sliding window attention used in Gemma2FlashAttention2 #32522
fix sliding window attention (flash2) in gemma2 model
13cb6c08
[run-slow] gemma
e81fc78e
fix slicing attention_mask for flash_attn2
f1adb8a7
fix slicing attention_mask when flash_attn is used
f912212d
Merge branch 'main' into fixing-sliding-window-attn
42f5d0e1
add missing comment
d3ae866a
slice the last seq_len tokens in the key, value states
73acbc18
revert code of slicing key, value states
991534ee