Fix llama model sdpa attention forward function masking bug when output_attentions=True #30652
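In short: when `output_attentions=True`, the SDPA attention classes fall back to the eager implementation, but mask preparation still used the SDPA shortcut that can drop the causal mask entirely (relying on `is_causal=True` inside the kernel), so the eager fallback ran without any causal masking. A minimal, self-contained sketch of the failure mode (shapes and tolerances here are illustrative, not taken from the PR):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 2, 5, 8)  # (batch, heads, seq_len, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# SDPA kernel path: no mask tensor is needed, is_causal=True handles causality.
sdpa_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Eager fallback with the mask dropped (the bug): attention runs fully unmasked.
scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
buggy_out = torch.softmax(scores, dim=-1) @ v

# Eager fallback with an explicit causal mask (the fix): -inf above the diagonal.
causal = torch.full((5, 5), float("-inf")).triu(1)
fixed_out = torch.softmax(scores + causal, dim=-1) @ v

print(torch.allclose(sdpa_out, buggy_out, atol=1e-5))  # False: wrong attentions/outputs
print(torch.allclose(sdpa_out, fixed_out, atol=1e-5))  # True: paths agree again
```

The fix gates the mask-dropping shortcut on `output_attentions` being False (the "ignore unnecessary sdpa mask converter" commit below) and propagates the same change to the models that copy Llama's masking code.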
Commits:
514c1c32  Fix llama model forward function with attention=True, same-length enc…
6c0fa7bb  Fix style
0d91bea9  propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy t…
894c14b8  Fix style
25843087  ignore unnecessary sdpa mask converter when output_attentions=True
c7bdc95d  Merge branch 'huggingface:main' into fix-llama-mask-output-attn
8d793a38  Merge branch 'huggingface:main' into fix-llama-mask-output-attn
fc143acf  Merge branch 'huggingface:main' into fix-llama-mask-output-attn
3e0fada2  add tests checking sdpa and eager outputs match when output_attention…
9acc1190  Split if statements in two lines
08dbd4bb  Merge branch 'huggingface:main' into fix-llama-mask-output-attn
9b79aee3  Fix formatting
dd699233  Add fix to new jetmoe model
ad4aded6  Add missing output_attentions argument to jetmoe mask creation
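The PR also adds tests checking that the SDPA and eager paths agree when `output_attentions=True`. A hedged sketch of that kind of test (the checkpoint name and tolerances are assumptions, not the exact test added upstream):

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed tiny test checkpoint; any small Llama-family model would do.
model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
eager = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="eager").eval()
sdpa = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="sdpa").eval()

input_ids = torch.randint(0, eager.config.vocab_size, (1, 16))
with torch.no_grad():
    out_eager = eager(input_ids, output_attentions=True)
    out_sdpa = sdpa(input_ids, output_attentions=True)

# Before the fix, the sdpa path dropped the causal mask on its eager
# fallback, so these checks failed.
assert torch.allclose(out_eager.logits, out_sdpa.logits, atol=1e-5)
for a, b in zip(out_eager.attentions, out_sdpa.attentions):
    assert torch.allclose(a, b, atol=1e-5)
```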