transformers: Add llama4 #37307 (Merged)
ArthurZucker merged 254 commits into main from add-llama4
remove one of the last deps
9a75c63f
update fast image processor after refactor
e3c52a2f
styling
1854fc90
more quality of life improvements
660dc8c7
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
2defa9c7
nit
0cf2e771
update
693fc474
cleanups
8da4b6e8
some cleanups
ba7a8aad
vllm updates
db2821e6
update fake image token
6c04e10c
[convert] Fix typo
5e9d84f3
[convert] Strip extraneous bytes from shards
aa595de6
[convert] Minor fixes
507857d7
[convert] Use num_experts
d9e3f86a
multi-image fixes in modeling + processor
5bebf978
fixup size
671c37bd
128 experts
972c465e
Use default rope
1be3ddc3
Merge branch 'final-version' into fixes_cleanups
347a7620
Unfuse mlp
b06a26b7
simplify a lot inputs embeds merging
52787d5c
Merge branch 'fixes_cleanups' of github.com:huggingface/new-model-add…
9c0ef18c
remove .item() :eyes:
03e99395
fix from review
ddf7adc2
Merge pull request #5 from huggingface/fixes_cleanups
82004d95
Merge branch 'final-version' into moe-128
ca0cd0ea
Address feedback
54be1a01
Use None "default" for rope_scaling. Add eot.
b38318d1
set seed
ed00fb30
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
b5373e20
return aspect ratios and bug fixes
fb748af7
Moe 128 rebased (#8)
189a1032
un-comment write_tokenizer from converting script
24d4599a
remove un-used imports
73520342
[llama4] Pop aspect_ratios from image processor output in Llama4Proce…
ca64ae50
Merge pull request #11 from huggingface/remove-aspect-ratios
3bf26c26
Merge remote-tracking branch 'origin/final-version' into moe-128
4a1fec8d
Fix parameter_count name
4af4c778
Update src/transformers/models/llama4/configuration_llama4.py
b077bb5b
Merge pull request #4 from huggingface/moe-128
c487c62d
nit
55a17c58
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
90d58762
Add changes for no_rope, moe_layers, chunked attention. Just need to …
e53363d1
Update src/transformers/models/llama4/image_processing_llama4_fast.py
5b8dd838
Merge pull request #13 from huggingface/meta_vllm
87abef5a
nit
71385f16
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
0c3f25a5
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ec85fa38
fix post merge with main
c358a1b4
support flex attention
0c3dc0c0
Merge branch 'final-version' into norope
1f4072b3
fixes
d728d06f
fix
31d88f17
add layer
c338736b
small updates
6529cade
rebase and delete llm_compressor
558c096d
nit
72517161
[llama4/mm] Add back <|image|> token that delimits global tile
5be1b28a
Merge pull request #16 from huggingface/global-tile
6f63da62
[llama4/mm] Fix Llama 4 image processing unit tests
f4f9fbce
add explicit dtype
2ad69a48
sdpa works
0a9da1b5
Merge pull request #17 from huggingface/tests
21eb873c
Merge pull request #15 from huggingface/fix_quantization
4047e865
comment todo small
6da9409f
fix model loading
233c7df8
Merge pull request #18 from huggingface/meta/fix-model-loading
cd4a2dae
revert
fa75c349
nits
9679739d
Merge pull request #19 from huggingface/reverting_quantization_fix
eb677fa9
small fix for TP on 1 node
b61c859c
Read new params from config
822f2961
Add <|eom|>
a417896d
lol don't know how this got here
37391a3e
adding fp8
fe240a6a
Save processor, fix chat template
ef31789f
style
afcc7ec3
Add boi/eoi tokens
ce5d1ea0
fixes for now flex seems to work :)
da1e6910
updates
7a2afb3d
nits
85cf8b92
updates
ab268fb8
missing keys
f418d062
add context parallel
2133277b
update
c29469ce
update
8b0a8c9f
fix
e472a4ee
nits
2f8d05bd
add worldsize and make eager attn work for vision
196d87ed
Merge pull request #23 from huggingface/minor_tgi_fix
ef479fa1
Ignore new key present in base models
12451706
add tp_plan
ddf89936
fix nope
b98cde83
minor fix
b25084be
Merge pull request #26 from huggingface/meta/fix-nope
0f5b27ba
Clean up Llama4 vision model
99ec54bf
Merge pull request #28 from huggingface/cleanup-mllama4
0a102524
current updates
90e8e2c8
add support for `attn_temperature_tuning`
5e87ba9c
add floor scale
9e2e0f95
add missing attn scales
5b1721bb
push what works, dirty trick for the device synch
c06da80c
oups
29f55d2b
Fix pad_token_id
cf83f0b7
fix causallml loading
06413dcd
rm
ed6cba87
Merge pull request #20 from huggingface/conversion-fixes
6d564d03
fix tied-weights
ff1df035
fix sdpa
6decf844
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
ba2e4641
Merge pull request #32 from huggingface/remove-warning
4eabf8f2
push current version
7a001691
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
a820dbe5
should work with both short and long
24dbcad6
add compressed_tensors & fix fbgemm tp
f2bbb4ba
Fix flex impl
aeaad13a
style
96066e09
chunking
eb535ee0
Merge branch 'final-version' into norope
60a58cb7
try to revert the potentially breaking change
e19af4b3
fix auto factory
eb167f28
fix shapes in general
7f8941d2
rm processing
30cacf70
Merge pull request #30 from huggingface/fix-causal-lm-loading
99f2297e
commit cache utils cleanup
7990c78f
Fix context length
c7d4c883
fix
efb45772
Merge branch 'final-version' into add_fbgemm
9f9974b1
allocate
174eda3c
update tp_plan
bdfb5731
Merge pull request #21 from huggingface/add_fbgemm
aa8daba2
fix SDPA!
05cc59e1
Add support for sparse `Llama4TextMoe` layer from the kernel hub
dcb29eb8
cleanup
61626d0d
better merge
373a472e
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
d7d09a17
update
64c2133d
still broken fixing now
85b3c7ac
nits
bfc8049c
revert print
5da08327
Write max_position_embeddings and max_model_length
bc44b2be
Update modeling_llama4.py
1a762675
Save attention_chunk_size
fd0f2733
Sync eos terminators
f03660ad
Read initializer_range
3612b9cc
style
f7818858
remove `dict`
206c8aea
fix
51f7cd24
eager should use `chunked_attention_mask`
cb58ceac
revert
74142354
fixup
04b302a7
Merge pull request #14 from huggingface/norope
a515579a
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
a9045fc9
Merge pull request #36 from huggingface/sparse-llama4-moe
ccda19f0
Merge branch 'final-version' into fix-context-length
598dded8
Merge pull request #35 from huggingface/fix-context-length
ec7656a4
fix config
fcee23da
Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
6ca6f66c
Fix typo and remove warning with compiled flex and chunked prefill
535030a0
Fix MoE vs FF (#41)
a43e0561
fix
f5dd6fb7
Use correct no_rope_layers if provided one is empty list
7c03c7e0
Merge pull request #46 from huggingface/keep-nrope-layers-fix
6a8b9f62
update tests
7bda11f2
fix
e547b10b
skipping some tests
0130b2df
fix fp8 loading
93022de7
fix text generation pipeline
45cf5828
eager needs 4D mask
a3e8267d
fix
6ab06825
Merge pull request #50 from huggingface/fix-eager
fd150bb7
Some cleanup
ef8dbe2b
fix
c38bf3a8
update
141da657
fix
66c36a47
replace correctly module
9b2e35df
patch
ce91d95e
modulelist
2374ff71
update
61f45af6
update
a471b104
clean up
4c4bc81c
Don't move to `cuda:0` in distributed mode
f642d32c
restrict to compressed tensors for now
3d58f8e1
rm print
8dbf7cb9
Docs!
48b4f563
Fixes
46b08156
Update docs/source/en/model_doc/llama4.md
0849d322
Fixes
f7756b4e
cuda graph fix
27364daf
Merge pull request #38 from huggingface/smol-fix
b239675a
Merge pull request #49 from huggingface/fix-quantization
aeec2dce
Merge pull request #53 from huggingface/l4-docs
8578252f
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
eb9e4afb
revert some stuff
ad839d3c
fixup
9f03f059
styling
83282a19
Merge pull request #44 from huggingface/fix_style
fb495fd9
Merge pull request #54 from huggingface/fix-tp-pipeline
29028393
Update src/transformers/models/llama4/modeling_llama4.py
3eab4436
Merge branch 'final-version' into code-quality
688dc5cf
fixup
695c1e7f
Merge branch 'code-quality' of github.com:huggingface/new-model-addit…
54785ef2
commit licence, cleanup here and there and style
26b56748
more styling changes
c53e2595
Merge pull request #51 from huggingface/code-quality
f87c2378
Merge pull request #55 from huggingface/tgi_cuda_graph_fix
7d5d5f0d
fix dummies
1895d02c
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
931dad92
fix and clean docstrings
ed669a34
remove comment
7f292e1f
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
b97451ea
remove warning
34f6e9ef
Only fast image processor is supported
bac11b51
nit
d73aea8c
trigger CI
ab8bbadc
fix issue with flex encoder
6c6e9014
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
4994729f
Merge pull request #58 from huggingface/only-fast-image-processor
5b96e5d2
fix dynamic cache
5ce5746b
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
555c4eeb
Code quality
6ba8ef7f
Code quality
ecaa1a7b
fix more tests for now
0c8624b2
Code quality
8167ac4c
Code quality
71521afb
Nuke bunch of failing stuff
949b1b7e
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
b8786474
Code quality
cbb6e599
Code quality
8c509348
cleanup removal of slow image processor
44a90c0f
ruff fix fast image processor
99b6bc8f
fix
7c471ea7
fix styling
538ba2b0
git push Merge branch 'final-version' of github.com:huggingface/new-m…
50a8daab
github-actions marked this pull request as draft (1 year ago)
ArthurZucker marked this pull request as ready for review (1 year ago)
LysandreJik approved these changes on 2025-04-05
ArthurZucker added labels: New model, Tensor Parallel
Docs
07eaf8cc
Repo consistency
8b39d94f
Repo consistency
3736b900
fix sliding window issue
92746533
git push Merge branch 'add-llama4' of github.com:huggingface/transfor…
22a33e3f
separate llama cache
748d6221
styling
6a777c0b
Repo consistency
457f3c6a
Repo consistency
1226014c
push what works
ac54e8ff
Merge branch 'add-llama4' of github.com:huggingface/transformers into…
69e94706
L4 Repo consistency
8f08b701
Docs
e9769f02
fix last last alst alst alst alstsaltlsltlaslt
2ec5fbe4
Merge branch 'add-llama4' of github.com:huggingface/transformers into…
9bfae248
ArthurZucker merged 25b7f272 into main (1 year ago)
ArthurZucker deleted the add-llama4 branch (1 year ago)
YenFuLin commented on 2025-05-14
quantLm14 commented on 2025-06-02