transformers: Add llama4 #37307 (Merged)
ArthurZucker merged 254 commits into main from add-llama4
remove one of the last deps
9a75c63f
update fast image processor after refactor
e3c52a2f
styling
1854fc90
more quality of life improvements
660dc8c7
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
2defa9c7
nit
0cf2e771
update
693fc474
cleanups
8da4b6e8
some cleanups
ba7a8aad
vllm updates
db2821e6
update fake image token
6c04e10c
[convert] Fix typo
5e9d84f3
[convert] Strip extraneous bytes from shards
aa595de6
[convert] Minor fixes
507857d7
[convert] Use num_experts
d9e3f86a
multi-image fixes in modeling + processor
5bebf978
fixup size
671c37bd
128 experts
972c465e
Use default rope
1be3ddc3
Merge branch 'final-version' into fixes_cleanups
347a7620
Unfuse mlp
b06a26b7
simplify a lot inputs embeds merging
52787d5c
Merge branch 'fixes_cleanups' of github.com:huggingface/new-model-add…
9c0ef18c
remove .item() :eyes:
03e99395
fix from review
ddf7adc2
Merge pull request #5 from huggingface/fixes_cleanups
82004d95
Merge branch 'final-version' into moe-128
ca0cd0ea
Address feedback
54be1a01
Use None "default" for rope_scaling. Add eot.
b38318d1
set seed
ed00fb30
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
b5373e20
return aspect ratios and bug fixes
fb748af7
Moe 128 rebased (#8)
189a1032
un-comment write_tokenizer from converting script
24d4599a
remove un-used imports
73520342
[llama4] Pop aspect_ratios from image processor output in Llama4Proce…
ca64ae50
Merge pull request #11 from huggingface/remove-aspect-ratios
3bf26c26
Merge remote-tracking branch 'origin/final-version' into moe-128
4a1fec8d
Fix parameter_count name
4af4c778
Update src/transformers/models/llama4/configuration_llama4.py
b077bb5b
Merge pull request #4 from huggingface/moe-128
c487c62d
nit
55a17c58
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
90d58762
Add changes for no_rope, moe_layers, chunked attention. Just need to …
e53363d1
Update src/transformers/models/llama4/image_processing_llama4_fast.py
5b8dd838
Merge pull request #13 from huggingface/meta_vllm
87abef5a
nit
71385f16
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
0c3f25a5
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
ec85fa38
fix post merge with main
c358a1b4
support flex attention
0c3dc0c0
Merge branch 'final-version' into norope
1f4072b3
fixes
d728d06f
fix
31d88f17
add layer
c338736b
small updates
6529cade
rebase and delete llm_compressor
558c096d
nit
72517161
[llama4/mm] Add back <|image|> token that delimits global tile
5be1b28a
Merge pull request #16 from huggingface/global-tile
6f63da62
[llama4/mm] Fix Llama 4 image processing unit tests
f4f9fbce
add explicit dtype
2ad69a48
sdpa works
0a9da1b5
Merge pull request #17 from huggingface/tests
21eb873c
Merge pull request #15 from huggingface/fix_quantization
4047e865
comment todo small
6da9409f
fix model loading
233c7df8
Merge pull request #18 from huggingface/meta/fix-model-loading
cd4a2dae
revert
fa75c349
nits
9679739d
Merge pull request #19 from huggingface/reverting_quantization_fix
eb677fa9
small fix for TP on 1 node
b61c859c
Read new params from config
822f2961
Add <|eom|>
a417896d
lol don't know how this got here
37391a3e
adding fp8
fe240a6a
Save processor, fix chat template
ef31789f
style
afcc7ec3
Add boi/eoi tokens
ce5d1ea0
fixes for now flex seems to work :)
da1e6910
updates
7a2afb3d
nits
85cf8b92
updates
ab268fb8
missing keys
f418d062
add context parallel
2133277b
update
c29469ce
update
8b0a8c9f
fix
e472a4ee
nits
2f8d05bd
add worldsize and make eager attn work for vision
196d87ed
Merge pull request #23 from huggingface/minor_tgi_fix
ef479fa1
Ignore new key present in base models
12451706
add tp_plan
ddf89936
fix nope
b98cde83
minor fix
b25084be
Merge pull request #26 from huggingface/meta/fix-nope
0f5b27ba
Clean up Llama4 vision model
99ec54bf
Merge pull request #28 from huggingface/cleanup-mllama4
0a102524
current updates
90e8e2c8
add support for `attn_temperature_tuning`
5e87ba9c
add floor scale
9e2e0f95
add missing attn scales
5b1721bb
push what works, dirty trick for the device synch
c06da80c
oups
29f55d2b
Fix pad_token_id
cf83f0b7
fix causallml loading
06413dcd
rm
ed6cba87
Merge pull request #20 from huggingface/conversion-fixes
6d564d03
fix tied-weights
ff1df035
fix sdpa
6decf844
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
ba2e4641
Merge pull request #32 from huggingface/remove-warning
4eabf8f2
push current version
7a001691
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
a820dbe5
should work with both short and long
24dbcad6
add compressed_tensors & fix fbgemm tp
f2bbb4ba
Fix flex impl
aeaad13a
style
96066e09
chunking
eb535ee0
Merge branch 'final-version' into norope
60a58cb7
try to revert the potentially breaking change
e19af4b3
fix auto factory
eb167f28
fix shapes in general
7f8941d2
rm processing
30cacf70
Merge pull request #30 from huggingface/fix-causal-lm-loading
99f2297e
commit cache utils cleanup
7990c78f
Fix context length
c7d4c883
fix
efb45772
Merge branch 'final-version' into add_fbgemm
9f9974b1
allocate
174eda3c
update tp_plan
bdfb5731
Merge pull request #21 from huggingface/add_fbgemm
aa8daba2
fix SDPA!
05cc59e1
Add support for sparse `Llama4TextMoe` layer from the kernel hub
dcb29eb8
cleanup
61626d0d
better merge
373a472e
Merge branch 'norope' of github.com:huggingface/new-model-addition-me…
d7d09a17
update
64c2133d
still broken fixing now
85b3c7ac
nits
bfc8049c
revert print
5da08327
Write max_position_embeddings and max_model_length
bc44b2be
Update modeling_llama4.py
1a762675
Save attention_chunk_size
fd0f2733
Sync eos terminators
f03660ad
Read initializer_range
3612b9cc
style
f7818858
remove `dict`
206c8aea
fix
51f7cd24
eager should use `chunked_attention_mask`
cb58ceac
revert
74142354
fixup
04b302a7
Merge pull request #14 from huggingface/norope
a515579a
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
a9045fc9
Merge pull request #36 from huggingface/sparse-llama4-moe
ccda19f0
Merge branch 'final-version' into fix-context-length
598dded8
Merge pull request #35 from huggingface/fix-context-length
ec7656a4
fix config
fcee23da
Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
6ca6f66c
Fix typo and remove warning with compiled flex and chunked prefill
535030a0
Fix MoE vs FF (#41)
a43e0561
fix
f5dd6fb7
Use correct no_rope_layers if provided one is empty list
7c03c7e0
Merge pull request #46 from huggingface/keep-nrope-layers-fix
6a8b9f62
update tests
7bda11f2
fix
e547b10b
skipping some tests
0130b2df
fix fp8 loading
93022de7
fix text generation pipeline
45cf5828
eager needs 4D mask
a3e8267d
fix
6ab06825
Merge pull request #50 from huggingface/fix-eager
fd150bb7
Some cleanup
ef8dbe2b
fix
c38bf3a8
update
141da657
fix
66c36a47
replace correctly module
9b2e35df
patch
ce91d95e
modulelist
2374ff71
update
61f45af6
update
a471b104
clean up
4c4bc81c
Don't move to `cuda:0` in distributed mode
f642d32c
restrict to compressed tensors for now
3d58f8e1
rm print
8dbf7cb9
Docs!
48b4f563
Fixes
46b08156
Update docs/source/en/model_doc/llama4.md
0849d322
Fixes
f7756b4e
cuda graph fix
27364daf
Merge pull request #38 from huggingface/smol-fix
b239675a
Merge pull request #49 from huggingface/fix-quantization
aeec2dce
Merge pull request #53 from huggingface/l4-docs
8578252f
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
eb9e4afb
revert some stuff
ad839d3c
fixup
9f03f059
styling
83282a19
Merge pull request #44 from huggingface/fix_style
fb495fd9
Merge pull request #54 from huggingface/fix-tp-pipeline
29028393
Update src/transformers/models/llama4/modeling_llama4.py
3eab4436
Merge branch 'final-version' into code-quality
688dc5cf
fixup
695c1e7f
Merge branch 'code-quality' of github.com:huggingface/new-model-addit…
54785ef2
commit licence, cleanup here and there and style
26b56748
more styling changes
c53e2595
Merge pull request #51 from huggingface/code-quality
f87c2378
Merge pull request #55 from huggingface/tgi_cuda_graph_fix
7d5d5f0d
fix dummies
1895d02c
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
931dad92
fix and clean docstrings
ed669a34
remove comment
7f292e1f
Merge branch 'main' of github.com:huggingface/new-model-addition-meta…
b97451ea
remove warning
34f6e9ef
Only fast image processor is supported
bac11b51
nit
d73aea8c
trigger CI
ab8bbadc
fix issue with flex encoder
6c6e9014
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
4994729f
Merge pull request #58 from huggingface/only-fast-image-processor
5b96e5d2
fix dynamic cache
5ce5746b
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
555c4eeb
Code quality
6ba8ef7f
Code quality
ecaa1a7b
fix more tests for now
0c8624b2
Code quality
8167ac4c
Code quality
71521afb
Nuke bunch of failing stuff
949b1b7e
Merge branch 'final-version' of github.com:huggingface/new-model-addi…
b8786474
Code quality
cbb6e599
Code quality
8c509348
cleanup removal of slow image processor
44a90c0f
ruff fix fast image processor
99b6bc8f
fix
7c471ea7
fix styling
538ba2b0
git push Merge branch 'final-version' of github.com:huggingface/new-m…
50a8daab
github-actions marked this pull request as draft (1 year ago)
ArthurZucker marked this pull request as ready for review (1 year ago)
LysandreJik approved these changes on 2025-04-05
ArthurZucker added labels: New model, Tensor Parallel
Docs
07eaf8cc
Repo consistency
8b39d94f
Repo consistency
3736b900
fix sliding window issue
92746533
git push Merge branch 'add-llama4' of github.com:huggingface/transfor…
22a33e3f
separate llama cache
748d6221
styling
6a777c0b
Repo consistency
457f3c6a
Repo consistency
1226014c
push what works
ac54e8ff
Merge branch 'add-llama4' of github.com:huggingface/transformers into…
69e94706
L4 Repo consistency
8f08b701
Docs
e9769f02
fix last last alst alst alst alstsaltlsltlaslt
2ec5fbe4
Merge branch 'add-llama4' of github.com:huggingface/transformers into…
9bfae248
ArthurZucker merged 25b7f272 into main (1 year ago)
ArthurZucker deleted the add-llama4 branch (1 year ago)
YenFuLin commented on 2025-05-14
quantLm14 commented on 2025-06-02