Add EoMT DINOv3 model #58
NielsRogge force-pushed from 1f098ffa to 9e24af70 82 days ago
Fix Qwen3OmniMoE weight init (#42531)
dac2ad76
small fix tokenizer regex patch (#42528)
83fe012d
[TP plans] Fix some incorrect TP plans (#42448)
6e3f2f8f
[Ministral 3] Add ministral 3 (#42498)
bf3f0ae7
Fix ernie moe (#42535)
0fa49db1
Add FastVLM (#41112)
a6497675
Fix Qwen-VL family with prompt tuning (#42508)
57eeb9ca
Fix failing test in Glm4vMoeIntegrationTest (#42488)
ac0769cd
[Quantization] fix dequant when block size is none & static quantizat…
bb09a30f
[Ministral 3] Small fix config (#42537)
64d8cf4f
[Fix] dots1 expert bias routing (#41663)
29e8522b
Fix fp8 + some enhancement (#42455)
bc7a268f
[test] delete `SeamlessM4TProcessorTest::test_save_load_pretrained_ad…
4ec83fe9
Fix eetq quanto quant methods (#42557)
5efd0d4a
Add backward compatibility for methods which have been moved to `Rota…
0ba8f001
Fix `parse_response` after tokenizer refactor (#42300)
5690f24e
fix regression (#42569)
53d2bf6d
Kernel mapping error resolve (#42466)
3f174109
Transformers serve fix (#42570)
52b988d8
[SAM3] Compute masks once instead of per-layer, fix fa2 crash (#42543)
80b408d1
Allow fallback to loading from Auto"SubProcessor".from_pretrained whe…
675e8763
[`CI`] Fix copies (#42571)
a5c061d2
[Quantization] per tensor quantization kernel (#42560)
51c5a7a6
make sure the FSDP plugin args are appropriately cast to bools (#42566)
377a8ee7
[Quantization] fix fbgemm (#42561)
15b79ea8
Use `getattr` in `standardize_rope_params` because `rope_parameters` …
232ecf2c
[XPU] Fix fp8 UT patch func (#42584)
75c135d7
Fix loaded data order bug when resuming from epoch >= 1 (#40691)
629c0da4
fix the FSDP1 default for reshard_after_forward (#42578)
cba7ae86
fix: correct typos in code comments (#42577)
8ba286ba
fix : cast into floats AFTER all assignments (#42587)
130dc471
Fix mixed torch.Tensor and DTensor in generate when use FSDP2 + LoRA …
17c7c496
Fix the FA2 logic in the longcat_flash model (#42549)
c0328af6
[Quantization] Remove dequant fp8 config (#42596)
7266f50b
Add torch compile to CB (#42516)
ef780bf1
Allow `validation_fn` to be `None` in `validate_rope` (#42601)
b8d5018e
Add SDPA support for PatchTST model (#42465)
dd6cfdd3
Align tie weights in Idefics (#42551)
81aabe72
repo. consistency bot (#42575)
d5d87934
Fix Ernie Moe Test (#42595)
9e82c779
Fix some models cache initialization (#42586)
a48d68c6
extend FA2 and other cases to XPU (#42536)
2e93004a
Update revision so that there is a safetensors model (#42618)
3cdccba0
Every model forward() should have **kwargs (#42603)
9b74e4c4
fix(Qwen3VLCausalLMOutputWithPast): missing `hidden_states` and `atte…
0c3d043e
[core] Fix quark (#42457)
bebfab06
Fix small weight loading example (#42622)
552409e5
Fix _is_package_available to handle underscore/hyphen equivalence (#4…
bf8b9e7b
Fix typo in docstring in modeling_sam3_tracker.py (#42438)
a3e2d547
[V5] Return a BatchEncoding dict from apply_chat_template by default …
ce53cc00
more tts pipeline example (#42484)
f6dcac65
Adapt some test case on npu (#42335)
b0831697
feat(trainer): Just-in-time (JIT) asynchronous checkpointing using SI…
fda2d735
mark params as _is_hf_initialized with DS Zero3 from weight conversio…
e920f94b
[loading] Allow loading to happen without threading (#42619)
280c5d6d
Remove splitted_tests.txt file (#42625)
4c9fde2a
Fix interactions between require_read_token and staticmethod (#42522)
3a8d291a
Fix FSDP bnb error (#42600)
91865a69
Move max_new_tokens recommendation into GenerationConfig docstring (#…
f8e69286
Tiny Clean up `_deps` in setup.py (#42607)
2c298389
[torchao] safetensors (#42529)
328396d9
Fixed convert_batch_to_list_format staticmethod function call (#42476)
390dca67
regression from tokenizers v5 to fix fast reference for pipeline (#42…
35f32e94
Better security for `pr-repo-consistency-bot.yml` (#42646)
ee7e67bf
test ci training for text model only (#42597)
afa43c73
Ultra security for `pr-repo-consistency-bot.yml` (#42652)
75ae02f0
Fix a typo in GGML integration of Qwen2 MoE (#42650)
366de9a6
Offloading needs to add the prefix into the offload_index (#42624)
20890e3b
Fix saving multiple tokenizers for custom processors (#42630)
e5aad213
Compress (#42643)
28906c3c
[kernels] fix typing for Kernel mapping (#42623)
626875b6
small cleaning of quantization class (#42633)
01267073
Fixing typo in documentation (philosophy) (#42647)
4ad279fb
[docs] TP blog post (#42637)
fccb0499
[loading] Correctly load params during offloading & careful memory co…
1d86d00e
[docs] Attention backends + continuous batching (#42329)
e3673ed4
Lasr model (#42648)
ff13eb66
Improve SSH into runner (#42695)
8d75aabf
update and add Expectations for mistral3/internvl tests (#42616)
81b84175
[Quantization] Fix FP8 experts replacing (#42654)
ca1698ef
Use hfh's is_offline_mode helper (#42657)
ba1ad535
Let transformers know when a model is being traced via jax.jit (torch…
5ee9ffe3
[`mRope`] Fix warnings (#42660)
e3ceeafd
CircleCI failed test summary (#42240)
e636ea2b
Remove Neptune integration references and deprecate `NeptuneCallback`…
8eef4bbf
FIX Error when trying to load non-LoRA PEFT (#42663)
d3ee06b8
Fixed paged|FA2 kernel loading logic and UT. (#42547)
75beab1c
Fix PEFT integration with new weight loader (#42701)
142ae3d9
Remove duplicated imports (#42689)
9e888145
update gemma3 exepectations and switch to dynamic cache (#42688)
ad541045
Fixed failing `BioGPT` batch generation test (#42677)
e8e142de
Fix failing `ColPaliModelIntegrationTest` (#42705)
0e0af808
Fixed failing Bart-Model Integration Tests (#42676)
b3565823
Fixed failing batch_generation test for `opt` model (#42693)
0e51e7a2
hotfix for circleci summary workflow (#42708)
2db992d8
fix tp (#42704)
6564633b
Raise error when missing or incorrect dates (#42610)
8fe97d90
Fix indentation in EoMT fast image processor (#42707)
745ad8c7
Delete previous comments of `View the CircleCI Test Summary` (#42725)
799103bf
Fix tests quantization (#42703)
5ac6284b
[kernels] make the module declaration implicit with decorator (#42700)
c1ac1825
Fix failing `owlv2` image processor integration test (#42714)
ec37fc88
Update replace_with_ for quants methods to not use recursion (#42711)
85ced0f9
Fix failing `CodeGenModelTests` (#42730)
9a6df2ce
Refactor-tokenization-more (#42563)
73a13f86
🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization…
3f3cae74
Add an alternative scenario to EoMT `post_process_semantic_segmentati…
5b4d72c5
fix links in `CONTRIBUTING.md` (#42745)
471d7ce9
Only default `rope_parameters` to empty `dict` if there is something …
3230fb50
Only call `torch.autocast` if it will have an effect (#42747)
6d0adb5b
[Quantization] Fixing some tests (#42763)
8f978e5b
Ensure e_score_correction_bias dtype of DeepSeek-V3/R1 is FP32 (#42580)
2e29a9a6
[kernels] Fix kernel CI (#42764)
508a9764
Stricter checks for mistral patch (#42743)
b9951b4e
Command-a-vision fix (#42642)
1b8ccf1c
fix: support tensor labels in DataCollatorWithFlattening (#42620)
f54647c8
Override Transformers defaults by GGUF defaults (#42770)
51a66739
[Quantization] Fix Static FP8 Quantization (#42775)
15735a43
[core] fix fp-quant (#42613)
c3acdd57
Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel (#41567)
8ebfd84f
allow registration of custom checkpoint conversion mappings (#42634)
45d8168e
feat(granitemoe*): Remove logits upcast when computing loss (#42753)
0af2381f
🚨🚨🚨🚨🚨🚨🚨🚨🚨 default to `"auto"` dtype (#34919)
f5aa90d0
[`Padding-Free Attention`] Fix packed FA attention with pos ids only …
d1eda63f
Revert "🚨🚨🚨🚨🚨🚨🚨🚨🚨 default to `"auto"` dtype (#34919)"
a8f32a0e
[Quantization] FBgemm FP8 for XPU (#42773)
86644be4
Better continuous batching tests (#42699)
f8e5ae6a
fix awq (#42776)
f8e8ddb0
[CI] fix wav2vec test (#42810)
78b29929
[Model] Add PaddleOCR-VL Model Support (#42178)
8c84144b
Vision docs 📝 (#42096)
eaa3d4dd
[kernels] Final kernel removal 🥳 (#41664)
de055d6d
Fix integration test in Owlv2 image processing tests (#42783)
dfe6e4c0
[kernels] adding RMSNorm kernel for mps devices (#42058)
0c18820c
update deprecation msg for `warmup_ratio` (#42813)
6a93635e
Raise conversion errors after loading (#42807)
464dfa04
Automatic release script (#42808)
8a2a83d5
Default auto (#42805)
6217adc6
[docs] Chat content patterns (#42748)
6d00f6b0
[XPU] Fix UT errors in the sam3 and lfm series model. (#42798)
f80b0485
Add requires_backends to the main init (#42799)
3fbd59b6
Fix deepspeed sp loss due to missing labels (#42812)
780cc659
Compatible with GPTQModel FORMAT.LLM_AWQ (#42833)
b19844ee
Fix xpu output check for Ministral3 tests (#42761)
c24b51dd
Fixes for the failures of AMD CI (#42718)
aa495f62
Fix a typo in MoE models (#42835)
37426b27
Simplify dtype instantiation (#42825)
64a7cc82
Add inputs_to_logits_ratio to LasrCTCConfig (#42720)
65dc2615
[saving] Simplify general logic (#42766)
e6b9d061
[`T5Gemma2`] Fix bidirectional mask for encoder (#42820)
33c948e4
Do not rely on config for inferring model dtype (#42838)
5b710c75
Improve BatchFeature: stack list and lists of torch tensors (#42750)
a61aba59
Reapply modular examples (#42846)
c2470630
Fix Gemma (#42847)
40dc11cd
[Fix] Fix FA2 kernels ut (#42803)
e17b1b85
Fix speecht5_tts pipeline (#42830)
66623a1f
Fixes 2 failing tests from AMD CI (#42777)
f0d9cd1f
[docs] Improve contribution guidelines for Quantization (#42870)
64c12fdf
Remove tied weights from internal attribute if they are not tied (#42…
a187b857
typo (#42863)
298d08dc
[CB] Easy optimizations for continuous batching (#42839)
f3d5f255
Enforce call to `post_init` and fix all of them (#42873)
c7aec088
Remove null values from fast image processors dict (#42780)
fc50bdc6
fix: Initialize ApertusMLP's xielu activation using `torch_dtype` (#4…
06378d40
Simplify using custom resolution for sam3 and sam3_video inference (#…
23394cc4
[docs] optimizations quickstart (#42538)
31de95ef
Add `.on_push_begin()` callback to Trainer and implement for `Trackio…
7f52a2a4
Fix BLT training_ci overfit test (#42685)
0f97c688
Add missing ModelOutput subclass return type hints (#41219)
6c7c992f
[Devstral] Make sure FP8 conversion works correctly (#42715)
7960b5ea
[modular] Fix a weird renaming edge-case (#42844)
8d526c23
Stop collecting all model parameters to save models when using DeepSp…
89998bdd
Fix convert_tekken_tokenizer (#42592)
252afd89
[`Ernie 4.5 Moe`] Fix routing, weights, and update expectations (#42653)
4e7cecb2
Fix GraniteMoeHybrid in transformers v5 (#42872)
5d2f82b5
Added kernels from kernel hub for Bamba model (#41540)
0f896619
fix FastSpeech2ConformerTokenizer crash in tokenize (#42888)
24b311ee
Simplify tie weights logic (#42895)
4d6516e2
Add local kernel loading support to KernelConfig(). (#42800)
24275124
Remove duplicated processor class from config (#42806)
b61da251
fix: typehints for Causal LM models (#42885)
4c64a8fb
refactor more tokenizers - v5 guide update (#42768)
6994c5ac
fix `Dtensor` and `tensor` mismatch (#42906)
b1a2fba1
Sam: Perception Encoder Audiovisual (#42905)
9aef5ca4
Fix add_dates script: Fetch github repo from url to check if model is…
703da867
Support having multiple sub-processors (of any kind) in the same proc…
dd24a806
Rewrite for loop in get_image_features with torch ops for export (#42…
a33ef4f9
adds jais2 model support (#42684)
0dbf8085
Overwrite `get_decoder()` in AudioLLMs (#42896)
1dc69bd6
Preprocessing fixes and more tests for LFM2-VL (#42784)
558666f2
[`Tokenizers`] Change treatment of special tokens (#42903)
ade62c2a
[`Auto`] Make processor subclasses overridable on load time (#42912)
79432f7a
Qwen2/3 MoE + GGUF model support (restored) (#42854)
c67ec2c4
Fix: Pass local_files_only from pipeline() to model loading (#42318)
0218f1a5
Fix cuda index (#42897)
05c7e4a4
Add Pixio pre-trained models (#42795)
a05e0e27
Remove tied weight keys Sam2Video (#42840)
171e079e
fix Dtensor and tensor mismatch for Col/RowRep (#42924)
99be81e7
[kernels] Add user_agent to track kernels metrics (#41689)
0001b3ee
Fix dtype quantizer (#42882)
b05d2c43
Make gradient-checkpoint enabling tolerant of models without get_inpu…
b712a97d
Remove ipex/ccl in CPU training doc (#42866)
f404f150
docs: Squared ReLU paper fix (#42931)
2f9e21f5
[docs] WeightConverter (#42636)
1aab1e9c
[docs] Expert parallelism (#42409)
9e3568e0
[docs] Update shard size (#42749)
12fe95f8
[docs] optimization cleanup (#42827)
5ef16edd
Improve BatchFeature (.to() works on lists/nested lists of tensors, a…
9f583b1b
Document new default shard size + dropped unsafe serialization (#42904)
bdaddb6f
🚨 Generation config defaults are now `None` (#42702)
a81e04a9
[Quantization] rm _pre_quantization_dtype from quantization tests (#4…
d3d4b629
[Quantization] Misc tests fixes (#42940)
d7dd443a
[CB] Allow block sharing in hybrid models (#42877)
04e78e67
[Tests] Fix CompressedTensors tests (#42935)
0a846542
Update `param_element_size` (#42818)
dd8057af
rewrite _process_parameter_type in auto_docstring.py to improve usabi…
728f34c3
Add buffers to `_init_weights` for ALL models (#42309)
537c2e3d
Fp8 dq (#42926)
af91c0ba
[Quantization] Removing misleading int8 quantization in Finegrained F…
4dc60fd3
fix(tvp): add missing type_vocab_size parameter to TvpConfig (#42928)
b62e5b3e
🚨 Fix ConvNeXt image processor default interpolation to BICUBIC (#42934)
60634caa
Load generation config from nested configs (#42922)
f2c6d2ad
[docs] dtype (#42883)
b93f2e3a
Updated `backbone_config` docstrings and type annotations (#42927)
3e4baf8e
[Quantization] CI green by end of year (#42951)
b5eea347
[kernels] Fix failing tests (#42953)
cf0f071e
fix concat order (#42946)
a4d62291
fix error: 'BlockMask' object has no attribute 'dtype' for lasr model…
789226c1
[loading] Really initialize on meta device for huge perf gains (#42941)
bb9357ff
Add runner specification to CodeQL workflow (#42955)
d54d78f7
Fix infinity in JSON serialized files (#42959)
d14d99ea
[`Generation`] Fix default overwrite for non-`None` defaults (#42958)
f218ed21
[`Ernie 4.5`] Ernie VL models (#39585)
a8a22624
Fix tests trainer again (#42933)
70179949
Single config attribute for weight tying (#42815)
9a90500b
fix device mismatch issue for pe_audio_video model parallelism (#42917)
1b280f99
[`Tests`] Fix inputs placement (#42963)
007274db
Do not use global variable, and improve context manager coverage (#42…
0d2dbaa9
Hardcode the factor in caching allocator (#42996)
f5d9d808
Fix formatting of trackio model tag (#42973)
817886a6
Fix merge
eb982843
Fix DocQA max_answer_len validation error message (#42948)
64ec2ba1
Fix incorrect library name in BitNet integration warning (#42966)
f7d139c1
Improve spacing of markdown files (#42984)
e9f0f8e0
[loading][TP] Fix device placement at loading-time, and simplify shar…
5f1c05cf
Fix deepspeed + quantization (#43006)
42512f79
Fix merge, make fixup
119f2fd9
Do not return a tuple in mistral tokenizer Automapping (#42997)
9971e410
Merge remote-tracking branch 'upstream/main' into codex/integrate-eom…
d997f7e3
Fix tests
667d1304