Add EoMT DINOv3 model #58

NielsRogge added codex
NielsRogge force-pushed from 1f098ffa to 9e24af70 82 days ago
Rocketknight1 Fix Qwen3OmniMoE weight init (#42531)
dac2ad76
ArthurZucker small fix tokenizer regex patch (#42528)
83fe012d
Cyrilvallez [TP plans] Fix some incorrects TP plans (#42448)
6e3f2f8f
patrickvonplaten [Ministral 3] Add ministral 3 (#42498)
bf3f0ae7
vasqu Fix ernie moe (#42535)
0fa49db1
camilla-deckard Add FastVLM (#41112)
a6497675
zucchini-nlp Fix Qwen-VL family with prompt tuning (#42508)
57eeb9ca
Sai-Suraj-27 Fix failing test in Glm4vMoeIntegrationTest (#42488)
ac0769cd
MekkCyber [Quantization] fix dequant when block size is none & static quantizat…
bb09a30f
patrickvonplaten [Ministral 3] Small fix config (#42537)
64d8cf4f
fjosw [Fix] dots1 expert bias routing (#41663)
29e8522b
SunMarc Fix fp8 + some enhancement (#42455)
bc7a268f
ydshieh [test] delete `SeamlessM4TProcessorTest::test_save_load_pretrained_ad…
4ec83fe9
SunMarc Fix eetq quanto quant methods (#42557)
5efd0d4a
hmellor Add backward compatibility for methods which have been moved to `Rota…
0ba8f001
Rocketknight1 Fix `parse_response` after tokenizer refactor (#42300)
5690f24e
SunMarc fix regression (#42569)
53d2bf6d
Aaraviitkgp Kernel mapping error resolve (#42466)
3f174109
LysandreJik Transformers serve fix (#42570)
52b988d8
yonigozlan [SAM3] Compute masks once instead of per-layer, fix fa2 crash (#42543)
80b408d1
yonigozlan Allow fallback to loading from Auto"SubProcessor".from_pretrained whe…
675e8763
vasqu [`CI`] Fix copies (#42571)
a5c061d2
MekkCyber [Quantization] per tensor quantization kernel (#42560)
51c5a7a6
winglian make sure the FSDP plugin args are appropriately cast to bools (#42566)
377a8ee7
MekkCyber [Quantization] fix fbgemm (#42561)
15b79ea8
hmellor Use `getattr` in `standardize_rope_params` because `rope_parameters` …
232ecf2c
YangKai0616 [XPU] Fix fp8 UT patch func (#42584)
75c135d7
ngazagna-qc Fix loaded data order bug when resuming from epoch >= 1 (#40691)
629c0da4
winglian fix the FSDP1 default for reshard_after_forward (#42578)
cba7ae86
arrdel fix: correct typos in code comments (#42577)
8ba286ba
marconaguib fix : cast into floats AFTER all assignments (#42587)
130dc471
Xiao-Chenguang Fix mixed torch.Tensor and DTensor in generate when use FSDP2 + LoRA …
17c7c496
YangKai0616 Fix the FA2 logic in the longcat_flash model (#42549)
c0328af6
MekkCyber [Quantization] Remove dequant fp8 config (#42596)
7266f50b
remi-or Add torch compile to CB (#42516)
ef780bf1
hmellor Allow `validation_fn` to be `None` in `validate_rope` (#42601)
b8d5018e
Furkan-rgb Add SDPA support for PatchTST model (#42465)
dd6cfdd3
Cyrilvallez Align tie weights in Idefics (#42551)
81aabe72
ydshieh repo. consistency bot (#42575)
d5d87934
vasqu Fix Ernie Moe Test (#42595)
9e82c779
albertvillanova Fix some models cache initialization (#42586)
a48d68c6
yao-matrix extend FA2 and other cases to XPU, (#42536)
2e93004a
LysandreJik Update revision so that there is a safetensors model (#42618)
3cdccba0
Rocketknight1 Every model forward() should have **kwargs (#42603)
9b74e4c4
casinca fix(Qwen3VLCausalLMOutputWithPast): missing `hidden_states` and `atte…
0c3d043e
MekkCyber [core] Fix quark (#42457)
bebfab06
MekkCyber Fix small weight loading example (#42622)
552409e5
mertunsall Fix _is_package_available to handle underscore/hyphen equivalence (#4…
bf8b9e7b
anranlee99 Fix typo in docstring in modeling_sam3_tracker.py (#42438)
a3e2d547
Rocketknight1 [V5] Return a BatchEncoding dict from apply_chat_template by default …
ce53cc00
Deep-unlearning more tts pipeline exampel (#42484)
f6dcac65
UserChen666 Adapt some test case on npu (#42335)
b0831697
efazal feat(trainer): Just-in-time (JIT) asynchronous checkpointing using SI…
fda2d735
winglian mark params as _is_hf_initialized with DS Zero3 from weight conversio…
e920f94b
Cyrilvallez [loading] Allow loading to happen without threading (#42619)
280c5d6d
Rocketknight1 Remove splitted_tests.txt file (#42625)
4c9fde2a
Rocketknight1 Fix interactions between require_read_token and staticmethod (#42522)
3a8d291a
SunMarc Fix FSDP bnb error (#42600)
91865a69
hawon223 Move max_new_tokens recommendation into GenerationConfig docstring (#…
f8e69286
Sai-Suraj-27 Tiny Clean up `_deps` in setup.py (#42607)
2c298389
liangel-02 [torchao] safetensors (#42529)
328396d9
Sai-Suraj-27 Fixed convert_batch_to_list_format staticmethod function call (#42476)
390dca67
itazap regression from tokenizers v5 to fix fast reference for pipeline (#42…
35f32e94
ydshieh Better security for `pr-repo-consistency-bot.yml` (#42646)
ee7e67bf
3outeille test ci training for text model only (#42597)
afa43c73
ydshieh Ultra security for `pr-repo-consistency-bot.yml` (#42652)
75ae02f0
a4lg Fix a typo in GGML integration of Qwen2 MoE (#42650)
366de9a6
Cyrilvallez Offloading need to add the prefix into the offload_index (#42624)
20890e3b
yonigozlan Fix saving multiple tokenizers for custom processors (#42630)
e5aad213
jiqing-feng Compress (#42643)
28906c3c
MekkCyber [kernels] fix typing for Kernel mapping (#42623)
626875b6
SunMarc small cleaning of quantization class (#42633)
01267073
Bissmella Fixing typo in documentation (philosophy) (#42647)
4ad279fb
stevhliu [docs] TP blog post (#42637)
fccb0499
Cyrilvallez [loading] Correctly load params during offloading & careful memory co…
1d86d00e
stevhliu [docs] Attention backends + continuous batching (#42329)
e3673ed4
eustlb Lasr model (#42648)
ff13eb66
ydshieh Improve SSH into runner (#42695)
8d75aabf
Abdennacer-Badaoui update and add Expectations for mistral3/internvl tests (#42616)
81b84175
MekkCyber [Quantization] Fix FP8 experts replacing (#42654)
ca1698ef
Wauplin Use hfh's is_offline_mode helper (#42657)
ba1ad535
qihqi Let transformers know when a model is being traced via jax.jit (torch…
5ee9ffe3
vasqu [`mRope`] Fix warnings (#42660)
e3ceeafd
ArthurZucker CircleCI failed test summary (#42240)
e636ea2b
qgallouedec Remove Neptune integration references and deprecate `NeptuneCallback`…
8eef4bbf
BenjaminBossan FIX Error when trying to load non-LoRA PEFT (#42663)
d3ee06b8
YangKai0616 Fixed paged|FA2 kernel loading logic and UT. (#42547)
75beab1c
Cyrilvallez Fix PEFT integration with new weight loader (#42701)
142ae3d9
AgainstEntropy Remove duplicated imports (#42689)
9e888145
Abdennacer-Badaoui update gemma3 exepectations and switch to dynamic cache (#42688)
ad541045
Sai-Suraj-27 Fixed failing `BioGPT` batch generation test (#42677)
e8e142de
Sai-Suraj-27 Fix failing `ColPaliModelIntegrationTest` (#42705)
0e0af808
Sai-Suraj-27 Fixed failing Bart-Model Integration Tests (#42676)
b3565823
Sai-Suraj-27 Fixed failing batch_generation test for `opt` model (#42693)
0e51e7a2
ydshieh hotfix for circleci summary workflow (#42708)
2db992d8
SunMarc fix tp (#42704)
6564633b
yonigozlan Raise error when missing or incorrect dates (#42610)
8fe97d90
simonreise Fix indentation in EoMT fast image processor (#42707)
745ad8c7
ydshieh Delete previous comments of `View the CircleCI Test Summary` (#42725)
799103bf
SunMarc Fix tests quantization (#42703)
5ac6284b
MekkCyber [kernels] make the module declaration implicit with decorator (#42700)
c1ac1825
Sai-Suraj-27 Fix failing `owlv2` image processor integration test (#42714)
ec37fc88
SunMarc Update replace_with_ for quants methods to not use recursion (#42711)
85ced0f9
Sai-Suraj-27 Fix failing `CodeGenModelTests` (#42730)
9a6df2ce
ArthurZucker Refactor-tokenization-more (#42563)
73a13f86
Cyrilvallez 🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization…
3f3cae74
simonreise Add an alternative scenario to EoMT `post_process_semantic_segmentati…
5b4d72c5
casinca fix links in `CONTRIBUTING.md` (#42745)
471d7ce9
hmellor Only default `rope_parameters` to empty `dict` if there is something …
3230fb50
hmellor Only call `torch.autocast` if it will have an effect (#42747)
6d0adb5b
MekkCyber [Quantization] Fixing some tests (#42763)
8f978e5b
xin3he Ensure e_score_correction_bias dtype of DeepSeek-V3/R1 is FP32 (#42580)
2e29a9a6
MekkCyber [kernels] Fix kernel CI (#42764)
508a9764
ArthurZucker Stricter checks for mistral patch (#42743)
b9951b4e
dongluw Command-a-vision fix (#42642)
1b8ccf1c
hqkqn32 fix: support tensor labels in DataCollatorWithFlattening (#42620)
f54647c8
a4lg Override Transformers defaults by GGUF defaults (#42770)
51a66739
MekkCyber [Quantization] Fix Static FP8 Quantization (#42775)
15735a43
MekkCyber [core] fix fp-quant (#42613)
c3acdd57
Qubitium Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel (#41567)
8ebfd84f
winglian allow registration of custom checkpoint conversion mappings (#42634)
45d8168e
gabe-l-hart feat(granitemoe*): Remove logits upcast when computing loss (#42753)
0af2381f
ArthurZucker 🚨🚨🚨🚨🚨🚨🚨🚨🚨 default to `"auto"` dtype (#34919)
f5aa90d0
vasqu [`Padding-Free Attention`] Fix packed FA attention with pos ids only …
d1eda63f
LysandreJik Revert "🚨🚨🚨🚨🚨🚨🚨🚨🚨 default to `"auto"` dtype (#34919)"
a8f32a0e
MekkCyber [Quantization] FBgemm FP8 for XPU (#42773)
86644be4
remi-or Better continuous batching tests (#42699)
f8e5ae6a
SunMarc fix awq (#42776)
f8e8ddb0
MekkCyber [CI] fix wav2vec test (#42810)
78b29929
zhang-prog [Model] Add PaddleOCR-VL Model Support (#42178)
8c84144b
merveenoyan Vision docs 📝 (#42096)
eaa3d4dd
MekkCyber [kernels] Final kernel removal 🥳 (#41664)
de055d6d
yonigozlan Fix integration test in Owlv2 image processing tests (#42783)
dfe6e4c0
MekkCyber [kernels] adding RMSNorm kernel for mps devices (#42058)
0c18820c
SunMarc update deprecation msg for `warmup_ratio` (#42813)
6a93635e
Cyrilvallez Raise conversion errors after loading (#42807)
464dfa04
LysandreJik Automatic release script (#42808)
8a2a83d5
ArthurZucker Default auto (#42805)
6217adc6
stevhliu [docs] Chat content patterns (#42748)
6d00f6b0
YangKai0616 [XPU] Fix UT errors in the sam3 and lfm series model. (#42798)
f80b0485
LysandreJik Add requires_backends to the main init (#42799)
3fbd59b6
SunMarc Fix deepspeed sp loss due to missing labels (#42812)
780cc659
ZX-ModelCloud Compatible with GPTQModel FORAMT.LLM_AWQ (#42833)
b19844ee
jiqing-feng Fix xpu output check for Ministral3 tests (#42761)
c24b51dd
remi-or Fixes for the failures of AMD CI (#42718)
aa495f62
zucchini-nlp Fix a typo in MoE models (#42835)
37426b27
Cyrilvallez Simplify dtype instantiation (#42825)
64a7cc82
kho Add inputs_to_logits_ratio to LasrCTCConfig (#42720)
65dc2615
Cyrilvallez [saving] Simplify general logic (#42766)
e6b9d061
vasqu [`T5Gemma2`] Fix bidirectional mask for encoder (#42820)
33c948e4
Cyrilvallez Do not rely on config for inferring model dtype (#42838)
5b710c75
yonigozlan Improve BatchFeature: stack list and lists of torch tensors (#42750)
a61aba59
Cyrilvallez Reapply modular examples (#42846)
c2470630
Cyrilvallez Fix Gemma (#42847)
40dc11cd
YangKai0616 [Fix] Fix FA2 kernels ut (#42803)
e17b1b85
jiqing-feng Fix speccht5_tts pipeline (#42830)
66623a1f
Sai-Suraj-27 Fixes 2 failing tests from AMD CI (#42777)
f0d9cd1f
MekkCyber [docs] Improve contribution guidelines for Quantization (#42870)
64c12fdf
Cyrilvallez Remove tied weights from internal attribute if they are not tied (#42…
a187b857
AYou0207 typo (#42863)
298d08dc
remi-or [CB] Easy optimizations for continuous batching (#42839)
f3d5f255
Cyrilvallez Enforce call to `post_init` and fix all of them (#42873)
c7aec088
yonigozlan Remove null values from fast image processors dict (#42780)
fc50bdc6
wasertech fix: Initialize ApertusMLP's xielu activation using `torch_dtype` (#4…
06378d40
yonigozlan Simplify using custom resolution for sam3 and sam3_video inference (#…
23394cc4
stevhliu [docs] optimizations quickstart (#42538)
31de95ef
abidlabs Add `.on_push_begin()` callback to Trainer and implement for `Trackio…
7f52a2a4
preetam1407 Fix BLT training_ci overfit test (#42685)
0f97c688
tomaarsen Add missing ModelOutput subclass return type hints (#41219)
6c7c992f
patrickvonplaten [Devstral] Make sure FP8 conversion works correctly (#42715)
7960b5ea
Cyrilvallez [modular] Fix a weird renaming edge-case (#42844)
8d526c23
Taise228 Stop collecting all model parameters to save models when using DeepSp…
89998bdd
juliendenize Fix convert_tekken_tokenizer (#42592)
252afd89
vasqu [`Ernie 4.5 Moe`] Fix routing, weights, and update expectations (#42653)
4e7cecb2
avihu111 Fix GraniteMoeHybrid in transformers v5 (#42872)
5d2f82b5
romitjain Added kernels from kernel hub for Bamba model (#41540)
0f896619
sywangyi fix FastSpeech2ConformerTokenizer crash in tokenize (#42888)
24b311ee
Cyrilvallez Simplify tie weights logic (#42895)
4d6516e2
zheliuyu Add local kernel loading support to KernelConfig(). (#42800)
24275124
zucchini-nlp Remove duplicated processor class from config (#42806)
b61da251
CandiedCode fix: typehits for Causal LM models (#42885)
4c64a8fb
itazap refactor more tokenizers - v5 guide update (#42768)
6994c5ac
3outeille fix `Dtensor` and `tensor` mismatch (#42906)
b1a2fba1
eustlb Sam: Perception Encoder Audiovisual (#42905)
9aef5ca4
yonigozlan Fix add_dates script: Fetch github repo from url to check if model is…
703da867
yonigozlan Support having multiple sub-processors (of any kind) in the same proc…
dd24a806
jackzhxng Rewrite for loop in get_image_features with torch ops for export (#42…
a33ef4f9
sarathc-cerebras adds jais2 model support (#42684)
0dbf8085
zucchini-nlp Overwrite `get_decoder()` in AudioLLMs (#42896)
1dc69bd6
ankke Preprocessing fixes and more tests for LFM2-VL (#42784)
558666f2
vasqu [`Tokenizers`] Change treatment of special tokens (#42903)
ade62c2a
vasqu [`Auto`] Make processor subclasses overridable on load time (#42912)
79432f7a
a4lg Qwen2/3 MoE + GGUF model support (restored) (#42854)
c67ec2c4
nandan2003 Fix: Pass local_files_only from pipeline() to model loading (#42318)
0218f1a5
SunMarc Fix cuda index (#42897)
05c7e4a4
LiheYoung Add Pixio pre-trained models (#42795)
a05e0e27
yonigozlan Remove tied weight keys Sam2Video (#42840)
171e079e
3outeille fix Dtensor and tensor mismatch for Col/RowRep (#42924)
99be81e7
MekkCyber [kernels] Add user_agent to track kernels metrics (#41689)
0001b3ee
SunMarc Fix dtype quantizer (#42882)
b05d2c43
molbap Make gradient-checkpoint enabling tolerant of models without get_inpu…
b712a97d
jiqing-feng Remove ipex/ccl in CPU training doc (#42866)
f404f150
casinca docs: Squared ReLU paper fix (#42931)
2f9e21f5
stevhliu [docs] WeightConverter (#42636)
1aab1e9c
stevhliu [docs] Expert parallelism (#42409)
9e3568e0
stevhliu [docs] Update shard size (#42749)
12fe95f8
stevhliu [docs] optimization cleanup (#42827)
5ef16edd
yonigozlan Improve BatchFeature (.to() works on lists/nested lists of tensors, a…
9f583b1b
Wauplin Document new default shard size + dropped unsafe serialization (#42904)
bdaddb6f
zucchini-nlp 🚨 Generation config defaults are now `None` (#42702)
a81e04a9
MekkCyber [Quantization] rm _pre_quantization_dtype from quantization tests (#4…
d3d4b629
MekkCyber [Quantization] Misc tests fixes (#42940)
d7dd443a
remi-or [CB] Allow block sharing in hybrid models (#42877)
04e78e67
kylesayrs [Tests] Fix CompressedTensors tests (#42935)
0a846542
SunMarc Update `param_element_size` (#42818)
dd8057af
Yacklin rewrite _process_parameter_type in auto_docstring.py to improve usabi…
728f34c3
Cyrilvallez Add buffers to `_init_weights` for ALL models (#42309)
537c2e3d
SunMarc Fp8 dq (#42926)
af91c0ba
MekkCyber [Quantization] Removing misleading int8 quantization in Finegrained F…
4dc60fd3
majiayu000 fix(tvp): add missing type_vocab_size parameter to TvpConfig (#42928)
b62e5b3e
lukepayyapilli 🚨 Fix ConvNeXt image processor default interpolation to BICUBIC (#42934)
60634caa
zucchini-nlp Load generation config from nested configs (#42922)
f2c6d2ad
stevhliu [docs] dtype (#42883)
b93f2e3a
simonreise Updated `backbone_config` docstrings and type annotations (#42927)
3e4baf8e
MekkCyber [Quantization] CI green by end of year (#42951)
b5eea347
MekkCyber [kernels] Fix failling tests (#42953)
cf0f071e
eustlb fix concat order (#42946)
a4d62291
kaixuanliu fix error: 'BlockMask' object has no attribute 'dtype' for lasr model…
789226c1
Cyrilvallez [loading] Really initialize on meta device for huge perf gains (#42941)
bb9357ff
paulinebm Add runner specification to CodeQL workflow (#42955)
d54d78f7
LysandreJik Fix infinity in JSON serialized files (#42959)
d14d99ea
vasqu [`Generation`] Fix default overwrite for non-`None` defaults (#42958)
f218ed21
vasqu [`Ernie 4.5`] Ernie VL models (#39585)
a8a22624
SunMarc Fix tests trainer again (#42933)
70179949
zucchini-nlp Single config attribute for weight tying (#42815)
9a90500b
kaixuanliu fix device dismatch issue for pe_audio_video model parallelism (#42917)
1b280f99
vasqu [`Tests`] Fix inputs placement (#42963)
007274db
Cyrilvallez Do not use global variable, and improve context manager coverage (#42…
0d2dbaa9
Cyrilvallez Hardcode the factor in caching allocator (#42996)
f5d9d808
abidlabs Fix formatting of trackio model tag (#42973)
817886a6
NielsRogge Fix merge
eb982843
antznette1 Fix DocQA max_answer_len validation error message (#42948)
64ec2ba1
leaderofARS Fix incorrect library name in BitNet integration warning (#42966)
f7d139c1
cyyever Improve spacing of markdown files (#42984)
e9f0f8e0
Cyrilvallez [loading][TP] Fix device placement at loading-time, and simplify shar…
5f1c05cf
SunMarc Fix deepspeed + quantization (#43006)
42512f79
NielsRogge Fix merge, make fixup
119f2fd9
molbap Do not return a tuple in mistral tokenizer Automapping (#42997)
9971e410
NielsRogge Merge remote-tracking branch 'upstream/main' into codex/integrate-eom…
d997f7e3
NielsRogge Fix tests
667d1304
