Add EoMT DINOv3 model #58
NielsRogge force-pushed from 1f098ffa to 9e24af70 82 days ago
Fix Qwen3OmniMoE weight init (#42531)
dac2ad76
small fix tokenizer regex patch (#42528)
83fe012d
[TP plans] Fix some incorrect TP plans (#42448)
6e3f2f8f
[Ministral 3] Add ministral 3 (#42498)
bf3f0ae7
Fix ernie moe (#42535)
0fa49db1
Add FastVLM (#41112)
a6497675
Fix Qwen-VL family with prompt tuning (#42508)
57eeb9ca
Fix failing test in Glm4vMoeIntegrationTest (#42488)
ac0769cd
[Quantization] fix dequant when block size is none & static quantizat…
bb09a30f
[Ministral 3] Small fix config (#42537)
64d8cf4f
[Fix] dots1 expert bias routing (#41663)
29e8522b
Fix fp8 + some enhancement (#42455)
bc7a268f
[test] delete `SeamlessM4TProcessorTest::test_save_load_pretrained_ad…
4ec83fe9
Fix eetq quanto quant methods (#42557)
5efd0d4a
Add backward compatibility for methods which have been moved to `Rota…
0ba8f001
Fix `parse_response` after tokenizer refactor (#42300)
5690f24e
fix regression (#42569)
53d2bf6d
Kernel mapping error resolve (#42466)
3f174109
Transformers serve fix (#42570)
52b988d8
[SAM3] Compute masks once instead of per-layer, fix fa2 crash (#42543)
80b408d1
Allow fallback to loading from Auto"SubProcessor".from_pretrained whe…
675e8763
[`CI`] Fix copies (#42571)
a5c061d2
[Quantization] per tensor quantization kernel (#42560)
51c5a7a6
make sure the FSDP plugin args are appropriately cast to bools (#42566)
377a8ee7
[Quantization] fix fbgemm (#42561)
15b79ea8
Use `getattr` in `standardize_rope_params` because `rope_parameters` …
232ecf2c
[XPU] Fix fp8 UT patch func (#42584)
75c135d7
Fix loaded data order bug when resuming from epoch >= 1 (#40691)
629c0da4
fix the FSDP1 default for reshard_after_forward (#42578)
cba7ae86
fix: correct typos in code comments (#42577)
8ba286ba
fix : cast into floats AFTER all assignments (#42587)
130dc471
Fix mixed torch.Tensor and DTensor in generate when use FSDP2 + LoRA …
17c7c496
Fix the FA2 logic in the longcat_flash model (#42549)
c0328af6
[Quantization] Remove dequant fp8 config (#42596)
7266f50b
Add torch compile to CB (#42516)
ef780bf1
Allow `validation_fn` to be `None` in `validate_rope` (#42601)
b8d5018e
Add SDPA support for PatchTST model (#42465)
dd6cfdd3
Align tie weights in Idefics (#42551)
81aabe72
repo. consistency bot (#42575)
d5d87934
Fix Ernie Moe Test (#42595)
9e82c779
Fix some models cache initialization (#42586)
a48d68c6
extend FA2 and other cases to XPU (#42536)
2e93004a
Update revision so that there is a safetensors model (#42618)
3cdccba0
Every model forward() should have **kwargs (#42603)
9b74e4c4
fix(Qwen3VLCausalLMOutputWithPast): missing `hidden_states` and `atte…
0c3d043e
[core] Fix quark (#42457)
bebfab06
Fix small weight loading example (#42622)
552409e5
Fix _is_package_available to handle underscore/hyphen equivalence (#4…
bf8b9e7b
Fix typo in docstring in modeling_sam3_tracker.py (#42438)
a3e2d547
[V5] Return a BatchEncoding dict from apply_chat_template by default …
ce53cc00
more tts pipeline example (#42484)
f6dcac65
Adapt some test case on npu (#42335)
b0831697
feat(trainer): Just-in-time (JIT) asynchronous checkpointing using SI…
fda2d735
mark params as _is_hf_initialized with DS Zero3 from weight conversio…
e920f94b
[loading] Allow loading to happen without threading (#42619)
280c5d6d
Remove splitted_tests.txt file (#42625)
4c9fde2a
Fix interactions between require_read_token and staticmethod (#42522)
3a8d291a
Fix FSDP bnb error (#42600)
91865a69
Move max_new_tokens recommendation into GenerationConfig docstring (#…
f8e69286
Tiny Clean up `_deps` in setup.py (#42607)
2c298389
[torchao] safetensors (#42529)
328396d9
Fixed convert_batch_to_list_format staticmethod function call (#42476)
390dca67
regression from tokenizers v5 to fix fast reference for pipeline (#42…
35f32e94
Better security for `pr-repo-consistency-bot.yml` (#42646)
ee7e67bf
test ci training for text model only (#42597)
afa43c73
Ultra security for `pr-repo-consistency-bot.yml` (#42652)
75ae02f0
Fix a typo in GGML integration of Qwen2 MoE (#42650)
366de9a6
Offloading needs to add the prefix into the offload_index (#42624)
20890e3b
Fix saving multiple tokenizers for custom processors (#42630)
e5aad213
Compress (#42643)
28906c3c
[kernels] fix typing for Kernel mapping (#42623)
626875b6
small cleaning of quantization class (#42633)
01267073
Fixing typo in documentation (philosophy) (#42647)
4ad279fb
[docs] TP blog post (#42637)
fccb0499
[loading] Correctly load params during offloading & careful memory co…
1d86d00e
[docs] Attention backends + continuous batching (#42329)
e3673ed4
Lasr model (#42648)
ff13eb66
Improve SSH into runner (#42695)
8d75aabf
update and add Expectations for mistral3/internvl tests (#42616)
81b84175
[Quantization] Fix FP8 experts replacing (#42654)
ca1698ef
Use hfh's is_offline_mode helper (#42657)
ba1ad535
Let transformers know when a model is being traced via jax.jit (torch…
5ee9ffe3
[`mRope`] Fix warnings (#42660)
e3ceeafd
CircleCI failed test summary (#42240)
e636ea2b
Remove Neptune integration references and deprecate `NeptuneCallback`…
8eef4bbf
FIX Error when trying to load non-LoRA PEFT (#42663)
d3ee06b8
Fixed paged|FA2 kernel loading logic and UT. (#42547)
75beab1c
Fix PEFT integration with new weight loader (#42701)
142ae3d9
Remove duplicated imports (#42689)
9e888145
update gemma3 exepectations and switch to dynamic cache (#42688)
ad541045
Fixed failing `BioGPT` batch generation test (#42677)
e8e142de
Fix failing `ColPaliModelIntegrationTest` (#42705)
0e0af808
Fixed failing Bart-Model Integration Tests (#42676)
b3565823
Fixed failing batch_generation test for `opt` model (#42693)
0e51e7a2
hotfix for circleci summary workflow (#42708)
2db992d8
fix tp (#42704)
6564633b
Raise error when missing or incorrect dates (#42610)
8fe97d90
Fix indentation in EoMT fast image processor (#42707)
745ad8c7
Delete previous comments of `View the CircleCI Test Summary` (#42725)
799103bf
Fix tests quantization (#42703)
5ac6284b
[kernels] make the module declaration implicit with decorator (#42700)
c1ac1825
Fix failing `owlv2` image processor integration test (#42714)
ec37fc88
Update replace_with_ for quants methods to not use recursion (#42711)
85ced0f9
Fix failing `CodeGenModelTests` (#42730)
9a6df2ce
Refactor-tokenization-more (#42563)
73a13f86
🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization…
3f3cae74
Add an alternative scenario to EoMT `post_process_semantic_segmentati…
5b4d72c5
fix links in `CONTRIBUTING.md` (#42745)
471d7ce9
Only default `rope_parameters` to empty `dict` if there is something …
3230fb50
Only call `torch.autocast` if it will have an effect (#42747)
6d0adb5b
[Quantization] Fixing some tests (#42763)
8f978e5b
Ensure e_score_correction_bias dtype of DeepSeek-V3/R1 is FP32 (#42580)
2e29a9a6
[kernels] Fix kernel CI (#42764)
508a9764
Stricter checks for mistral patch (#42743)
b9951b4e
Command-a-vision fix (#42642)
1b8ccf1c
fix: support tensor labels in DataCollatorWithFlattening (#42620)
f54647c8
Override Transformers defaults by GGUF defaults (#42770)
51a66739
[Quantization] Fix Static FP8 Quantization (#42775)
15735a43
[core] fix fp-quant (#42613)
c3acdd57
Fully deprecate AutoGPTQ and AutoAWQ for GPT-QModel (#41567)
8ebfd84f
allow registration of custom checkpoint conversion mappings (#42634)
45d8168e
feat(granitemoe*): Remove logits upcast when computing loss (#42753)
0af2381f
🚨🚨🚨🚨🚨🚨🚨🚨🚨 default to `"auto"` dtype (#34919)
f5aa90d0
[`Padding-Free Attention`] Fix packed FA attention with pos ids only …
d1eda63f
Revert "🚨🚨🚨🚨🚨🚨🚨🚨🚨 default to `"auto"` dtype (#34919)"
a8f32a0e
[Quantization] FBgemm FP8 for XPU (#42773)
86644be4
Better continuous batching tests (#42699)
f8e5ae6a
fix awq (#42776)
f8e8ddb0
[CI] fix wav2vec test (#42810)
78b29929
[Model] Add PaddleOCR-VL Model Support (#42178)
8c84144b
Vision docs 📝 (#42096)
eaa3d4dd
[kernels] Final kernel removal 🥳 (#41664)
de055d6d
Fix integration test in Owlv2 image processing tests (#42783)
dfe6e4c0
[kernels] adding RMSNorm kernel for mps devices (#42058)
0c18820c
update deprecation msg for `warmup_ratio` (#42813)
6a93635e
Raise conversion errors after loading (#42807)
464dfa04
Automatic release script (#42808)
8a2a83d5
Default auto (#42805)
6217adc6
[docs] Chat content patterns (#42748)
6d00f6b0
[XPU] Fix UT errors in the sam3 and lfm series model. (#42798)
f80b0485
Add requires_backends to the main init (#42799)
3fbd59b6
Fix deepspeed sp loss due to missing labels (#42812)
780cc659
Compatible with GPTQModel FORMAT.LLM_AWQ (#42833)
b19844ee
Fix xpu output check for Ministral3 tests (#42761)
c24b51dd
Fixes for the failures of AMD CI (#42718)
aa495f62
Fix a typo in MoE models (#42835)
37426b27
Simplify dtype instantiation (#42825)
64a7cc82
Add inputs_to_logits_ratio to LasrCTCConfig (#42720)
65dc2615
[saving] Simplify general logic (#42766)
e6b9d061
[`T5Gemma2`] Fix bidirectional mask for encoder (#42820)
33c948e4
Do not rely on config for inferring model dtype (#42838)
5b710c75
Improve BatchFeature: stack list and lists of torch tensors (#42750)
a61aba59
Reapply modular examples (#42846)
c2470630
Fix Gemma (#42847)
40dc11cd
[Fix] Fix FA2 kernels ut (#42803)
e17b1b85
Fix speecht5_tts pipeline (#42830)
66623a1f
Fixes 2 failing tests from AMD CI (#42777)
f0d9cd1f
[docs] Improve contribution guidelines for Quantization (#42870)
64c12fdf
Remove tied weights from internal attribute if they are not tied (#42…
a187b857
typo (#42863)
298d08dc
[CB] Easy optimizations for continuous batching (#42839)
f3d5f255
Enforce call to `post_init` and fix all of them (#42873)
c7aec088
Remove null values from fast image processors dict (#42780)
fc50bdc6
fix: Initialize ApertusMLP's xielu activation using `torch_dtype` (#4…
06378d40
Simplify using custom resolution for sam3 and sam3_video inference (#…
23394cc4
[docs] optimizations quickstart (#42538)
31de95ef
Add `.on_push_begin()` callback to Trainer and implement for `Trackio…
7f52a2a4
Fix BLT training_ci overfit test (#42685)
0f97c688
Add missing ModelOutput subclass return type hints (#41219)
6c7c992f
[Devstral] Make sure FP8 conversion works correctly (#42715)
7960b5ea
[modular] Fix a weird renaming edge-case (#42844)
8d526c23
Stop collecting all model parameters to save models when using DeepSp…
89998bdd
Fix convert_tekken_tokenizer (#42592)
252afd89
[`Ernie 4.5 Moe`] Fix routing, weights, and update expectations (#42653)
4e7cecb2
Fix GraniteMoeHybrid in transformers v5 (#42872)
5d2f82b5
Added kernels from kernel hub for Bamba model (#41540)
0f896619
fix FastSpeech2ConformerTokenizer crash in tokenize (#42888)
24b311ee
Simplify tie weights logic (#42895)
4d6516e2
Add local kernel loading support to KernelConfig(). (#42800)
24275124
Remove duplicated processor class from config (#42806)
b61da251
fix: typehints for Causal LM models (#42885)
4c64a8fb
refactor more tokenizers - v5 guide update (#42768)
6994c5ac
fix `Dtensor` and `tensor` mismatch (#42906)
b1a2fba1
Sam: Perception Encoder Audiovisual (#42905)
9aef5ca4
Fix add_dates script: Fetch github repo from url to check if model is…
703da867
Support having multiple sub-processors (of any kind) in the same proc…
dd24a806
Rewrite for loop in get_image_features with torch ops for export (#42…
a33ef4f9
adds jais2 model support (#42684)
0dbf8085
Overwrite `get_decoder()` in AudioLLMs (#42896)
1dc69bd6
Preprocessing fixes and more tests for LFM2-VL (#42784)
558666f2
[`Tokenizers`] Change treatment of special tokens (#42903)
ade62c2a
[`Auto`] Make processor subclasses overridable on load time (#42912)
79432f7a
Qwen2/3 MoE + GGUF model support (restored) (#42854)
c67ec2c4
Fix: Pass local_files_only from pipeline() to model loading (#42318)
0218f1a5
Fix cuda index (#42897)
05c7e4a4
Add Pixio pre-trained models (#42795)
a05e0e27
Remove tied weight keys Sam2Video (#42840)
171e079e
fix Dtensor and tensor mismatch for Col/RowRep (#42924)
99be81e7
[kernels] Add user_agent to track kernels metrics (#41689)
0001b3ee
Fix dtype quantizer (#42882)
b05d2c43
Make gradient-checkpoint enabling tolerant of models without get_inpu…
b712a97d
Remove ipex/ccl in CPU training doc (#42866)
f404f150
docs: Squared ReLU paper fix (#42931)
2f9e21f5
[docs] WeightConverter (#42636)
1aab1e9c
[docs] Expert parallelism (#42409)
9e3568e0
[docs] Update shard size (#42749)
12fe95f8
[docs] optimization cleanup (#42827)
5ef16edd
Improve BatchFeature (.to() works on lists/nested lists of tensors, a…
9f583b1b
Document new default shard size + dropped unsafe serialization (#42904)
bdaddb6f
🚨 Generation config defaults are now `None` (#42702)
a81e04a9
[Quantization] rm _pre_quantization_dtype from quantization tests (#4…
d3d4b629
[Quantization] Misc tests fixes (#42940)
d7dd443a
[CB] Allow block sharing in hybrid models (#42877)
04e78e67
[Tests] Fix CompressedTensors tests (#42935)
0a846542
Update `param_element_size` (#42818)
dd8057af
rewrite _process_parameter_type in auto_docstring.py to improve usabi…
728f34c3
Add buffers to `_init_weights` for ALL models (#42309)
537c2e3d
Fp8 dq (#42926)
af91c0ba
[Quantization] Removing misleading int8 quantization in Finegrained F…
4dc60fd3
fix(tvp): add missing type_vocab_size parameter to TvpConfig (#42928)
b62e5b3e
🚨 Fix ConvNeXt image processor default interpolation to BICUBIC (#42934)
60634caa
Load generation config from nested configs (#42922)
f2c6d2ad
[docs] dtype (#42883)
b93f2e3a
Updated `backbone_config` docstrings and type annotations (#42927)
3e4baf8e
[Quantization] CI green by end of year (#42951)
b5eea347
[kernels] Fix failing tests (#42953)
cf0f071e
fix concat order (#42946)
a4d62291
fix error: 'BlockMask' object has no attribute 'dtype' for lasr model…
789226c1
[loading] Really initialize on meta device for huge perf gains (#42941)
bb9357ff
Add runner specification to CodeQL workflow (#42955)
d54d78f7
Fix infinity in JSON serialized files (#42959)
d14d99ea
[`Generation`] Fix default overwrite for non-`None` defaults (#42958)
f218ed21
[`Ernie 4.5`] Ernie VL models (#39585)
a8a22624
Fix tests trainer again (#42933)
70179949
Single config attribute for weight tying (#42815)
9a90500b
fix device mismatch issue for pe_audio_video model parallelism (#42917)
1b280f99
[`Tests`] Fix inputs placement (#42963)
007274db
Do not use global variable, and improve context manager coverage (#42…
0d2dbaa9
Hardcode the factor in caching allocator (#42996)
f5d9d808
Fix formatting of trackio model tag (#42973)
817886a6
Fix merge
eb982843
Fix DocQA max_answer_len validation error message (#42948)
64ec2ba1
Fix incorrect library name in BitNet integration warning (#42966)
f7d139c1
Improve spacing of markdown files (#42984)
e9f0f8e0
[loading][TP] Fix device placement at loading-time, and simplify shar…
5f1c05cf
Fix deepspeed + quantization (#43006)
42512f79
Fix merge, make fixup
119f2fd9
Do not return a tuple in mistral tokenizer Automapping (#42997)
9971e410
Merge remote-tracking branch 'upstream/main' into codex/integrate-eom…
d997f7e3
Fix tests
667d1304