v4.57.0 Branch #41310

LysandreJik merged 167 commits into v4 from v4-backup
ArthurZucker
ydshieh Update expected values for one more `test_speculative_generation` aft…
5748352c
rangehow FIX(trainer): ensure final checkpoint is saved when resuming training…
564fde14
zucchini-nlp Add new model LFM2-VL (#40624)
c5325757
cyyever Fix outdated version checks of accelerator (#40969)
f6104189
hamishs Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)
7cf1f5ce
SunMarc [Trainer] Fix DP loss (#40799)
9378f874
harshaljanjani [timm_wrapper] better handling of "Unknown model" exception in timm (…
6e51ac31
brandenkmurray Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate to…
2ce35a24
Cyrilvallez [tests] Really use small models in all fast tests (#40945)
dd7ac4cd
ydshieh Add captured actual outputs to CI artifacts (#40965)
738b223f
qubvel Revert change in `compile_friendly_resize` (#40645)
d9d7f6a6
ydshieh Track the CI (model) jobs that don't produce test output files (proce…
5ac3c517
Cyrilvallez Remove `set_model_tester_for_less_flaky_tests` (#40982)
5c2f5663
ahadnagy Benchmarking v2 GH workflows (#40716)
47c1a1b4
BenjaminBossan ENH: Enable readline support for transformers chat (#40911)
5a246131
ydshieh [testing] test `num_hidden_layers` being small in model tester (#40992)
103fe0d5
itazap blt wip (#38579)
a5ffae62
vasqu [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)
78f3e087
ydshieh Make `EfficientLoFTRModelTest` faster (#41000)
a89ed714
cyyever Fix typoes in src and tests (#40845)
662ea950
yonigozlan Fix more dates in model cards and wrong modalities in _toctree.yml (#…
f73f73d4
cyyever RUFF fix on CI scripts (#40805)
6e1270d2
SunMarc fix dict like init for ModelOutput (#41002)
251825aa
gante [tests] update `test_left_padding_compatibility` (and minimize overwr…
f47c6514
ydshieh Patch more `unittest.case.TestCase.assertXXX` methods (#41008)
b164209d
sbucaille 🚨 [lightglue] fix: matches order changed because of early stopped ind…
d6d2d03b
ydshieh Fix `PhimoeIntegrationTest` (#41007)
b2b50448
Cyrilvallez Fix Glm4v test (#41011)
e5a9a1de
ydshieh Update after #41007 (#41014)
9de898e5
ahadnagy Fix benchmark runner argument name (#41012)
c1cf8dee
BakerBunker Adding support for Qwen3Omni (#41025)
41813d32
Flakes342 Making compute_loss_func always take priority in Trainer (#40632)
71f768bc
BakerBunker Modify Qwen3Omni parameter name since VL changed it (#41045)
23d0c62a
zucchini-nlp Fix Qwen video tests (#41049)
f1a8aff9
ydshieh [testing] Fix `qwen2_audio` (#41018)
c6d3d0b9
cyyever Fix typing of tuples (#41028)
30dadfd5
cyyever Remove optax (#41030)
c931992d
cyyever Fix typos in English/Chinese documentation (#41031)
84600532
cyyever Use torch.autocast (#40975)
e6f5f948
RyanMullins docs: improved RoPE function Docstrings (#41004)
1ca91812
yannicks1 Fix condition for emitting warning when generation exceeds max model …
7425f6dc
cyyever Fix outdated torch version check (#40925)
9b221a84
rjgleaton Add Whole Word Masking and Padding Strategy to DataCollatorForLanguag…
c2c9074b
ydshieh [testing] Fix `seed_oss` (#41052)
5fb3b354
cyyever Remove repeated import (#40937)
36911028
cyyever Simplify unnecessary Optional typing (#40839)
d43b73cb
ahadnagy Add write token for uploading benchmark results to the Hub (#41047)
9de77d70
remi-or Ci utils (#40978)
98e87dbf
ydshieh Fix CI jobs being all red 🔴 (false positive) (#41059)
bdbe9878
SunMarc Update quantization CI (#41068)
abbf0edd
saidurpulok [i18n-bn] Add Bengali language README file (#40935)
a9266c98
mapmeld Improve documentation and errors in Mamba2-based models (#41063)
ed8d3aaa
ydshieh Update team member list for some CI workflows (#41094)
fc974a97
sywangyi fix crash when using chat to send 2+ request to gptoss (#40536)
dca053d1
DuyguA Minor addition, no split modules for VideoMAEE (#41051)
ea92b1a0
ydshieh Switch to `python:3.10-slim` for CircleCI docker images (#41067)
722be9f5
ahadnagy Fix argument name in benchmarking script (#41086)
e140ee3c
cyyever Fix typos in documentation (#41087)
9957b448
cyyever Fix typing (#40788)
281b8b62
cyyever Remove unused arguments (#40916)
72e7f343
Juude fix wrong height and width when read video use torchvision (#41091)
93655f31
RyanMullins docs: Fix Tool Use links and remove dead RAG links (#41104)
c42b27b9
gante [tests] gpt2 + `CausalLMModelTester` (#41003)
9d9177f4
ydshieh Fix `_get_test_info` for inherited tests (#41106)
8291a7fc
Cyrilvallez Remove bad test skips (#41109)
7bf0c7d3
cyyever Format empty lines and white space in markdown files. (#41100)
1f7c6535
cyyever Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
a5a88829
HaroldBenoit Support loading LFM2 GGUF (#41111)
38c30bba
liangel-02 [torchao safetensors] integrate torchao safetensors support with tran…
f212a0b4
notkisk [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule a…
957b5568
cyyever Fix the error where a keyword argument appearing before *args (#41099)
7fde9757
cyyever Fix broken `` expressions in markdown files (#41113)
c6f31abf
cyyever Remove self-assignment (#41062)
48c8c8db
YangKai0616 Fixed MXFP4 model storage issue (#41118)
25c8ac57
Szustarol Fixed loading LongT5 from legacy checkpoints (#40724)
0bc795f8
ydshieh dummy commit (#41133)
99630b85
LysandreJik Fix loading logic flaw with regards to unexpected and missing keys (#…
6e913fc9
Xqle Fix: align Qwen2.5-VL inference rope index with training by passing s…
477b7a3a
cyyever Fix single quotes in markdown (#41154)
287652a2
yao-matrix extend gemma3n integration ut cases on XPU (#41071)
174a5c4e
nithinraok Add Parakeet (#39062)
53ce2f82
cyyever Fix format of compressed_tensors.md (#41155)
bd77d70e
Cyrilvallez Simplify and improve model loading logic (#41103)
6566998e
yonigozlan Force new vision models addition to include a fast image processor (#…
83fc0ee9
cyyever Add language specifiers to code blocks of markdown files (#41114)
d3d02925
yonigozlan Improve `add_dates` script (#41167)
46a138cd
remi-or Fix flash-attn for paged_attention when no kernels (#41078)
2e498266
LysandreJik Remove data from examples (#41168)
ac8703da
remi-or Enable fa in amd docker (#41069)
14b45582
itazap handle flash slow tests (#41072)
d8152615
remi-or Modernbert fix (#41056)
cd154ae9
glegendre01 CI Runners - move amd runners mi355 and 325 to runner group (#41193)
56a74c39
YangKai0616 [XPU] Add MXFP4 support for XPU (#41117)
9a76ebfa
gante [tests] `CausalLMTester` automatically infers other test classes from…
97ee50f3
cyyever More typing fixes (#41102)
deac4530
yao-matrix enable flex attention ut cases on XPU (#40989)
9b7c343f
The5cheduler fix(trainer): Avoid moving model with device_map (#41032)
469336de
SamuelBarryCS Fix attention sink implementation in flex attention (#41083)
01d8cc0e
ahadnagy Separate docker images for Nvidia and AMD in benchmarking (#41119)
fae2d679
Cyrilvallez Make quantizers good citizens loading-wise (#41138)
3017f04b
vasqu [`Kernels Attention`] Change fallback logic to error out on explicit …
389115c1
yonigozlan Add EdgeTAM (#39800)
ec368a27
lkm2835 Fix EXAONE-4.0 dummy id (#41089)
068e7091
SunMarc Fix 8bit bnb loading (#41200)
a2b6ccff
SunMarc Fix docker quantization (#41201)
a2cdcccc
yonigozlan Embed interactive timeline in docs (#41015)
be826baa
stevhliu [docs] Fix links (#41110)
090ad5db
cyyever Remove unnecessary Optional typing (#41198)
e0973709
tayo4christ docs/examples(speech): pin CTC commands to Hub datasets; add Windows …
5c0fd100
aug6th Fix Qwen3-Omni audio_token_id serialization issue (#41192)
f588aa82
ssharpe42 Wait for main process in _save_checkpoint to ensure best checkpoint e…
4c54a98a
tomaarsen Avoid assumption that model has config attribute in deepspeed (#41207)
7e698ed8
pramodith Trainer: Pass `num_items_in_batch` to `compute_loss` in `prediction_s…
4886248d
pstjohn [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)
d1fd30d9
tomaarsen Align pull request template to bug report template (#41220)
9f7da26d
gante [generate] cache missing custom generate file (#41216)
8a23f340
cyyever Remove old Python code (#41226)
99fbb874
frozenleaves Adapt to the SDPA interface to enable the NPU to call FlashAttentionS…
4c6f26e0
ydshieh update code owners (#41221)
3b03f55b
cyyever Unify is_torchvision_v2_available with is_torchvision_available (#41227)
12c4e6a6
cyyever Fix typing of train_args (#41142)
b0427d66
remi-or Fix sliding window attn mask (#41228)
50907d32
SunMarc Revert "Fix DeepSpeed mixed precision precedence over Accelerate defa…
7db62848
stevhliu [docs] Fix tp_plan (#41205)
86982f21
cyyever Fix white space in documentation (#41157)
ac69a8f1
zucchini-nlp fix qwen text config (#41158)
186a357f
zucchini-nlp Video processor accepts single frames on cuda (#41218)
46954e49
cyyever Use math.log2 (#41241)
25e26411
yao-matrix fix TrainerIntegrationDeepSpeed UT failures (#41236)
7076da63
gante [repo utils] Update `models_to_deprecate.py` (#41231)
9e60961e
cyyever Use removeprefix and removesuffix (#41240)
7aca328f
cyyever Fix pylint warnings (#41222)
c6af1ca2
SunMarc Remove all instances of `is_safetensors_available` (#41233)
b7757de7
BlackSamorez FP-Quant NVFP4 and Python 3.9 support (#39876)
c7616fdf
vasqu [`FA3`] Fix masking and loading logic in same process (#41217)
f672ee02
gante [t5gemma] fix `get_text_config` and related fixes (#40939)
066ca8e4
ydshieh Don't convert to `safetensors` on the fly if the call is from testing…
e49d3d6b
XuehaiPan Resolve remote custom module path warnings (#41243)
9ab2d57a
ydshieh add peft team members to issue/pr template (#41262)
a6f470f6
matthewdouglas docs: update bitsandbytes platform support (#41266)
19826920
MekkCyber add more activation kernels, follow up (#40944)
9e34b40e
yao-matrix fix asr pipeline ut failures (#41275)
e80da3a4
cyyever Use regex defailed flags (#41264)
d8566bc6
tim120526 Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)
d88a0fbb
daskol Fix binding of video frames to video placeholder in `InternVL` model …
54c026ea
qgallouedec Deprecate Trackio environment variables and deploy to Spaces by defau…
03d976d1
qgallouedec Allow private Space id for Trackio (#40948)
37f1f5d7
SunMarc fix async client for transformers chat (#41255)
247d21ad
cyyever Unify is_torchvision_v2_available with is_torchvision_available (#41259)
26c57efa
cyyever Use max/min (#41280)
91e1bdd0
0x-avi Biogptlogits (#41270)
4f1faa06
cyyever Fix unnecessary single-item container checks (#41279)
9d67585e
cyyever Fix pylint generator warnings (#41258)
89d53495
McPatate feat: use `aws-highcpu-32-priv` for amd docker img build (#41285)
f8ec172c
JJJYmmm Add processor and intergration test for qwen3vl (#41277)
a2de2937
Cyrilvallez Remove `test_initialization` (#41261)
27b9c795
ydshieh Remove some previous team members from allow list of triggering Githu…
0995a484
ydshieh Build doc in 2 jobs: `en` and `other languages` (#41290)
41eae7ad
Cyrilvallez Fix mxfp4 dequantization (#41292)
aca2380b
vasqu [`Flex Attn`] Fix lse x attention sinks logic (#41249)
531bb750
BenjaminBossan FIX: Bug in PEFT integration delete_adapter method (#41252)
cf88fbb6
fedtti Italian translation for README.md (#41269)
40329a8b
TKONIY Fix README.md error when installing from source (#41303)
e656e264
ydshieh download and use HF Hub Cache (#41181)
a6e9ec43
ArthurZucker changed the base branch from main to v4 176 days ago
ArthurZucker fix some merge issues
010896e8
ArthurZucker [test_all]
8270a0f8
ArthurZucker [test-all]
e6d80873
LysandreJik approved these changes on 2025-10-03
LysandreJik marked this pull request as ready for review 175 days ago
LysandreJik merged 2ccc6cae into v4 175 days ago
LysandreJik deleted the v4-backup branch 175 days ago
