PR #41310 v4.57.0 Branch

Update expected values for one more `test_speculative_generation` aft…

5748352c

FIX(trainer): ensure final checkpoint is saved when resuming training…

564fde14

Add new model LFM2-VL (#40624)

c5325757

Fix outdated version checks of accelerator (#40969)

f6104189

Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

7cf1f5ce

[Trainer] Fix DP loss (#40799)

9378f874

[timm_wrapper] better handling of "Unknown model" exception in timm (…

6e51ac31

Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate to…

2ce35a24

[tests] Really use small models in all fast tests (#40945)

dd7ac4cd

Add captured actual outputs to CI artifacts (#40965)

738b223f

Revert change in `compile_friendly_resize` (#40645)

d9d7f6a6

Track the CI (model) jobs that don't produce test output files (proce…

5ac3c517

Remove `set_model_tester_for_less_flaky_tests` (#40982)

5c2f5663

Benchmarking v2 GH workflows (#40716)

47c1a1b4

ENH: Enable readline support for transformers chat (#40911)

5a246131

[testing] test `num_hidden_layers` being small in model tester (#40992)

103fe0d5

blt wip (#38579)

a5ffae62

[`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

78f3e087

Make `EfficientLoFTRModelTest` faster (#41000)

a89ed714

Fix typoes in src and tests (#40845)

662ea950

Fix more dates in model cards and wrong modalities in _toctree.yml (#…

f73f73d4

RUFF fix on CI scripts (#40805)

6e1270d2

fix dict like init for ModelOutput (#41002)

251825aa

[tests] update `test_left_padding_compatibility` (and minimize overwr…

f47c6514

Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

b164209d

🚨 [lightglue] fix: matches order changed because of early stopped ind…

d6d2d03b

Fix `PhimoeIntegrationTest` (#41007)

b2b50448

Fix Glm4v test (#41011)

e5a9a1de

Update after #41007 (#41014)

9de898e5

Fix benchmark runner argument name (#41012)

c1cf8dee

Adding support for Qwen3Omni (#41025)

41813d32

Making compute_loss_func always take priority in Trainer (#40632)

71f768bc

Modify Qwen3Omni parameter name since VL changed it (#41045)

23d0c62a

Fix Qwen video tests (#41049)

f1a8aff9

[testing] Fix `qwen2_audio` (#41018)

c6d3d0b9

Fix typing of tuples (#41028)

30dadfd5

Remove optax (#41030)

c931992d

Fix typos in English/Chinese documentation (#41031)

84600532

Use torch.autocast (#40975)

e6f5f948

docs: improved RoPE function Docstrings (#41004)

1ca91812

Fix condition for emitting warning when generation exceeds max model …

7425f6dc

Fix outdated torch version check (#40925)

9b221a84

Add Whole Word Masking and Padding Strategy to DataCollatorForLanguag…

c2c9074b

[testing] Fix `seed_oss` (#41052)

5fb3b354

Remove repeated import (#40937)

36911028

Simplify unnecessary Optional typing (#40839)

d43b73cb

Add write token for uploading benchmark results to the Hub (#41047)

9de77d70

Ci utils (#40978)

98e87dbf

Fix CI jobs being all red 🔴 (false positive) (#41059)

bdbe9878

Update quantization CI (#41068)

abbf0edd

[i18n-bn] Add Bengali language README file (#40935)

a9266c98

Improve documentation and errors in Mamba2-based models (#41063)

ed8d3aaa

Update team member list for some CI workflows (#41094)

fc974a97

fix crash when using chat to send 2+ request to gptoss (#40536)

dca053d1

Minor addition, no split modules for VideoMAEE (#41051)

ea92b1a0

Switch to `python:3.10-slim` for CircleCI docker images (#41067)

722be9f5

Fix argument name in benchmarking script (#41086)

e140ee3c

Fix typos in documentation (#41087)

9957b448

Fix typing (#40788)

281b8b62

Remove unused arguments (#40916)

72e7f343

fix wrong height and width when read video use torchvision (#41091)

93655f31

docs: Fix Tool Use links and remove dead RAG links (#41104)

c42b27b9

[tests] gpt2 + `CausalLMModelTester` (#41003)

9d9177f4

Fix `_get_test_info` for inherited tests (#41106)

8291a7fc

Remove bad test skips (#41109)

7bf0c7d3

Format empty lines and white space in markdown files. (#41100)

1f7c6535

Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

a5a88829

Support loading LFM2 GGUF (#41111)

38c30bba

[torchao safetensors] integrate torchao safetensors support with tran…

f212a0b4

[Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule a…

957b5568

Fix the error where a keyword argument appearing before *args (#41099)

7fde9757

Fix broken `` expressions in markdown files (#41113)

c6f31abf

Remove self-assignment (#41062)

48c8c8db

Fixed MXFP4 model storage issue (#41118)

25c8ac57

Fixed loading LongT5 from legacy checkpoints (#40724)

0bc795f8

dummy commit (#41133)

99630b85

Fix loading logic flaw with regards to unexpected and missing keys (#…

6e913fc9

Fix: align Qwen2.5-VL inference rope index with training by passing s…

477b7a3a

Fix single quotes in markdown (#41154)

287652a2

extend gemma3n integration ut cases on XPU (#41071)

174a5c4e

Add Parakeet (#39062)

53ce2f82

Fix format of compressed_tensors.md (#41155)

bd77d70e

Simplify and improve model loading logic (#41103)

6566998e

Force new vision models addition to include a fast image processor (#…

83fc0ee9

Add language specifiers to code blocks of markdown files (#41114)

d3d02925

Improve `add_dates` script (#41167)

46a138cd

Fix flash-attn for paged_attention when no kernels (#41078)

2e498266

Remove data from examples (#41168)

ac8703da

Enable fa in amd docker (#41069)

14b45582

handle flash slow tests (#41072)

d8152615

Modernbert fix (#41056)

cd154ae9

CI Runners - move amd runners mi355 and 325 to runner group (#41193)

56a74c39

[XPU] Add MXFP4 support for XPU (#41117)

9a76ebfa

[tests] `CausalLMTester` automatically infers other test classes from…

97ee50f3

More typing fixes (#41102)

deac4530

enable flex attention ut cases on XPU (#40989)

9b7c343f

fix(trainer): Avoid moving model with device_map (#41032)

469336de

Fix attention sink implementation in flex attention (#41083)

01d8cc0e

Separate docker images for Nvidia and AMD in benchmarking (#41119)

fae2d679

Make quantizers good citizens loading-wise (#41138)

3017f04b

[`Kernels Attention`] Change fallback logic to error out on explicit …

389115c1

Add EdgeTAM (#39800)

ec368a27

Fix EXAONE-4.0 dummy id (#41089)

068e7091

Fix 8bit bnb loading (#41200)

a2b6ccff

Fix docker quantization (#41201)

a2cdcccc

Embed interactive timeline in docs (#41015)

be826baa

[docs] Fix links (#41110)

090ad5db

Remove unnecessary Optional typing (#41198)

e0973709

docs/examples(speech): pin CTC commands to Hub datasets; add Windows …

5c0fd100

Fix Qwen3-Omni audio_token_id serialization issue (#41192)

f588aa82

Wait for main process in _save_checkpoint to ensure best checkpoint e…

4c54a98a

Avoid assumption that model has config attribute in deepspeed (#41207)

7e698ed8

Trainer: Pass `num_items_in_batch` to `compute_loss` in `prediction_s…

4886248d

[ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)

d1fd30d9

Align pull request template to bug report template (#41220)

9f7da26d

[generate] cache missing custom generate file (#41216)

8a23f340

Remove old Python code (#41226)

99fbb874

Adapt to the SDPA interface to enable the NPU to call FlashAttentionS…

4c6f26e0

update code owners (#41221)

3b03f55b

Unify is_torchvision_v2_available with is_torchvision_available (#41227)

12c4e6a6

Fix typing of train_args (#41142)

b0427d66

Fix sliding window attn mask (#41228)

50907d32

Revert "Fix DeepSpeed mixed precision precedence over Accelerate defa…

7db62848

[docs] Fix tp_plan (#41205)

86982f21

Fix white space in documentation (#41157)

ac69a8f1

fix qwen text config (#41158)

186a357f

Video processor accepts single frames on cuda (#41218)

46954e49

Use math.log2 (#41241)

25e26411

fix TrainerIntegrationDeepSpeed UT failures (#41236)

7076da63

[repo utils] Update `models_to_deprecate.py` (#41231)

9e60961e

Use removeprefix and removesuffix (#41240)

7aca328f

Fix pylint warnings (#41222)

c6af1ca2

Remove all instances of `is_safetensors_available` (#41233)

b7757de7

FP-Quant NVFP4 and Python 3.9 support (#39876)

c7616fdf

[`FA3`] Fix masking and loading logic in same process (#41217)

f672ee02

[t5gemma] fix `get_text_config` and related fixes (#40939)

066ca8e4

Don't convert to `safetensors` on the fly if the call is from testing…

e49d3d6b

Resolve remote custom module path warnings (#41243)

9ab2d57a

add peft team members to issue/pr template (#41262)

a6f470f6

docs: update bitsandbytes platform support (#41266)

19826920

add more activation kernels, follow up (#40944)

9e34b40e

fix asr pipeline ut failures (#41275)

e80da3a4

Use regex defailed flags (#41264)

d8566bc6

Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)

d88a0fbb

Fix binding of video frames to video placeholder in `InternVL` model …

54c026ea

Deprecate Trackio environment variables and deploy to Spaces by defau…

03d976d1

Allow private Space id for Trackio (#40948)

37f1f5d7

fix async client for transformers chat (#41255)

247d21ad

Unify is_torchvision_v2_available with is_torchvision_available (#41259)

26c57efa

Use max/min (#41280)

91e1bdd0

Biogptlogits (#41270)

4f1faa06

Fix unnecessary single-item container checks (#41279)

9d67585e

Fix pylint generator warnings (#41258)

89d53495

feat: use `aws-highcpu-32-priv` for amd docker img build (#41285)

f8ec172c

Add processor and intergration test for qwen3vl (#41277)

a2de2937

Remove `test_initialization` (#41261)

27b9c795

Remove some previous team members from allow list of triggering Githu…

0995a484

Build doc in 2 jobs: `en` and `other languages` (#41290)

41eae7ad

Fix mxfp4 dequantization (#41292)

aca2380b

[`Flex Attn`] Fix lse x attention sinks logic (#41249)

531bb750

FIX: Bug in PEFT integration delete_adapter method (#41252)

cf88fbb6

Italian translation for README.md (#41269)

40329a8b

Fix README.md error when installing from source (#41303)

e656e264

download and use HF Hub Cache (#41181)

a6e9ec43

ArthurZucker changed the base branch from main to v4 293 days ago

fix some merge issues

010896e8

[test_all]

8270a0f8

[test-all]

e6d80873

LysandreJik approved these changes on 2025-10-03

LysandreJik marked this pull request as ready for review 293 days ago

LysandreJik merged 2ccc6cae into v4 293 days ago

LysandreJik deleted the v4-backup branch 293 days ago

transformers v4.57.0 Branch #41310 Merged
