Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation (#40837)
* init
* added TopH
* Update TopH logits_process.py
* Update logits_process.py
* Update test_logits_process.py
* Update test_logits_process.py
* added test No. 4
* Resolving __init__.py issues
* Resolving configuration_utils.py Issues
* Resolving logits_process.py Issues
* Resolving utils.py Issues
* Resolving test_logits_process.py Issues
* Resolving __init__.py issues
* Resolving logits_process.py Issues
* Resolving __init__.py issues
* Updated Docs
* Updated Docstring
* style: autoformat with make fixup
* Fixing Docstring
* Update logits_process.py removed defaults
* Variable H name -> cumulative_entropy
* Using torch.distributions.Categorical
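The entries above describe entropy-bounded truncation at a high level; a minimal hedged sketch of the idea (not the merged `LogitsWarper` itself — names and the default budget here are illustrative) could look like:

```python
import torch

def top_h_filter(logits: torch.Tensor, entropy_budget: float = 2.0,
                 filter_value: float = float("-inf")) -> torch.Tensor:
    """Hypothetical sketch of Top-H style entropy-bounded truncation.

    Tokens are kept, highest probability first, while the cumulative
    entropy contribution -p * log(p) of the kept set stays within
    `entropy_budget` (in nats). The most likely token is always kept.
    """
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    # Entropy contribution of each candidate, in sorted order
    contrib = -sorted_probs * torch.log(sorted_probs.clamp_min(1e-12))
    cumulative_entropy = torch.cumsum(contrib, dim=-1)
    sorted_remove = cumulative_entropy > entropy_budget
    sorted_remove[..., 0] = False  # never drop the top token
    # Scatter the mask back to vocabulary order
    remove = torch.zeros_like(sorted_remove)
    remove.scatter_(-1, sorted_idx, sorted_remove)
    return logits.masked_fill(remove, filter_value)
```

A peaked distribution with a tight budget keeps only the dominant token, while flatter distributions retain more candidates.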
* Improve torch_dtype checks (#40808)
* Improve torch_dtype checks
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Apply suggestions from code review
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Add VideoProcessors to auto-backend requirements (#40843)
* add it
* fix existing ones
* add perception to auto_mapping...
* Adds Causal Conv 1D kernel for mamba models (#40765)
* add kernel
* make style
* keep causal-conv1d
* small fix
* small fix
* fix modular converter
* modular fix + lazy loading
* revert changes modular
* nit
* hub kernels update
* update
* small nit
* Update no split modules in T5Gemma model (#40810)
* Update no split modules in T5Gemma model
* Update no_split_modules also for T5Gemma modular
* Remove model_split_percents from test cases
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Replace image classification loss functions to `self.loss_function` (#40764)
* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)
* align torch implementation of gdn with fla.
* fix fla import.
* fix
* remove unused attr
* fixes
* strictly align l2norm in Qwen3-Next with FLA implementation.
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
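For context on the commits above: an FLA-style `l2norm` normalizes along the last dimension with a fused reciprocal square root. This is a hedged sketch of that formulation (the exact epsilon handling in the FLA library may differ):

```python
import torch

def l2norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize along the last dim via a fused rsqrt; eps placement
    # here is illustrative, not necessarily identical to FLA's kernel
    return x * torch.rsqrt(x.pow(2).sum(dim=-1, keepdim=True) + eps)
```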
* Fixes for continuous batching (#40828)
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictionary is only removed during kwargs
* Test for supported sample
* Fix an involuntary slice
* Fixes for non-sliced inputs and small example improvements
* Slice inputs is more understandable
* Style
* [tests] re-enable aria fast tests (#40846)
* rise from the dead
* test
* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)
* Fix inconsistencies with box input inference with original repo
* remove print
* always pad
* fix modular
* [Sam2Video] Fix video inference with batched boxes and add test (#40797)
fix video inference with batched boxes and add test
* add: differential privacy research model (#40851)
* VaultGemma
* Removing Sequence and Token classification models. Removing integration tests for now
* Remove pass-only modular code. style fixes
* Update vaultgemma.md
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Add links to model doc
* Correct model doc usage examples
* Updating model doc to describe differences from Gemma 2
* Update model_doc links
* Adding integration tests
* style fixes
* repo consistency
* attribute exception
---------
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* output_attentions in typed kwargs
* correct typing in GenericForTokenClassification
* improve
* [tests] move generative tests away from `test_modeling_common.py` (#40854)
move tests
* [generate] Always use decoder config to init cache (#40772)
* mega derp
* fix
* always use the decoder
* Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Redirect MI355 CI results to dummy dataset (#40862)
* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
* [docstrings / type hints] Update outdated annotations for `past_key_values` (#40803)
* some fixes
* nits
* indentation
* indentation
* a bunch of type hints
* bulk changes
* fix florence kwargs (#40826)
* fix: XIELU act parameters not being casted to correct dtype (#40812)
* Update model tags and integration references in bug report (#40881)
* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)
use numerically stable inverse
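The pattern behind this fix, sketched in hedged form (not the actual Qwen3-Next code), is to compute the inverse square root with `torch.rsqrt` rather than a division by `torch.sqrt`:

```python
import torch

def rms_scale(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Fused reciprocal square root instead of 1.0 / torch.sqrt(...),
    # avoiding an extra intermediate that can lose precision in
    # low-precision dtypes; computed in float32 then cast back
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps).to(x.dtype)
```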
* Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series
* make fixup
* fix import
* re-protect import
* fix it finally (need to merge main into the branch)
* skip processor test (need the checkpoint)
* oups typo
* simplify modular
* remove unnecessary attr
* fix layer
* remove unused rope_deltas args
* reuse image def
* remove unnecessary imports
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* [`VaultGemma`] Update expectations in integration tests (#40855)
* fix tests
* style
* Fix modular consistency (#40883)
* reapply modular
* add missing one
* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Move variable output controls to `prepare_inputs_for_generation`
* fix a bunch of models
* back to basics
* final touches
* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Improve typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add comment
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Improve comments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Revert typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
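The motivation for `torch.expm1`/`torch.log1p` is easiest to see with their scalar `math` equivalents: near zero, `exp(x) - 1` and `log(1 + x)` cancel catastrophically because `1.0 + x` rounds back to `1.0`, while the dedicated functions keep full relative precision:

```python
import math

x = 1e-17  # far below double-precision epsilon relative to 1.0
naive_expm1 = math.exp(x) - 1.0   # rounds to 0.0
naive_log1p = math.log(1.0 + x)   # rounds to 0.0
good_expm1 = math.expm1(x)        # ~1e-17, full precision
good_log1p = math.log1p(x)        # ~1e-17, full precision
```

`torch.expm1` and `torch.log1p` apply the same functions elementwise to tensors.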
* Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup
* First version passing tests
* Ruff
* Dummy post processing
* Add numerical test
* Adjust
* Doc
* Ruff
* remove unused arg
* Refine interpolation method and push test script
* update bench
* Comments
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Remove benchmark script
* Update docstrings
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* doc
* further process kwargs
* remove it
* remove
* Remove to dict
* remove crop middle
* Remove param specific handling
* Update testing logic
* remove ensure multiple of as kwargs
* fix formatting
* Remove none default and get image size
* Move stuff to _preprocess_image_like_inputs and refacto
* Clean
* ruff
* End of file & comments
* ruff again
* Padding fixed
* Remove comments to pass tests
* Remove prompt depth from kwargs
* Adjust output_size logic
* Docstring for preprocess
* auto_docstring for preprocess
* pass as an arg
* update test batched
* stack images
* remove prompt scale to meter
* return tensors back in preprocess
* remove copying of images
* Update behavior to match old processor
* Fix batch size of tests
* fix test and fast
* Fix slow processor
* Put tests back to pytorch
* remove check and modify batched tests
* test do_pad + slow processor fix
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Fix deta loading & dataclass (#40878)
* fix
* fix 2
* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits
* Apply suggestions from code review
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)
* feat: manual translation
* docs: fix ko/_toctree.yml
* Apply suggestions from code review
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
* Update docs/source/ko/image_processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [generate] remove docs of a feature that no longer exists (#40895)
* Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fixing the call to kernelize (#40628)
* fix
* style
* overload train and eval
* add getter and setter
* Fix getter regression (#40824)
* test things
* style
* move tests to a sane place
* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* [cache] Merge static sliding and static chunked layer (#40893)
* merge
* get rid of tensors in get_mask_sizes!!
* remove branch
* add comment explanation
* re-add the class with deprecation cycle
* Harmonize CacheLayer names (#40892)
* unify naming
* style
* doc as well
* post rebase fix
* style
* style
* revert
* [cache] Only use scalars in `get_mask_sizes` (#40907)
* remove tensor ops
* style
* style
* Set seed for `Glm4vIntegrationTest` (#40905)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3
* Implement modular Olmo3
* Update Olmo3 tests
* Copy Olmo2 weight converter to Olmo3
* Implement Olmo3 weight converter
* Fix code quality errors
* Remove unused import
* Address rope-related PR comments
* Update Olmo3 model doc with minimal details
* Fix Olmo3 rope test failure
* Fix 7B integration test
* remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code
* Use `%lazy` in logging messages
* Revert "Use `%lazy` in logging messages"
This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.
* Add notes for sanitization rule in docstring
* Remove too many underscores
* Update src/transformers/dynamic_module_utils.py
* Update src/transformers/dynamic_module_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove `runner_map` (#40880)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* disable `test_fast_is_faster_than_slow` (#40909)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation
* docstring
* BC
* docstring
* failing checks
* make fixup
* apply changes to modular
* misc fixes
* is_initialized
* fix poor rebase
* [generate] misc fixes (#40906)
misc fixes
* 🔴Make `center_crop` fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
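The slow-processor behavior being matched here crops around the image center and zero-pads when the target size exceeds the input. A minimal hedged sketch of that semantics (illustrative, not the library code):

```python
import torch

def center_crop(img: torch.Tensor, size: tuple) -> torch.Tensor:
    """Center-crop a (C, H, W) tensor, zero-padding any dimension
    where the target is larger than the input."""
    c, h, w = img.shape
    th, tw = size
    out = torch.zeros(c, th, tw, dtype=img.dtype)
    # Destination offsets (used when padding), source offsets (when cropping)
    top = (th - h) // 2 if th > h else 0
    left = (tw - w) // 2 if tw > w else 0
    src_top = (h - th) // 2 if h > th else 0
    src_left = (w - tw) // 2 if w > tw else 0
    ch, cw = min(h, th), min(w, tw)
    out[:, top:top + ch, left:left + cw] = (
        img[:, src_top:src_top + ch, src_left:src_left + cw]
    )
    return out
```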
* Fix dtype in Paligemma (#40912)
* fix dtypes
* fix copies
* delete unused attr
* [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs
* review suggestions
* Apply suggestions from code review
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Processor load with multi-processing (#40786)
push
* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)
* Remove unused arg
* deprecate
* revrt one change
* get set go
* version correction
* fix
* make style
* comment
* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)
* chore: fix code formatting and linting issues
* refactor: move UMT5 GGUF test to quantization directory and clean up comments
* chore: trigger CI pipeline
* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.
* Add regression check to UMT5 encoder GGUF test
Verify encoder output against reference tensor values with appropriate tolerances for stability.
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update tests/quantization/ggml/test_ggml.py
remove comments
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Adding activation kernels (#40890)
* first commit
* add mode
* revert modeling
* add compile
* rm print
* Minor fix for #40727 (#40929)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add support for Florence-2 training (#40914)
* Support training florence2
* update doc and testing model to florence-community
* fix florence-2 test, use head dim 16 instead of 8 for fa2
* skip test_sdpa_can_dispatch_on_flash
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add LongCat-Flash (#40730)
* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism
* [DOC] Add missing dates in model cards (#40922)
add missing dates
* [models] remove unused `import torch.utils.checkpoint` (#40934)
* Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update cpu dockerfile
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update label name
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)
* Fix trainer tests (#40823)
* fix liger
* fix
* more
* fix
* fix hp
* fix
---------
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
* Fix `Glm4vMoeIntegrationTest` (#40930)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning
* add timm
* remove
* Consistent naming for images kwargs (#40834)
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fix copies
* another fix
* fix some tests
* fix more tests
* fix last tests
* fix copies
* better docstring
* delete print
* Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision
* remove unnecessary protected imports
* remove unnecessary protected import in modular (and modeling)
* fix wrongly removed protected imports
* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Update expected values for some `test_speculative_generation` (#40949)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models
* PR review
* Add FlexOlmo model (#40921)
* transformers add-new-model-like
* Add FlexOlmo implementation
* Update FlexOlmo docs
* Set default tokenization for flex olmo
* Update FlexOlmo tests
* Update attention comment
* Remove unneeded use of `sliding_window`
* Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update expected values for one more `test_speculative_generation` after #40949 (#40967)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training
* add test
* make style && slight fix of test
* make style again
* move test code to test_trainer
* remove outdated test file
* Apply style fixes
---------
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add new model LFM2-VL (#40624)
* Add LFM2-VL support
* add tests
* linting, formatting, misc review changes
* add siglip2 to auto config and instantiate it in lfm2-vl configuration
* decouple image processor from processor
* remove torch import from configuration
* replace | with Optional
* remove layer truncation from modeling file
* fix copies
* update everything
* fix test case to use tiny model
* update the test cases
* fix finally the image processor and add slow tests
* fixup
* typo in docs
* fix tests
* the doc name uses underscore
* address comments from Yoni
* delete tests and unshuffling
* relative import
* do we really handle imports better now?
* fix test
* slow tests
* found a bug in ordering + slow tests
* fix copies
* dont run compile test
---------
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
* Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)
use skip_predictor in vjepa2 `get_vision_features`
* [Trainer] Fix DP loss (#40799)
* fix
* style
* Fix fp16
* style
---------
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model
* nit: Let’s cater to this specific issue
* nit: Simplify error handling
* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts
* change token typing
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
* [tests] Really use small models in all fast tests (#40945)
* start
* xcodec
* chameleon
* start
* layoutlm2
* layoutlm
* remove skip
* oups
* timm_wrapper
* add default
* doc
* consistency
* Add captured actual outputs to CI artifacts (#40965)
* fix
* fix
* Remove `# TODO: ???` as it make me `???`
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Revert change in `compile_friendly_resize` (#40645)
fix
* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove `set_model_tester_for_less_flaky_tests` (#40982)
remove
* Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow
* Container was missing
* Change to sandbox branch name
* Wrong place for image name
* Variable declarations
* Remove references to file logging
* Remove unnecessary step
* Fix deps install
* Syntax
* Add workdir
* Add upload feature
* typo
* No need for hf_transfer
* Pass in runner
* Runner config
* Runner config
* Runner config
* Runner config
* Runner config
* mi325 caller
* Name workflow runs properly
* Copy-paste error
* Add final repo IDs and schedule
* Review comments
* Remove wf params
* Remove parametrization from workflow files
* Fix callers
* Change push trigger to pull_request + label
* Add back schedule event
* Push to the same dataset
* Simplify parameter description
* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor
* some test fixes
* style
* fix last tests
* be strict on positional embeddings, fixup according tests
* cache support
* more cache fixes, new causal API
* simplify masks, fix tests for gen
* flex attn, static cache support, round of fixes
* ?
* this time
* style
* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)
* roberta
* fixup sdpa remains
* attention split, simplify args and kwargs, better typing
* fix encoder decoder
* fix test
* modular roberta
* albert
* data2vectext, making it modular tomorrow
* modular data2vec text
* tmp disable
* xmod + cache position fixes
* whoops
* electra + markuplm, small fixes
* remove wrong copy
* xlm_roberta + some embedding fixes
* roberta prelayernorm
* RemBert: remove copy, maybe doing it later
* ernie
* fix roberta offloading
* camembert
* copy fixes
* bert generation + fixes on eager
* xlm roberta xl
* bridgetower (text) + seamlessv2 copy fixes
* rocbert + small fixes
* whoops
* small round of fixups
* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)
* the end of the tunnel?
* fixup nllbmoe + style
* we dont need this anymore
* megatron bert is barely used, low prio skip for now
* Modernize bert (template for others)
NOTE: trying to push this through, might be overdue if not in time possible
* check inputs for all others (if checkmarked)
* fix bridgetower
* style
* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)
* proper fix for bert to force intermediate dict outputs
* propagate to others
* style
* xlm roberta xl investigation, its the layernorm...
* mobile bert
* revert this, might cause issues with composed models
* review
* style
* Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs
* more
* ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat
This small change enables GNU readline support for the transformers chat
command. This includes, among others:
- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l
Implementation
Although it may look strange, just importing readline is enough to
enable it in Python, see:
https://docs.python.org/3/library/functions.html#input
As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.
Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/).
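The guarded import described above can be sketched as a small helper (the function name here is illustrative; in the actual change the bare `import readline` suffices):

```python
def enable_readline() -> bool:
    """Enable GNU readline for input() where available; no-op otherwise.

    Merely importing the module is enough to activate history and
    line-editing support for input(), per the Python docs.
    """
    try:
        import readline  # noqa: F401
        return True
    except ImportError:  # e.g. some Windows builds without pyreadline3
        return False
```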
* [testing] test `num_hidden_layers` being small in model tester (#40992)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* blt wip (#38579)
* blt wip
* cpu version
* cpu friendly with full entropy model (real time patching)
* adding config file instead of args file
* enable MPS
* refactoring unused code
* single config class in config file
* inherit from PreTrainedModel
* refactor LMTransformer --> BLTPatcher
* add conversion script
* load from new checkpoint with from_pretrained
* fixed demo from_pretrained
* clean up
* clean a few comments
* cleanup folder
* clean up dir
* cleaned up modeling further
* rename classes
* adding transformers Attention class and RotaryEmbedding class
* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc
* separate out patcher config, update modeling and conversion script
* rename vars to be more transformers-like
* rm unused functions
* adding cross attention from transformers
* pass arg
* rename weights
* updated conversion script
* overwritten commit! fixing PR
* apply feedback
* adding BLTRMSNorm like Llama
* add repeat_kv and eager_attention_forward copied from
* BLTMLP identical to MllamaTextMLP
* clean up some args'
* more like mllama, but busier inits
* BLTTransformerLayer config
* decoder, encoder, global configs
* wip working on modular file
* cleaning up patch and configs
* clean up patcher helpers
* clean up patcher helpers further
* clean up
* some config renaming
* clean up unused configs
* clean up configs
* clean up configs
* update modular
* clean
* update demo
* config more like mllama, separated subconfigs from subdicts
* read from config instead of self args
* update demo file
* model weights to causal lm weights
* missed file
* added tied weights keys
* BLTForCausalLM
* adding files after add-new-model-like
* update demo
* working on tests
* first running integration tests
* added integration tests
* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff
* tokenizer clean up
* modular file
* fixing rebase
* ruff
* adding correct basemodel output and updating config with checkpoint vals (for testing)
* BLTModelTests git status
* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic
* fix sdpa == causal tests
* fix small model test and some gradient checkpointing
* skip training GC tests
* fix test
* updated modular
* update modular
* ruff
* adding modular + modeling
* modular
* more modern is_causal check
* cleaning up modular
* more modular reduction
* ruff
* modular fix
* fix styling
* return 2
* return 2
* fix some tests
* fix bltcrossattention after modular break
* some fixes / feedback
* try cache generate fix
* try cache generate fix
* fix generate tests
* attn_impl workaround
* refactoring to use recent TransformersKwargs changes
* fix hidden_states shape test
* refactor to new outputs
* simplify outputs a bit
* rm unneeded decoderlayer overwriting
* rename blt
* forgot tokenizer test renamed
* Reorder
* Reorder
* working on modular
* updates from modular
* new modular
* ruff and such
* update pretrainedmodel modular
* using cohere2 apply_rotary_pos_emb
* small changes
* apply feedback r2
* fix cross_attention
* apply more feedback
* update modeling fix
* load submodules from pretrainedmodel
* set initializer_range to subconfigs
* rm cross_attention_states pass when not needed
* add 7b projection layer support
* check repo
* make copies
* lost cohere2 rotate_half
* ruff
* copies?
* don't tie weights for submodules
* tie weights setting
* check docstrings
* apply feedback
* rebase
* rebased modeling
* update docs
* applying feedback
* few more fixes
* fix can_record_outputs
* fast tokenizer
* no more modulelist
* tok auto
* rm tokenizers
* fix docs
* ruff
* fix after rebase
* fix test, configs are not subscriptable
---------
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
* [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)
* fix
* fixup inits
* oops
* fixup gemma
* fixup modular order
* how does this keep happening lol
* vaultgemma is new i forgot
* remove init check
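The RMSNorm init fix above concerns models whose learned norm scale is stored as an offset from 1 (Gemma-style), where re-initialization must zero the weight rather than fill it with ones. A minimal sketch of that pattern (illustrative only; the class name is made up and this is not the transformers implementation):

```python
import torch
from torch import nn

class CenteredRMSNorm(nn.Module):
    """RMSNorm variant whose scale is stored as an offset from 1
    (as in Gemma): forward applies (1 + weight), so the correct
    re-initialization is weight <- 0, not weight <- 1."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Centered around 1 via the (1 + weight) term in forward.
        self.weight = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * (1.0 + self.weight)
```

With zero-initialized weight, the module starts as a pure RMS normalization, which is why an init routine that blindly fills norm weights with ones breaks these models.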
* Make `EfficientLoFTRModelTest` faster (#41000)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typos in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree
* fix new models
* RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix dict like init for ModelOutput (#41002)
* fix dict like init
* style
* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)
* update test (and overwrites)
* better test comment
* 0 as a default for
* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point
* update references to transformers-cli
* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches
* fix: applied code suggestion
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: applied code suggestion to modular
* fix: integration tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Fix `PhimoeIntegrationTest` (#41007)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix Glm4v test (#41011)
fix
* Update after #41007 (#41014)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix benchmark runner argument name (#41012)
* Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved
* redundant if condition fix
* Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
* Fix Qwen video tests (#41049)
fix test
* [testing] Fix `qwen2_audio` (#41018)
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typing of tuples (#41028)
* Fix tuple typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove optax (#41030)
Remove optax dep
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typos and formatting in Chinese docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use torch.autocast (#40975)
* Use torch.autocast
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE function docstrings
* Update src/transformers/modeling_rope_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
* Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking
* Vectorize whole word masking functions
* Unit test whole word masking
* Remove support for TF in whole word masking
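The whole-word-masking commits above group WordPiece subwords so that a word is masked or kept as a unit. A minimal string-level sketch of the grouping idea (illustrative only; the real `DataCollatorForLanguageModeling` operates on token ids and is vectorized, and this helper name is an assumption):

```python
import random

def whole_word_mask(tokens, mlm_probability=0.15, seed=0):
    """Group '##'-prefixed WordPiece continuations with the word they
    extend, then mask or keep each word as a unit."""
    rng = random.Random(seed)
    words = []  # each entry: indices of the subwords forming one word
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)  # continuation of the previous word
        else:
            words.append([i])    # start of a new word
    masked = list(tokens)
    for word in words:
        if rng.random() < mlm_probability:
            for i in word:  # mask every subword of the chosen word
                masked[i] = "[MASK]"
    return masked
```

The point of the grouping is that a word like `hug ##ging` is never half-masked, which gives the model a harder, more realistic cloze task than independent per-token masking.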
* [testing] Fix `seed_oss` (#41052)
* fix
* fix
* fix
* fix
* fix
* fix
* Update tests/models/seed_oss/test_modeling_seed_oss.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Remove repeated import (#40937)
* Remove repeated import
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix conflict
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Simplify unnecessary Optional typing (#40839)
Remove Optional
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload
* Address review comments
* Address review comments
* Ci utils (#40978)
* Add CI reports dir to gitignore
* Add utils to run local CI
* Review compliance
* Style
* License
* Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Revert changes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update docs/source/en/model_doc/madlad-400.md
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix CI jobs being all red 🔴 (false positive) (#41059)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Update quantization CI (#41068)
* fix
* new everything
* fix
* [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files
* Update Bengali README for clarity and consistency in model descriptions
* Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs
* correct 'because on of' issue
* link to other Mamba2 model types
* github URL is not changed
* update error message in generated files
* Update team member list for some CI workflows (#41094)
* update list
* update list
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix crash when using chat to send 2+ request to gptoss (#40536)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* Minor addition, no split modules for VideoMAE (#41051)
* added no split modules
* fixed typo
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
* Switch to `python:3.10-slim` for CircleCI docker images (#41067)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script
* Adjust vars
* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing (#40788)
* Fix optional typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix optional typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix schema typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing
* Fix typing
* Fix typing
* Fix typing
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Improve typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix quote string of np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix code
* Format
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove unused arguments (#40916)
* Fix unused arguments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix wrong height and width when read video use torchvision (#41091)
* docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp
* fix modular inheritance
* nit
* paligemma 1 doesn't have swa
* use same pattern as in models with hybrid layers
* PR comments
* helium also needs layer_typed (bc it relies on gemma)
* paligemma/gemma3: same mask creation fn in fwd and generate
* propagate changes to helium (gemma-based)
* tmp commit
* slow paligemma tests passing, let's see what breaks
* fix test_left_padding_compatibility
* tmp commit
* tmp commit
* rebase error
* docs
* reduce diff
* like this?
* t5gemma
* better comment
* shorter diff
* exception
* ffs type
* optional
* shorter modular_gemma.py
* helium model actually needs no changes -- the tester is the issue
* t5gemma modular config
* a few more modular; paligemma BC
* fix processor issues?
* rm config exception
* lift warning in gemma
* [tests] gpt2 + `CausalLMModelTester` (#41003)
* tmp commit
* tmp commit
* tmp commit
* rm old GPT2ModelTester
* nit bug
* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns
* vision_encoder_decoder
* Fix `_get_test_info` for inherited tests (#41106)
* fix _get_test_info
* fix patched
* add comment
* ruff
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove bad test skips (#41109)
* remove bad skips
* remove more
* fix inits
* Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add empty lines around code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1, target it to Python 3.10, and apply its fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix comments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2
* add lfm2 tensor process to unsqueeze conv weights
* adjust values from gguf config to HF config
* add test for lfm2 gguf
* ruff
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors
* enable torchao safetensors support
* add more version checking
* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next
* propagate changes
* chore: renamed tot_heads to total_sequence_length
* Apply suggestion from @vasqu
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* minor fix to modular qwen3 next file
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove self-assignment (#41062)
* Remove self-assignment
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update src/transformers/integrations/flash_paged.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning
* docs(text2text_generation): update parameter comments to reflect modern generation practice
Update the max_length parameter comment to max_new_tokens, matching the modern standard of specifying the number of newly generated tokens
* refactor(text2text_generation): Remove outdated input validation logic
* docs(text2text_generation): Revert incorrectly modified comment
* docs(text2text_generation): Revert incorrectly modified comment
* Fixed MXFP4 model storage issue (#41118)
* Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints
* Adapted the fix to work with missing lm_head
* dummy commit (#41133)
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map
* remove them all
* fix logic flaw
* fix
* simplify
* style
* fix
* revert caching allocator change
* add other test
* add nice doc
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Using torch.distributions.Categorical
* Resolving logits_process.py Issues
* style: autoformat with make fixup
* Update logits_process.py removed defaults
* Variable H name -> cumulative_entropy
* Resolving format error
* Correction of the loop variables in logit processor
* Vectorized the loop in logits_process
* formatted logits_process
* paper reference and stopping rule comment logits_process
* Trigger CI rerun
* Update logits_process.py
* added test_TopH_example_integration
* Update README.md
* Restore CI config to match main (remove accidental changes)
* Restore CI config to match upstream main (no diffs)
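The Top-H bullets above mention a `cumulative_entropy` accumulator and a stopping rule for entropy-bounded truncation. A simplified sketch of that idea (illustrative only; the function name, the `top_h` semantics, and the exact entropy budget are assumptions, not the merged LogitsProcessor):

```python
import torch

def top_h_filter(logits: torch.Tensor, top_h: float = 0.9) -> torch.Tensor:
    """Keep the most probable tokens while their cumulative entropy
    contribution stays within a fraction `top_h` of the full entropy."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    # Per-token entropy contribution -p * log(p), in probability order.
    h = -sorted_probs * torch.log(sorted_probs.clamp_min(1e-12))
    cumulative_entropy = h.cumsum(dim=-1)
    budget = top_h * h.sum(dim=-1, keepdim=True)
    # Stopping rule: once the budget is exceeded, drop the remaining tail.
    remove = cumulative_entropy > budget
    remove[..., 0] = False  # never drop the argmax token
    # Map the sorted-order mask back to vocabulary order.
    vocab_mask = remove.scatter(-1, sorted_idx, remove)
    return logits.masked_fill(vocab_mask, float("-inf"))
```

Unlike top-p, which bounds cumulative probability mass, this rule adapts the truncation point to how spread out the distribution is: peaked distributions keep few tokens, flat ones keep many.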
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>