transformers
[`BC`] Update `get_(text|image|audio|video)_features` methods to return `BaseModelOutputWithPooling`
#42564
Merged

tomaarsen
tomaarsen Add return_dict to get_text_features methods to allow returning 'Base…
4c659771
tomaarsen Add return_dict to get_image_features methods to allow returning 'Bas…
47c2418b
tomaarsen make fixup
b6d6df3b
HuggingFaceDocBuilderDev
zucchini-nlp
zucchini-nlp commented on 2025-12-03
tomaarsen Ignore discrepancies for pooler_output, focus on last_hidden_state
aa514197
tomaarsen Update get_image_features for the missing architectures
278b0686
tomaarsen Update all get_audio_features
3b140453
tomaarsen Update get_video_features, except instructblipvideo
b7e0d66d
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
41bcca84
tomaarsen Run ruff formatting
7eb89b61
tomaarsen Patch Glm4v VisionModel forward with BaseModelOutputWithPooling
57af63d3
tomaarsen Patch instructblip, although backwards incompatibility stands
7285187c
tomaarsen Patch Kosmos2 and Ovis2
fd7be527
tomaarsen Reformat Ovis2
3f183fd4
tomaarsen Avoid now-deprecated return_attentions
391aac93
tomaarsen
zucchini-nlp
zucchini-nlp commented on 2025-12-15
tomaarsen Remove NumFrames
f8c887ff
tomaarsen Proposal to simplify get_..._features via TransformersKwargs & check_…
9a251ce0
tomaarsen
tomaarsen Revert check_model_inputs, adopt can_return_tuple, accept BC on get_.…
858d9d42
tomaarsen Fix typo: can_return_dict -> can_return_tuple
2a643038
tomaarsen Adopt can_return_tuple for many get_image_features
fc8ee939
tomaarsen Update all get_audio_features, some edge cases handled (e.g. gemma3n)
00aa0f5d
tomaarsen Update most get_video_features, some edge case remain, e.g. instruct…
1ccbf5a3
tomaarsen
tomaarsen Patch Fuyu, just return BaseModelOutputWithPooling without pooler
78fa904f
tomaarsen Introduce ModelOutput subclass for Chameleon, patch get_image_features
f082a8e8
tomaarsen
tomaarsen Update modeling files with new output formats for get_..._features
9ddd3b43
tomaarsen Update fast_vlm modeling forward from modular llava to remove image_s…
006b2a54
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
afd5e64e
tomaarsen Update colqwen2 its self.vlm.model.visual call to expect BaseModelOutput
1d6639b7
tomaarsen Replace prior return_dict with check_model_inputs on qwen2_5_vl its V…
d52def37
tomaarsen Use BaseModelOutputWithProjectionAttentions for Kosmos2 to allow retu…
ff676635
tomaarsen Update Emu akin to Chameleon
22522c45
tomaarsen Update the blip architectures with a naive fix
37a53c38
tomaarsen Convert remaining modulars (emu3, janus), patch emu3
440914b6
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
b6dbddd4
tomaarsen Patch blip test
48353a54
tomaarsen Update deepseek_vl using a new BaseModelOutputWithHighResVisionEncodings
531321c8
tomaarsen Remove 'copied' for blip_2, instructblip and kosmos2 as they required…
70577d2b
tomaarsen Patch qwen3_vl and qwen3_vl_moe, where I used last_hidden_state inste…
f6f90d67
tomaarsen Run repo-consistency
7af0b665
tomaarsen marked this pull request as ready for review 37 days ago
tomaarsen requested a review from zucchini-nlp 37 days ago
tomaarsen
tomaarsen commented on 2025-12-22
zucchini-nlp
zucchini-nlp commented on 2025-12-22
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
8db6370b
tomaarsen Use kwargs["output_hidden_states"] = True to hardcode output_hidden_s…
cbe007b6
tomaarsen Update new GlmAsr get_audio_features on ForConditionalGeneration
7c34c6ec
tomaarsen Run make style
d9edd994
tomaarsen Try to add _can_record_outputs to florence2
763ddf69
tomaarsen Override JanusVisionModel.forward to avoid bad q-former copy from Blip2
84206403
tomaarsen Import missing BaseModelOutput
e0ea3003
tomaarsen Pop deprecated 'return_attentions', setting 'return_dict' won't be us…
78bd0d01
tomaarsen Reintroduce kwargs filtering in llava etc. for safety re. image_sizes
d348d935
tomaarsen Use BaseModelOutputWithPooling superclass consistently for custom get…
71ea85a2
tomaarsen changed the title from "[`draft`] Add `return_dict` to `get_(text|image|audio|video)_features` methods" to "[`draft`] Update `get_(text|image|audio|video)_features` methods to return `BaseModelOutputWithPooling`" 17 days ago
tomaarsen Update Blip-2 family and its BaseModelOutputWithVisionQformerOutputs
8c59e951
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
3fff252a
tomaarsen changed the title from "[`draft`] Update `get_(text|image|audio|video)_features` methods to return `BaseModelOutputWithPooling`" to "[`BC`] Update `get_(text|image|audio|video)_features` methods to return `BaseModelOutputWithPooling`" 17 days ago
tomaarsen Update glm4v _can_record_outputs
3f4c34bb
tomaarsen Remove check_model_inputs in granite_speech
b39b6d1c
tomaarsen Run make style
af0ccb10
tomaarsen Add _can_record_outputs to Ovis2VisionModel
f8e08d97
tomaarsen Update get_text_features/get_video_features from pe_video
2d747d94
tomaarsen Update missing case on sam3
008e15d3
tomaarsen Update get_text_features type hints to Union[tuple, BaseModelOutputWi…
e92efb9c
tomaarsen Add _can_record_inputs to qwen2_5_omni and qwen2_5_vl
b06a2d2e
tomaarsen Update get_image_features and get_video_features on ernie4_5_vl_moe
4a573afc
tomaarsen Update get_image_features type hints to Union[tuple, BaseModelOutputW…
2c677f9d
tomaarsen Remove @auto_docstring from pe_video, it's seemingly not used on that…
1a8d14be
tomaarsen Update get_video_features type hints to Union[tuple, BaseModelOutputW…
87d22d30
tomaarsen Fix pe_video import issue
8d5802e2
tomaarsen Update forward, test, and docstring for sam3
a9ff924b
tomaarsen Update get_audio_features type hints to Union[tuple, BaseModelOutputW…
8ad35e74
tomaarsen Add simple test case for get_text_features
7c99867a
tomaarsen First attempt to get get_image_features under test, still 26 failures
35feb85f
tomaarsen Resolve several test failures, progress still slow and inconsistent
a64634bd
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
b5b334f5
zucchini-nlp
zucchini-nlp approved these changes on 2026-01-12
tomaarsen Split up get_..._features tests more, should be simpler to disable/cu…
5ad8ca52
tomaarsen Fix emu3 tests, also track non-temporal ResNet in hidden_states
0284715e
tomaarsen Patch chameleon, emu3, ernie4_5, janus
be41c044
tomaarsen Skip output_attentions for FastVLM, timm doesn't accept it
27430538
tomaarsen Patch groupvit, instructblip, ovis2
76371d8c
tomaarsen Patch paddleocr_vl, qwen2_5_omni, qwen2_5_vl, qwen2_vl, and skip test…
88a5804f
tomaarsen Patch qwen3_omni_moe, sam family, edgetam
13875af6
tomaarsen Kill now unused BaseModelOutputWithFeatureMaps
e480bc0e
tomaarsen Remove left-over return_dict from prior attempt
2bd9a49a
tomaarsen
tomaarsen Allow for output_hidden_states in theory, but skip impossible tests
54550383
tomaarsen Introduce tests for get_audio_features, fixed all architectures
3f75c03e
tomaarsen Introduce tests for get_video_features, only ernie4_5_vl_moe is failing
5e7d821f
tomaarsen Call post_init on GraniteSpeechCTCEncoder, which was given a PreTrain…
1b8ab38b
tomaarsen Update llava_onevision test suite, only create video pixel_values in …
34677988
tomaarsen Create custom video input for ernie4_5_vl_moe
6f23bf5a
tomaarsen Skip CLIP family tests; they don't support output_hidden_states/outpu…
a8e5f920
tomaarsen Breaking: update Blip2Model.get_text_features to no longer output logits
508955e4
tomaarsen Satisfy test_num_layers_is_small test for align
df4d7512
tomaarsen Test against last_hidden_state against batch_size and hidden_size
1254b295
tomaarsen Skip last_hidden_state shape tests for unusual cases
c8b712f5
github-actions
tomaarsen Update docstrings via auto_docstring for all get_..._features methods
d6f0fb91
tomaarsen Ensure all auto_doc arguments are documented
51638d6c
tomaarsen Remove redundant docstrings
af3b70fc
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
4d522c7f
tomaarsen Also patch the new glm_image for get_image_features/output_hidden_states
35640452
tomaarsen Update modular files as per check_docstring rules ...
f7100d3a
tomaarsen Update glm-image dates via fix-repo
a41491fc
tomaarsen requested a review from ArthurZucker 9 days ago
tomaarsen requested a review from vasqu 9 days ago
tomaarsen
zucchini-nlp
zucchini-nlp commented on 2026-01-15
tomaarsen FloatTensor -> LongTensor for image_tokens
de561226
tomaarsen Add simple last_hidden_state description, fix output typing of Gemma3…
d6fd9174
tomaarsen Add missing `-> tuple | BaseModel...` on check_model_inputs
7329ebc4
tomaarsen Ensure forward typing with check_model_inputs is `-> tuple | BaseMode…
72a9ac95
tomaarsen Undo accidental rename of Ovis2VisionAttention
9b670147
tomaarsen Fix incorrect type hints for blip family
cd881792
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
b58f3c53
tomaarsen Patch get_image_features for lighton_ocr
e7476694
tomaarsen Explicitly use Ovis2VisionAttention in Ovis2VisionEncoderLayer in mod…
95a55ad7
tomaarsen Update use of get_image_features for lighton_ocr
ef778324
tomaarsen Rerun python utils/add_dates.py
194a1bd7
vasqu
vasqu commented on 2026-01-14
tomaarsen
tomaarsen commented on 2026-01-15
tomaarsen Remove tie_last_hidden_states=False from check_model_inputs from ...
0ce7bacc
tomaarsen Revert accidental metaclip import change
6604784b
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
07463442
ArthurZucker
ArthurZucker commented on 2026-01-19
ArthurZucker
tomaarsen Add missing return_dict=True in get_..._features methods
ed5c1364
tomaarsen Add `output_hidden_states=True` in InternVL get_image_features
3f0c7545
tomaarsen Add missing docstring for llava_next_video get_video_features
061527de
tomaarsen Quick clean-up in _video_features_prepare_config_and_inputs test helper
af776e91
tomaarsen model.set_attn_implementation instead of config._attn_implementation
125a49d7
tomaarsen Add simple docstring to some helper methods re. inputs.
71f9f768
tomaarsen Explain why get_..._features test inputs are overridden
c69c4c54
tomaarsen Undo incorrect return_dict=True change in deepseek_vl_hybrid
72891b92
tomaarsen Revert accidental metaclip import change
0d61f664
tomaarsen Adopt **vision_outputs in instructblip, but mess remains
fa32eff4
tomaarsen
tomaarsen
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
1a381aae
ArthurZucker
ArthurZucker approved these changes on 2026-01-22
tomaarsen Avoid kwargs["output_hidden_states"] = True in get_..._features methods
a1e67675
tomaarsen Update check_model_inputs to default vision args based on config
d9001cc8
tomaarsen Unrelated but important: patch set_attn_implementation for Windows
09232167
tomaarsen Revert output_hidden_states changes on InternVL
e3b774e3
tomaarsen Extend d9001cc (check_model_inputs); remove more vision_feature_layer…
37a495c1
tomaarsen Patch unusual bug: llava_next_video used self.vision_feature_layer
bf9182da
tomaarsen Add unused use_cache to TimmWrapperModel to patch FastVLM
15c2a597
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
92fe9268
tomaarsen Update check_config_attributes to allow for vision attributes
d8604707
tomaarsen Add tests for config.return_dict=False
45d2c337
tomaarsen permute and quantize separately for the comment
5199c472
tomaarsen Ditch shared custom_args for ernie4_5_vl_moe
9865895b
tomaarsen Move Ernie4_5_VL_MoeVisionAttention next to VisionBlock
276dcaaf
tomaarsen Add missing "attentions" from Florence2 _can_record_outputs
c804de4e
tomaarsen
tomaarsen Clarify kwargs.get("image_sizes") in modeling_llava
72a1a093
tomaarsen Remove commented skip_test_image_features_output_shape in chameleon t…
43ec4b38
tomaarsen Add a migration guide under 'Library-wide changes with lesser impact'
4515b29b
tomaarsen
vasqu
vasqu approved these changes on 2026-01-22
tomaarsen Parameterize get_..._features tests with return_dict (True, False, N…
cd4c0cb0
tomaarsen Add comment re. TimmWrapper _can_record_outputs
292ef3ab
tomaarsen Shrink Gemma3nAudioEncoderModelOutput with auto_docstring & superclass
355bcb41
tomaarsen Revert "Unrelated but important: patch set_attn_implementation for Wi…
bf0ae702
tomaarsen
tomaarsen Merge branch 'main' into feat/normalize_get_features_methods
d8e786ff
github-actions
ArthurZucker merged 55dadb86 into main 1 day ago
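
For downstream code that has to run both before and after this change, the following is a minimal sketch of a compatibility helper. It is not taken from the PR. It assumes that the old `get_..._features` methods returned a bare feature tensor and that the new `BaseModelOutputWithPooling` result carries the equivalent features in `pooler_output`, which this page does not confirm for every architecture. `PooledOutputStandIn` and `pooled_features` are hypothetical names; the stand-in class only mirrors the field names of `BaseModelOutputWithPooling` so the example runs without `transformers` installed.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PooledOutputStandIn:
    """Hypothetical stand-in that mirrors the field names of BaseModelOutputWithPooling."""

    last_hidden_state: Any = None
    pooler_output: Any = None
    hidden_states: Optional[tuple] = None
    attentions: Optional[tuple] = None


def pooled_features(result: Any) -> Any:
    """Return pooled features from either return style (hypothetical helper).

    New style: an output object exposing `pooler_output`.
    Old style: a bare feature tensor, which is returned unchanged.
    """
    if hasattr(result, "pooler_output"):
        return result.pooler_output
    return result


# Plain lists stand in for tensors so the sketch needs no third-party packages.
old_style = [0.1, 0.2, 0.3]
new_style = PooledOutputStandIn(
    last_hidden_state=[[0.0, 0.0, 0.0]],
    pooler_output=[0.1, 0.2, 0.3],
)

old_result = pooled_features(old_style)
new_result = pooled_features(new_style)
```

In real use, `result` would be whatever `model.get_image_features(...)` (or the text, audio, or video variant) returns. The tuple form produced with `return_dict=False` is deliberately not handled here, because the position of the pooled features in that tuple is not stated on this page.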
