Pull Requests huggingface/transformers

Update tokenizer mappings to use TokenizersBackend for additional models

#46091 opened 2026-05-20 02:57 by itazap

skip deepgemm test except cuda

#46090 opened 2026-05-20 02:30 by jiqing-feng

Fix use_cache with seq_len > 1 (#46032)

#46084 opened 2026-05-19 20:01 by Ramshankar07

[CTRL] Support attn_implementation="sdpa" dispatch

#46073 opened 2026-05-19 09:25 by YangKai0616

Add TDT loss kernel

#46048 opened 2026-05-19 02:32 by ebezzam

Add the other processors to auto-mappings

#46046 opened 2026-05-19 00:50 by zucchini-nlp

[`Kernels`] Sync to latest version and add new kernels (SwiGLU, CE)

#46039 opened 2026-05-18 17:49 by vasqu

Widen tols for float16/bfloat16

#46036 opened 2026-05-18 15:06 by Rocketknight1

Support Granite Speech NAR (NLE) (labels: New model, Audio)

#46031 opened 2026-05-18 12:52 by avihu111

fix: add pickle support to _LazyConfigMapping for spawn multiprocessing

#46026 opened 2026-05-18 08:12 by kfojcik-intel

fix(llama4): align MoE interface for EP/TP compatibility

#46024 opened 2026-05-18 08:00 by Chao1Han

[docs] tp for continuous batching

#46019 opened 2026-05-18 02:12 by stevhliu

docs: document flash attention supports_mapping

#46013 opened 2026-05-17 15:26 by nightcityblade

[new model] Add zaya1 vl

#46011 opened 2026-05-17 10:33 by JJJYmmm

docs: fix OOM/hang for Qwen dynamic resolution models by setting min_pixels and max_pixels

#46007 opened 2026-05-16 15:35 by poojansatani

[Image Processor] Speed up image processors by casting to array before BatchFeature

#46004 opened 2026-05-16 13:17 by Apeksha23-hub

fix(cli): make requests import optional in chat.py

#45999 opened 2026-05-16 06:31 by blut-agent

[Weight converter] Account for base model prefix in scoped weight conversion

#45996 opened 2026-05-15 14:11 by yonigozlan

Fix decoder_attention_mask None handling in generation utils

#45985 opened 2026-05-15 03:01 by damodharg6

no empty label when Grounding Dino detects nothing

#45982 opened 2026-05-14 17:51 by catwell

[Generation] Add static ensemble verification for lossy speculative decoding

#45979 opened 2026-05-14 14:42 by kasakh

GgufLinear: inference-time GGUF matmul on Apple Silicon — llama.cpp parity

#45977 opened 2026-05-14 10:29 by ArthurZucker

GGUF: optional Metal dequant fast path via kernels-community

#45975 opened 2026-05-14 08:22 by ArthurZucker

Fix OLMo 3 scaled RoPE handling for sliding attention

#45945 opened 2026-05-13 14:56 by nurpax

Fix models for which we don't have a dedicated tokenizer class, and the listed one is incorrect

#45936 opened 2026-05-13 07:53 by itazap

DO NOT MERGE testing grafana

#45932 opened 2026-05-13 06:04 by tarekziade

Stop align_special_tokens from rewriting eos_token_id when no alignment is needed

#45917 opened 2026-05-12 13:51 by 1fanwang

Fix Gemma4 inputs_embeds OOM during per-layer lookup

#45883 opened 2026-05-11 05:43 by chirag-gupta-07

feat(t5gemma2): add Flash Attention 2 support

#45868 opened 2026-05-10 17:28 by AjAyrAo43

[new model] Add Zyphra/ZAYA1-8B (label: New model)

#45862 opened 2026-05-09 12:22 by JJJYmmm