transformers
9ded3dbb - [deepseek_v4] keep hc_head / sinks / position_bias in fp32 (#46198)

Commit
1 day ago
Issue #46167: fp32 plumbing tensors were being downcast to bf16 because `_keep_in_fp32_modules_strict` was missing entries for `hc_head` (top-level and MTP), `sinks` (per-attention sink token), and `position_bias` (compressor and indexer compressor).

This commit adds the three patterns so that `save_pretrained` preserves the source dtype for the full set of 417 tensors instead of only 305.
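
To illustrate the idea of a keep-in-fp32 pattern list, here is a minimal standalone sketch. It does not use or reproduce the transformers implementation: the matching rule (a pattern matches when it equals one dot-separated component of the parameter name), the helper names `should_keep_fp32` and `target_dtype`, and the example parameter names are all assumptions made for illustration. Only the three pattern strings come from the commit message.

```python
# Patterns named in the commit message. How transformers matches them
# against parameter names is NOT shown here; the rule below is an assumption.
KEEP_IN_FP32_PATTERNS = ["hc_head", "sinks", "position_bias"]


def should_keep_fp32(param_name, patterns=KEEP_IN_FP32_PATTERNS):
    """Hypothetical rule: keep fp32 if any dot-separated component
    of the parameter name is exactly one of the patterns."""
    components = param_name.split(".")
    return any(pattern in components for pattern in patterns)


def target_dtype(param_name, default="bfloat16"):
    """Return the dtype name a tensor would be saved in under this sketch."""
    return "float32" if should_keep_fp32(param_name) else default


if __name__ == "__main__":
    # Hypothetical parameter names, chosen only to exercise each pattern.
    example_names = [
        "model.hc_head.weight",
        "model.layers.0.self_attn.sinks",
        "model.layers.0.self_attn.compressor.position_bias",
        "model.layers.0.self_attn.q_proj.weight",
    ]
    for name in example_names:
        print(f"{name}: {target_dtype(name)}")
```

Under this sketch, a tensor whose name contains none of the listed components falls back to the default dtype, which is how a missing pattern can lead to a silent downcast on save.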