transformers
2b4734bd - Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)

Committed 258 days ago
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
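The issue this commit addresses is that checkpointing helpers typically forward only positional arguments to the wrapped function, so keyword arguments such as `flash_attn_kwargs` are silently dropped. A common fix is to bind the kwargs with `functools.partial` before handing the function to the checkpointing helper. The sketch below illustrates the pattern with a torch-free stand-in; `checkpointed`, `decoder_layer`, and the specific kwarg names are illustrative, not the library's actual API.

```python
from functools import partial

def checkpointed(fn, *args):
    # Stand-in for torch.utils.checkpoint.checkpoint: like many
    # checkpointing helpers, it forwards only positional arguments,
    # so keyword arguments cannot be passed to `fn` directly.
    return fn(*args)

def decoder_layer(hidden_states, attention_mask=None, **flash_attn_kwargs):
    # Hypothetical layer: returns whatever flash-attention kwargs it received,
    # so we can check they survived the checkpointed call.
    return hidden_states, flash_attn_kwargs

# Bind the kwargs with functools.partial so they reach the layer even
# though the checkpointing helper only accepts positional arguments.
flash_attn_kwargs = {"cu_seq_lens_q": [0, 4], "max_length_q": 4}
out, received = checkpointed(
    partial(decoder_layer, **flash_attn_kwargs),
    [1.0, 2.0, 3.0],
)
print(received)  # the kwargs arrive at the layer intact
```

Without the `partial` binding, `flash_attn_kwargs` would never reach `decoder_layer` through the positional-only helper, which is the behavior gap the commit title describes.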