transformers
2b4734bd - Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)

Committed 258 days ago
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
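The issue this commit addresses is that checkpointing helpers typically forward only positional arguments to the wrapped function, so keyword arguments such as `flash_attn_kwargs` are silently dropped. A common fix is to bind the kwargs with `functools.partial` before handing the function to the checkpointing helper. The sketch below illustrates the pattern with a torch-free stand-in; `checkpointed`, `decoder_layer`, and the specific kwarg names are illustrative, not the library's actual API.

```python
from functools import partial

def checkpointed(fn, *args):
    # Stand-in for torch.utils.checkpoint.checkpoint: like many
    # checkpointing helpers, it forwards only positional arguments,
    # so keyword arguments cannot be passed to `fn` directly.
    return fn(*args)

def decoder_layer(hidden_states, attention_mask=None, **flash_attn_kwargs):
    # Hypothetical layer: returns whatever flash-attention kwargs it received,
    # so we can check they survived the checkpointed call.
    return hidden_states, flash_attn_kwargs

# Bind the kwargs with functools.partial so they reach the layer even
# though the checkpointing helper only accepts positional arguments.
flash_attn_kwargs = {"cu_seq_lens_q": [0, 4], "max_length_q": 4}
out, received = checkpointed(
    partial(decoder_layer, **flash_attn_kwargs),
    [1.0, 2.0, 3.0],
)
print(received)  # the kwargs arrive at the layer intact
```

Without the `partial` binding, `flash_attn_kwargs` would never reach `decoder_layer` through the positional-only helper, which is the behavior gap the commit title describes.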