transformers
2b4734bd
- Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)
Committed 258 days ago
Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)
- support passing flash_attn_kwargs when gradient_checkpointing is enabled
- make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
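A minimal sketch of the pattern this commit enables, not the actual transformers code: when gradient checkpointing wraps a decoder layer's forward call, keyword arguments such as FlashAttention kwargs previously could be dropped on that path; the fix is to forward them through the checkpointing wrapper. The function names and the kwargs (`cu_seqlens_q`, `max_length_q`) below are illustrative stand-ins, not the library's API.

```python
def layer_forward(hidden_states, attention_mask=None, **flash_attn_kwargs):
    # Stand-in for a decoder layer's forward; records which extra
    # keyword arguments actually reached it.
    return {"hidden_states": hidden_states,
            "received": sorted(flash_attn_kwargs)}

def checkpointed_call(fn, *args, **kwargs):
    # Stand-in for a gradient-checkpointing wrapper (e.g. something like
    # torch.utils.checkpoint.checkpoint). The point of the fix is that
    # **kwargs is forwarded here instead of being silently discarded.
    return fn(*args, **kwargs)

out = checkpointed_call(
    layer_forward,
    "hidden",
    attention_mask=None,
    cu_seqlens_q=[0, 4],  # hypothetical FlashAttention kwarg
    max_length_q=4,       # hypothetical FlashAttention kwarg
)
print(out["received"])  # ['cu_seqlens_q', 'max_length_q']
```

With the kwargs forwarded, the attention implementation inside the checkpointed layer sees the same arguments it would receive on the non-checkpointed path.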
References
#37037 - Support passing flash_attn_kwargs when gradient_checkpointing is enabled
Author: efsotr
Parent: bd41b9c1