transformers
[Bugfix] `OPTDecoderLayer` does not return attentions when `gradient_checkpointing` and `training` is enabled.
#23367
Merged

Loading