transformers
Generalize decay_mask_fn to apply mask to all LayerNorm params
#18273
Merged
