DeepSpeed
Add local attention for GPT-Neo model architecture
#1114
Merged
