DeepSpeed
only override forward if using cuda-graph
#2291
Merged
