onnxruntime
95f96287 - Fix transformers optimizations for GPT-NeoX (#16743)

### Description

Fix some issues found in GPT-NeoX graph fusion:

1. GPT-NeoX uses float16 weights. The graph optimization step that runs onnxruntime with opt_level==1 uses the CPU provider. Since most operators do not have fp16 implementations in the CPU EP, extra Cast nodes are added to upcast to fp32.
2. When an Add output is shared by two LayerNormalization children, fusing it into SkipLayerNormalization might produce an invalid graph.
3. Reshape fusion might be missed, since some checks only look at initializers but not Constant nodes.

This PR adds a check of whether the model uses FP16: it outputs a warning when use_gpu is not True, and uses the GPU provider for graph optimization when use_gpu=True.
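For context, a minimal sketch of how this fix is exercised from the Python optimizer API. The model path, num_heads, and hidden_size values are illustrative assumptions, as is `model_type="gpt_neox"` (check the optimizer's supported model types); the key point from this commit is passing `use_gpu=True` for an fp16 model.

```python
# Sketch: optimizing an fp16 GPT-NeoX ONNX export with the GPU provider,
# so the CPU EP does not insert Cast nodes to upcast fp16 ops to fp32.
from onnxruntime.transformers import optimizer

opt_model = optimizer.optimize_model(
    "gpt_neox_fp16.onnx",   # assumed path to an fp16 ONNX export
    model_type="gpt_neox",  # assumed model type; verify it is registered
    num_heads=32,           # assumed architecture parameters
    hidden_size=4096,
    use_gpu=True,           # per this fix: fp16 models should optimize
                            # on the GPU provider; otherwise a warning
                            # is emitted
)
opt_model.save_model_to_file("gpt_neox_fp16_opt.onnx")
```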