onnxruntime
a6d6e45b - Tune block size for layer_norm considering #rows and GPU resource (#15410)

Commit
2 years ago
Tune block size for layer_norm considering #rows and GPU resource (#15410)

Fine-tune the CUDA LayerNorm block size based on the number of rows to process, the number of columns, and the available hardware resources (number of SMs, etc.).

Co-authored-by: Lei Zhang <phill.zhang@gmail.com>
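The commit message names three inputs to the block-size choice: row count, column count, and SM count. The host-side sketch below illustrates how such a heuristic could be structured. It is not the code from this commit. The function name `ChooseLayerNormBlockSize`, its default limits (2048 resident threads per SM, 1024 threads per block, a 128-thread floor), and the "fit all rows in one wave" rule are assumptions made for illustration, and it assumes a kernel that assigns one thread block per row.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; NOT ONNX Runtime's implementation.
// Assumes a layer-norm kernel that launches one thread block per row.
int ChooseLayerNormBlockSize(int64_t num_rows, int64_t num_cols, int num_sms,
                             int max_threads_per_sm = 2048,  // assumed device limit
                             int max_block_size = 1024,      // assumed device limit
                             int warp_size = 32) {
  assert(num_rows > 0 && num_cols > 0 && num_sms > 0);
  const int min_block_size = 4 * warp_size;  // assumed floor of 128 threads

  // Step 1: smallest power-of-two multiple of the warp size that covers
  // the columns, capped at the maximum block size.
  int block = warp_size;
  while (block < max_block_size && block < num_cols) {
    block *= 2;
  }

  // Step 2: if there are more rows than blocks that can be resident at once,
  // halve the block so that more rows run concurrently on each SM.
  while (block > min_block_size) {
    const int64_t resident_blocks =
        static_cast<int64_t>(num_sms) * (max_threads_per_sm / block);
    if (num_rows <= resident_blocks) {
      break;  // every row fits in one wave; keep as many threads per row as possible
    }
    block /= 2;
  }
  return block;
}
```

With these assumed limits and an 80-SM device, 8 rows of 4096 columns keep the full 1024-thread block, while 100000 rows of the same width drop to 128 threads per block. On a real device the SM count would typically come from `cudaDeviceProp::multiProcessorCount`, and any such heuristic would need benchmarking before use.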