onnxruntime
83547d30 - [CUDA] Fix SkipLayerNorm vectorized kernel out-of-bounds read (#17943)

Commit

2 years ago

[CUDA] Fix SkipLayerNorm vectorized kernel out-of-bounds read (#17943)

Fix a bug introduced in https://github.com/microsoft/onnxruntime/pull/11803: when the hidden size is not exactly the same as the next kernel size (for example, ld=320 in Stable Diffusion), the current vectorized kernel might read out of bounds, which can cause a CUDA failure.

Also resolve another issue: for the first and last sizes, the current macro generates dead code (branches that never run). This change avoids those branches at the boundary sizes.

Performance tests with Stable Diffusion show that performance is on par before and after this fix.
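To illustrate the kind of out-of-bounds read described above, here is a minimal host-side C++ sketch that simulates the indexing of a vectorized row pass. It is not the actual kernel: the bucket list `kBucketSizes`, the helpers `NextSize` and `CountOutOfBoundsReads`, and the vector width of 4 are all hypothetical and chosen only for illustration. The sketch assumes each simulated thread loads `ilp` consecutive elements and that the number of threads is derived from the bucket size rather than from the real hidden size `ld`.

```cpp
#include <cassert>

// Hypothetical bucket sizes. The real size table used by the kernel
// dispatch macro is not shown on this commit page.
constexpr int kBucketSizes[] = {128, 384, 768, 1024};

// Returns the smallest bucket that can hold a row of length ld, or -1.
int NextSize(int ld) {
  for (int s : kBucketSizes) {
    if (s >= ld) return s;
  }
  return -1;
}

// Simulates one row of a vectorized pass and counts how many element
// reads land at or beyond the end of the row (index >= ld).
// Each simulated thread t reads elements [t * ilp, t * ilp + ilp).
int CountOutOfBoundsReads(int ld, int bucket, int ilp, bool guarded) {
  assert(ilp > 0 && bucket % ilp == 0);
  int out_of_bounds = 0;
  const int threads = bucket / ilp;  // thread count follows the bucket, not ld
  for (int t = 0; t < threads; ++t) {
    const int idx = t * ilp;
    // Guard: skip threads whose vector starts past the row end.
    // This is only sufficient when ld is a multiple of ilp.
    if (guarded && idx >= ld) continue;
    for (int i = 0; i < ilp; ++i) {
      if (idx + i >= ld) ++out_of_bounds;
    }
  }
  return out_of_bounds;
}
```

Under these assumptions, `ld = 320` maps to a bucket of 384, so 16 of the 96 simulated threads start past the end of the row and read 64 elements that do not belong to it. With the guard enabled, no such read occurs. On the last row of a real buffer, those extra reads would fall outside the allocation, which is consistent with the CUDA failure the commit message mentions.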

References

#17943 - [CUDA] Fix SkipLayerNorm vectorized kernel out-of-bounds read

Author

tianleiwu

Parents

cf974f09
