Stable Diffusion CUDA optimizations Part 2 (#14597)
### Description
This is a follow-up to
https://github.com/microsoft/onnxruntime/pull/14428 with more Stable Diffusion
CUDA optimizations:
(1) use NhwcConv to replace Conv in the ONNX graph and add Transpose nodes
accordingly (see the sketch after this list);
(2) reduce sequential Transpose nodes to at most one Transpose;
(3) add symbolic shape inference for NhwcConv;
(4) fix the add-bias-transpose kernel, which caused a CUDA error (launching
more than 1024 threads per block) when running inference on fp32 models;
(5) add models (bert, bart, stable_diffusion subdirectories) to the package;
(6) remove the --disable_channels_last option.
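
To make (1) and (2) concrete, below is a minimal sketch of the graph pattern the rewrite produces: an NCHW Conv becomes Transpose (NCHW->NHWC) -> NhwcConv (com.microsoft contrib op) -> Transpose (NHWC->NCHW), and adjacent Transpose pairs are later merged or cancelled. The standalone graph, tensor names, shapes, and the assumed channels-last (O, H, W, I) filter layout are made up for illustration; this is not the optimizer's actual code.

```python
# Sketch only: the pattern produced by the NhwcConv rewrite, not the fusion code.
# A Conv in NCHW layout is replaced by
#   Transpose(NCHW->NHWC) -> com.microsoft.NhwcConv -> Transpose(NHWC->NCHW)
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

# Assumption: NhwcConv takes the filter in channels-last (O, H, W, I) layout;
# here a 3x3 convolution from 4 to 8 channels.
weight = numpy_helper.from_array(
    np.random.randn(8, 3, 3, 4).astype(np.float32), name="conv_weight"
)

nodes = [
    helper.make_node("Transpose", ["x"], ["x_nhwc"], perm=[0, 2, 3, 1]),
    helper.make_node(
        "NhwcConv", ["x_nhwc", "conv_weight"], ["y_nhwc"],
        domain="com.microsoft",
        # Conv-like attributes assumed; "same" padding keeps H and W unchanged.
        kernel_shape=[3, 3], pads=[1, 1, 1, 1],
    ),
    helper.make_node("Transpose", ["y_nhwc"], ["y"], perm=[0, 3, 1, 2]),
]

graph = helper.make_graph(
    nodes, "nhwc_conv_pattern",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, ["N", 4, "H", "W"])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, ["N", 8, "H", "W"])],
    initializer=[weight],
)
model = helper.make_model(graph, opset_imports=[
    helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1),
])
onnx.save(model, "nhwc_conv_pattern.onnx")
```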
Note that
(1) We could add a few more graph transformations to reduce Transpose nodes
further. That is not done in this PR due to time constraints (see the
permutation-merging sketch after these notes for the kind of rule involved).
(2) The Stable Diffusion 2.1 model outputs black images. Forcing Attention to
float32 seems to avoid the issue, but float32 Attention is much slower.
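
Regarding note (1): the relevant rule is that two consecutive Transpose nodes are equivalent to a single Transpose whose permutation is the composition of the two, and a Transpose with an identity permutation can be dropped entirely. A minimal sketch of that rule follows (hypothetical helpers, not the optimizer's implementation).

```python
# Sketch of the permutation-merging rule behind reducing sequential Transpose nodes.

def compose_perms(first, second):
    """Permutation of one Transpose equivalent to applying `first` then `second`."""
    return [first[axis] for axis in second]

def is_identity(perm):
    return all(axis == i for i, axis in enumerate(perm))

# Transpose(perm=[0, 2, 3, 1]) followed by Transpose(perm=[0, 3, 1, 2])
# (NCHW->NHWC then NHWC->NCHW) composes to the identity, so both nodes can be removed.
merged = compose_perms([0, 2, 3, 1], [0, 3, 1, 2])
assert is_identity(merged)
```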
### Motivation and Context
These graph-level changes further speed up Stable Diffusion inference with the
CUDA execution provider, continuing the optimization work started in
https://github.com/microsoft/onnxruntime/pull/14428.