onnxruntime
01b5c789 - Add SD-Turbo and refine diffusion demo (#18694)

Commit
2 years ago
[SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) is a fast generative text-to-image model distilled from [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1). It targets 512x512 resolution.

1. Support the sd-turbo model.
2. Refine ControlNet in the demo:
    + Cache the ControlNet model so that it is downloaded only once.
    + Do not download default images in the script. Instead, update the document to use wget to download the example image.
    + Fix an issue in control image processing that caused a shape mismatch in inference.
3. Refine arguments:
    + Change argument --disable-refiner to --enable-refiner, since the refiner is not used in most cases.
    + Rename --refiner-steps to --refiner_denoising_steps.
    + Add abbreviations for the most used arguments.
    + Add logic to set default arguments for different models.
4. Refine torch model cache:
    + Share the cached torch model among different engines to save disk space.
    + Only download the fp16 model (previously, ORT_CUDA downloaded the fp32 model).
5. Do not use VAE slicing when the image size is small.
6. For the LCM scheduler, allow guidance scale 1.0~2.0.
7. Allow sdxl-turbo to use the refiner.

### Performance Test Results

Average latency in ms for SD-Turbo (FP16, EulerA, 512x512) on A100-SXM4-80GB.

| Batch | Steps | TRT 8.6 static | ORT_TRT static | ORT_CUDA static | TRT 8.6 dynamic | ORT_TRT dynamic | ORT_CUDA dynamic |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | 1 | 32.07 | 30.55 | 32.89 | 36.41 | 38.30 | 34.83 |
| 4 | 1 | 125.36 | 97.40 | 97.49 | 118.24 | 114.95 | 99.10 |
| 1 | 4 | 62.29 | 60.24 | 62.50 | 72.49 | 77.82 | 67.66 |
| 4 | 4 | 203.51 | 173.11 | 168.32 | 217.14 | 215.71 | 172.53 |

* Dynamic engine is built for batch size 1 to 8, image size 512x512 to 768x768, optimized for batch size 1 and 512x512.
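
The latency figures above are easier to compare once converted to throughput and relative speedup. The following is a minimal sketch of that arithmetic, using only the numbers reported in the table. The helper names (`images_per_second`, `speedup_over`, `fastest_backend`) are hypothetical and are not part of the demo scripts.

```python
# Average latency in ms for SD-Turbo, copied from the table in this commit message.
# Keys are (batch, steps); values map backend name -> latency in ms.
LATENCY_MS = {
    (1, 1): {"TRT 8.6 static": 32.07, "ORT_TRT static": 30.55, "ORT_CUDA static": 32.89,
             "TRT 8.6 dynamic": 36.41, "ORT_TRT dynamic": 38.30, "ORT_CUDA dynamic": 34.83},
    (4, 1): {"TRT 8.6 static": 125.36, "ORT_TRT static": 97.40, "ORT_CUDA static": 97.49,
             "TRT 8.6 dynamic": 118.24, "ORT_TRT dynamic": 114.95, "ORT_CUDA dynamic": 99.10},
    (1, 4): {"TRT 8.6 static": 62.29, "ORT_TRT static": 60.24, "ORT_CUDA static": 62.50,
             "TRT 8.6 dynamic": 72.49, "ORT_TRT dynamic": 77.82, "ORT_CUDA dynamic": 67.66},
    (4, 4): {"TRT 8.6 static": 203.51, "ORT_TRT static": 173.11, "ORT_CUDA static": 168.32,
             "TRT 8.6 dynamic": 217.14, "ORT_TRT dynamic": 215.71, "ORT_CUDA dynamic": 172.53},
}


def images_per_second(batch: int, latency_ms: float) -> float:
    """Convert the latency of one batch into images generated per second."""
    return batch * 1000.0 / latency_ms


def speedup_over(baseline: str, candidate: str, batch: int, steps: int) -> float:
    """Return how many times faster `candidate` is than `baseline` (>1 means faster)."""
    row = LATENCY_MS[(batch, steps)]
    return row[baseline] / row[candidate]


def fastest_backend(batch: int, steps: int) -> str:
    """Return the backend with the lowest reported latency for one configuration."""
    row = LATENCY_MS[(batch, steps)]
    return min(row, key=row.get)


if __name__ == "__main__":
    for (batch, steps), row in LATENCY_MS.items():
        best = fastest_backend(batch, steps)
        rate = images_per_second(batch, row[best])
        print(f"batch={batch} steps={steps}: fastest is {best} ({rate:.1f} images/s)")
```

These are single reported averages, so small gaps between backends (for example at batch 1, step 1) should not be read as significant without the run-to-run variance.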