[Cuda] Demo multiple CUDA graphs and user compute stream (#19883)
Update the Stable Diffusion demo to add the options `--max-cuda-graphs` and
`--user-compute-stream`.
* Add Python class `GpuBindingManager` to manage I/O binding based on the input
  shape and the configured maximum number of CUDA graphs. This lets a single
  inference session enable or disable CUDA graph from one run to the next.
* When `--user-compute-stream` is specified, the demo uses a custom compute stream.
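Below is a minimal sketch of the idea behind `GpuBindingManager`. It assumes that each distinct set of input shapes gets its own CUDA graph slot until the maximum is reached, and that any further shapes run without CUDA graph. The class name `GpuBindingManagerSketch`, the `get_graph_id` method, and the convention that `-1` means "no CUDA graph" are hypothetical and are not taken from the demo. A real implementation would also keep one pre-allocated I/O binding per slot, because a captured CUDA graph replays against fixed buffer addresses.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


class GpuBindingManagerSketch:
    """Hypothetical stand-in for GpuBindingManager: one CUDA graph slot per input-shape set."""

    def __init__(self, max_cuda_graphs: int = 1):
        # Upper bound on how many distinct shape sets may be captured as CUDA graphs.
        self.max_cuda_graphs = max_cuda_graphs
        # Maps a tuple of input shapes to the graph id reserved for it.
        self.graph_ids: Dict[Tuple[Shape, ...], int] = {}

    def get_graph_id(self, *shapes: Shape) -> int:
        """Return the graph id for these shapes, or -1 to run without CUDA graph."""
        key = tuple(shapes)
        if key in self.graph_ids:
            # Shapes seen before: reuse the same slot (and, in practice, the same buffers).
            return self.graph_ids[key]
        if len(self.graph_ids) < self.max_cuda_graphs:
            # A free slot remains: reserve the next id for this shape set.
            graph_id = len(self.graph_ids)
            self.graph_ids[key] = graph_id
            return graph_id
        # All slots are taken (or max_cuda_graphs is 0): fall back to a normal run.
        return -1


if __name__ == "__main__":
    mgr = GpuBindingManagerSketch(max_cuda_graphs=2)
    print(mgr.get_graph_id((1, 4, 64, 64)))  # 0
    print(mgr.get_graph_id((2, 4, 64, 64)))  # 1
    print(mgr.get_graph_id((1, 4, 96, 96)))  # -1, no slot left
    print(mgr.get_graph_id((1, 4, 64, 64)))  # 0, reused
```

Setting `max_cuda_graphs=0` in this sketch disables CUDA graph for every run, while a positive value enables it only for the first shape sets encountered. This mirrors how one session can switch between CUDA graph and regular execution across runs.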