minimal stable diffusion GPU memory usage with accelerate hooks (#850)
* add method to enable cuda with minimal gpu usage to stable diffusion
* add test to minimal cuda memory usage
* ensure all models but unet are onn torch.float32
* move to cpu_offload along with minor internal changes to make it work
* make it test against accelerate master branch
* coming back, its official: I don't know how to make it test againt the master branch from accelerate
* make it install accelerate from master on tests
* go back to accelerate>=0.11
* undo prettier formatting on yml files
* undo prettier formatting on yml files againn