Wow, I really need this. Does it work now? I always generate black pictures with it. Can you post the API usage? Thanks a lot!
I discovered some issues today, but it should now generate sensible images rather than black ones ...
Let me complete this within the week.
Feel free to add my discord: harutatsuakiyama
I fixed the issue yesterday. The code should work as expected.
I use the following pipeline, but it still generates black images.
When I replace StableDiffusionXLControlNetInpaintPipeline with StableDiffusionXLInpaintPipeline, it works well.
Is there something wrong with my code?
def inpaint_with_controlnet():
    import torch
    from controlnet_aux import OpenposeDetector
    from diffusers import ControlNetModel
    from diffusers.utils import load_image
    from pipeline_controlnet_inpaint_sd_xl import StableDiffusionXLControlNetInpaintPipeline

    img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
    mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"
    control_image_url = img_url

    # Compute the openpose conditioning image.
    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    control_image = openpose(load_image(control_image_url))

    controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    init_image = load_image(img_url).convert("RGB")
    mask_image = load_image(mask_url).convert("RGB")

    prompt = "hand"
    strength = 0.5
    controlnet_conditioning_scale = 1.0

    image = pipe(
        prompt=prompt,
        image=init_image,
        mask_image=mask_image,
        control_image=control_image,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        strength=strength,
    ).images[0]
    image.save("result.jpg")
Thank you for the code! You need to use torch.float32 instead of torch.float16. I tested the following code, which should work:
def inpaint_with_controlnet():
    import torch
    from controlnet_aux import OpenposeDetector
    from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline
    from diffusers.utils import load_image

    img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
    mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"
    control_image_url = img_url

    # Compute the openpose conditioning image.
    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    control_image = openpose(load_image(control_image_url))

    controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float32)
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float32,
    )
    pipe.to("cuda")

    init_image = load_image(img_url).convert("RGB")
    mask_image = load_image(mask_url).convert("RGB")

    # Halve the resolution to save CUDA memory.
    original_width, original_height = init_image.size
    new_width = original_width // 2
    new_height = original_height // 2
    init_image = init_image.resize((new_width, new_height))
    mask_image = mask_image.resize((new_width, new_height))
    control_image = control_image[0].resize((new_width, new_height))

    prompt = "hand"
    strength = 0.5
    controlnet_conditioning_scale = 1.0

    image = pipe(
        prompt=prompt,
        image=init_image,
        mask_image=mask_image,
        control_image=control_image,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        strength=strength,
    ).images[0]
    image.save("result.jpg")


if __name__ == "__main__":
    inpaint_with_controlnet()
Feel free to add my discord and we can discuss there.
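As an aside on the root cause: black SDXL outputs in float16 are commonly caused by the SDXL VAE overflowing in half precision. A minimal sketch of the usual workaround, assuming that is the culprit here (the fp16-fixed VAE is a community checkpoint, not part of this PR):

import torch
from diffusers import AutoencoderKL, StableDiffusionXLControlNetInpaintPipeline

# Assumption: the black images come from fp16 VAE overflow, not from the pipeline itself.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,  # as created in the snippet above
    vae=vae,
    torch_dtype=torch.float16,
)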
Very cool PR! @yiyixuxu can you give this a look? :-)
Thanks! Excellent work!
I think the two main things left are:
61 | # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint.prepare_mask_and_masked_image
62 | def prepare_mask_and_masked_image(image, mask, height, width, return_image=False):
We just deprecated this function :) in this PR: #4444 (comment). Let's update this PR too.
Updated
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.mask_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True
)
self.control_image_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
)
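For context, a minimal sketch of how these processors replace prepare_mask_and_masked_image, assuming the preprocessing pattern introduced with #4444 (the 0.5 threshold mirrors the binarized-mask convention):

from diffusers.image_processor import VaeImageProcessor
from diffusers.utils import load_image

img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"

vae_scale_factor = 8  # assumption: SDXL's VAE downsamples by a factor of 8
image_processor = VaeImageProcessor(vae_scale_factor=vae_scale_factor)
mask_processor = VaeImageProcessor(
    vae_scale_factor=vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True
)

init_image = image_processor.preprocess(load_image(img_url))  # float tensor in [-1, 1]
mask = mask_processor.preprocess(load_image(mask_url))        # binary tensor in {0, 1}
masked_image = init_image * (mask < 0.5)                      # zero out the region to inpaint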
254 | self.control_image_processor = VaeImageProcessor(
255 |     vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
256 | )
257 | self.watermark = StableDiffusionXLWatermarker()
Add a mask_processor here.
Done
149 | generator = torch.Generator(device=device).manual_seed(seed)
150 |
151 | controlnet_embedder_scale_factor = 2
152 | control_image = randn_tensor(
I think we accept image tensors in the [0, 1] range, so we should not use randn_tensor here.
Thank you! Corrected. (floats_tensor draws values uniformly from [0, 1), matching the expected image range.)
control_image = (
floats_tensor(
(1, 3, 32 * controlnet_embedder_scale_factor, 32 * controlnet_embedder_scale_factor),
rng=random.Random(seed),
)
.to(device)
.cpu()
)
158 | init_image = init_image.cpu().permute(0, 2, 3, 1)[0]
159 |
160 | controlnet_embedder_scale_factor = 2
161 | image = Image.fromarray(np.uint8(init_image)).convert("RGB").resize((64, 64))
The dummy image and mask_image are just two black images here; let's do something similar to https://github.com/huggingface/diffusers/pull/4536/files#diff-b65a24df736726ca6f92c71567b77c2a9832ee6142ee2dcbdb08e9addcb6da4b
Followed the linked code:
image = floats_tensor((1, 3, 32, 32), rng=random.Random(seed)).to(device)
image = image.cpu().permute(0, 2, 3, 1)[0]
mask_image = torch.ones_like(image)
controlnet_embedder_scale_factor = 2
control_image = (
floats_tensor(
(1, 3, 32 * controlnet_embedder_scale_factor, 32 * controlnet_embedder_scale_factor),
rng=random.Random(seed),
)
.to(device)
.cpu()
)
270 | assert np.abs(image_slice_1.flatten() - image_slice_3.flatten()).max() > 1e-4
271 |
272 | # Ignore float16 for SDXL
273 | def test_float16_inference(self):
Why do we disable this?
This was unintentional. Removed the disabling.
Thank you @yiyixuxu and @patrickvonplaten. I will work on comments this week.
Borrowing ideas from PR #4811. Work in progress.
Hey @viiika,
Could we maybe work on this PR together? @harutatsuakiyama, could you invite @viiika as a collaborator on your fork so that we can work here?
@viiika, it's quite rare that we have two PRs for the same feature popping up almost at the same time - very sorry for the potentially duplicated work. Would it be OK to continue with this PR, because:
That would be very nice if we could collaborate here 🙏
113 | return mask
114 |
115 |
116 | def prepare_mask_and_masked_image(image, mask, height, width, return_image: bool = False):
Can we remove this function and instead use the new mask processor logic from #4444?
@harutatsuakiyama, I think you can delete this function now if it's not used.
I still insist that #4811 already supports some of the new features mentioned in #4694, like MultiControlNet, the API usage, no randn_tensor for control_image, and even the refactor with a mask_image_processor you mentioned just now.
Also, its coding style is more consistent with pipeline_stable_diffusion_xl_inpaint, compared to StableDiffusionControlNetInpaintPipeline, which was adapted from StableDiffusionInpaintPipeline.
I believe #4811 requires almost no effort to review, because it and the latest pipeline_stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint are updated synchronously.
That said, which PR to merge is up to you. And I believe if you choose #4811, it may take less than a day for us to merge it.
Also, if you still insist we should continue with #4694, that's fine with me and I will try my best to help fix problems. I just think merging #4694 will take a few weeks to handle many problems, and might introduce some design inconsistencies. A lot of current research relies on this pipeline, so I just hope it gets merged soon.
Hi @yiyixuxu and @patrickvonplaten, thank you for the review. I have addressed the code review comments and updated the code. The code now supports MultiControlNet and uses the processors.
The test file has also been implemented. All tests pass locally.
Thanks to @viiika for uploading the code. I have borrowed ideas from @viiika's code and am therefore including them as an author, as indicated in the copyright header at the top of the file.
Let me know if more modifications are needed.
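For reviewers, a brief MultiControlNet usage sketch following the diffusers list-passing convention; the canny checkpoint and the pose/canny conditioning images are illustrative assumptions, not part of this PR:

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline

# Two SDXL controlnets; diffusers wraps a list in a MultiControlNetModel.
# The canny checkpoint is an assumed example.
controlnets = [
    ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float32),
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float32),
]
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, torch_dtype=torch.float32
).to("cuda")

# With a list of controlnets, control_image and the conditioning scales become lists too.
image = pipe(
    prompt="hand",
    image=init_image,          # PIL init image, as in the earlier snippets
    mask_image=mask_image,     # PIL mask, as in the earlier snippets
    control_image=[pose_image, canny_image],  # one conditioning image per controlnet
    controlnet_conditioning_scale=[1.0, 0.5],
    strength=0.5,
).images[0]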
1 | # Copyright 2023 Harutatsu Akiyama, Jinbin Bai, and The HuggingFace Team. All rights reserved.
Do we ever include the contributor names in here?
@patrickvonplaten @sayakpaul
In my previous contributions, I have put names :-) See #4079.
We usually don't, but it shouldn't be a big deal to leave it if you feel strongly, @harutatsuakiyama - it's OSS at the end of the day.
Thank you! This will be encouraging :-)
54 | logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
55 |
56 |
57 | EXAMPLE_DOC_STRING = """
This example needs to be updated, no?
Updated
98 | return noise_cfg
99 |
100 |
101 | def mask_pil_to_torch(mask, height, width):
We can remove this function.
Removed.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @yiyixuxu. Thanks for the review. I have addressed the review comments:
My local tests show no issues. Please let me know if further changes are required :-)
996 | ] = None,
997 | height: Optional[int] = None,
998 | width: Optional[int] = None,
999 | strength: float = 1.0,

Suggested change:
- strength: float = 1.0,
+ strength: float = 0.9999,
Changed, but why?
1049 | The height in pixels of the generated image.
1050 | width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
1051 |     The width in pixels of the generated image.
1052 | strength (`float`, *optional*, defaults to 1.):

Suggested change:
- strength (`float`, *optional*, defaults to 1.):
+ strength (`float`, *optional*, defaults to 0.9999):
Changed, but may I ask why?
1306 |
1307 |     control_image = control_images
1308 | else:
1309 |     assert False

Suggested change:
- assert False
+ raise ValueError(f"{controlnet.__class__} is not supported.")
Changed
Good to merge once @yiyixuxu is ok with it :-)
@viiika could you maybe drop your email here so that we can add you as a co-author via https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors
Sure. My primary GitHub email for this account is 1355864570@qq.com. Thank you very much!
@harutatsuakiyama, let's make sure the code quality checks pass. Please run make style :)
@harutatsuakiyama, could you add @viiika as an author here? That would be very nice ❤️
Hi @yiyixuxu, @patrickvonplaten, and @viiika,
I have addressed the new code review comments:
As for the failing tests, the previous failure seems to have been due to network issues (500 Bad Gateway). My local tests pass.
Please let me know if further changes are required.
@harutatsuakiyama, could you run make fix-copies and make style? Let's make sure CI is green.
Thank you @yiyixuxu. I just realized that diffusers/utils/dummy_torch_and_transformers_objects.py had some style problems. I have fixed them.
The following shows the outputs of make fix-copies and make style. The errors from make style are not due to the code that I uploaded. I think the CI should be green this time :-)
Let me know if other things are required.
make fix-copies
python utils/check_copies.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
make style
black examples scripts src tests utils
All done! ✨ 🍰 ✨
613 files left unchanged.
ruff examples scripts src tests utils --fix
examples/community/lpw_stable_diffusion_xl.py:1141:42: E721 Do not compare types, use `isinstance()`
examples/community/stable_diffusion_xl_reference.py:703:42: E721 Do not compare types, use `isinstance()`
src/diffusers/experimental/rl/value_guided_sampling.py:79:12: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py:181:12: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py:827:42: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py:909:20: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py:1132:20: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py:877:42: E721 Do not compare types, use `isinstance()`
tests/pipelines/consistency_models/test_consistency_models.py:190:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:112:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:548:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:651:12: E721 Do not compare types, use `isinstance()`
Found 12 errors.
make: *** [Makefile:59: style] Error 1
Ahh I see, I need to run the doc-builder style check. Let me do that. I hope that will be the last failing test.
Sorry for the failing test again. Can I ask for hints on how to fix this error, @yiyixuxu? Also, could we get access to run the tests, for more efficient debugging? I have tried locally, and everything seems correct ...
All done! ✨ 🍰 ✨
617 files would be left unchanged.
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.7.17/x64/bin/doc-builder", line 8, in <module>
sys.exit(main())
File "/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/doc_builder/commands/doc_builder_cli.py", line 47, in main
args.func(args)
File "/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/doc_builder/commands/style.py", line 28, in style_command
raise ValueError(f"{len(changed)} files should be restyled!")
ValueError: 1 files should be restyled!
Error: Process completed with exit code 1.
113 | >>> mask_image = load_image(mask_url).convert("RGB")
114 |
115 | >>> original_width, original_height = init_image.size
116 | >>> new_width = int(original_width / 2)
Why do we resize?
This is to save CUDA memory. Removed in the new code.
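If memory is tight without resizing, the usual diffusers memory savers may help; a minimal sketch, assuming these methods are exposed on this pipeline as they are on the other SDXL pipelines:

# Assumes `pipe` is the StableDiffusionXLControlNetInpaintPipeline from the snippets above.
pipe.enable_model_cpu_offload()  # keep submodules on CPU, move each to GPU only when used
pipe.enable_vae_tiling()         # decode latents in tiles to reduce peak VAE memory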
977 | self,
978 | prompt: Union[str, List[str]] = None,
979 | prompt_2: Optional[Union[str, List[str]]] = None,
980 | image: Union[
Let's use the custom type PipelineImageInput (it was recently introduced).
985 |     List[PIL.Image.Image],
986 |     List[np.ndarray],
987 | ] = None,
988 | mask_image: Union[torch.FloatTensor, PIL.Image.Image] = None,
I think mask_image should be of the same type as image, no? I.e., PipelineImageInput.
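A minimal sketch of what the updated signature could look like, assuming PipelineImageInput from diffusers.image_processor (a union over PIL images, numpy arrays, torch tensors, and lists thereof):

from typing import List, Optional, Union

from diffusers.image_processor import PipelineImageInput


def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    prompt_2: Optional[Union[str, List[str]]] = None,
    image: PipelineImageInput = None,        # replaces the hand-written Union above
    mask_image: PipelineImageInput = None,   # same type as `image`
    control_image: PipelineImageInput = None,
):
    ...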
1495 | latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)
1496 |
1497 | # predict the noise residual
1498 | added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
I don't think this line is needed; it hasn't changed since line 1452.
76 | projection_class_embeddings_input_dim=80,  # 6 * 8 + 32
77 | cross_attention_dim=64,
78 | )
79 | torch.manual_seed(0)
Why do we need to fix the seed here? I don't think we have any randomness here, no?
I followed the test here: https://github.com/huggingface/diffusers/blob/main/tests/pipelines/controlnet/test_controlnet_sdxl.py
58 | image_latents_params = TEXT_TO_IMAGE_IMAGE_PARAMS
59 |
60 | def get_dummy_components(self):
61 |     torch.manual_seed(0)
Is this needed?
92 | projection_class_embeddings_input_dim=80,  # 6 * 8 + 32
93 | cross_attention_dim=64,
94 | )
95 | torch.manual_seed(0)
Same question: is this needed?
Similarly, this follows the test here: https://github.com/huggingface/diffusers/blob/main/tests/pipelines/controlnet/test_controlnet_sdxl.py
Regarding the quality test, make sure you are up to date: pip install --upgrade -e .["quality"]
cc @DN6, we need help with the tests here!
I found the test issue: some lines in the doc string are too long.
Hi @yiyixuxu. I removed EXAMPLE_DOC_STRING since it keeps producing errors from doc-builder style src/diffusers docs/source --max_len 119 --check_only --path_to_docs docs/source. In the future, I will try to bring it back, maybe with some help from the test experts :-)
For now, I strongly believe the code should be able to pass the tests (fingers crossed 🙏).
Hi @yiyixuxu, thanks for the new review round. I have addressed the comments regarding:
- PipelineImageInput
- guess_mode
- EXAMPLE_DOC_STRING
Also, I strongly believe the code should be able to pass the tests (fingers crossed 🙏).
Let me know if further changes are required.
Overview:
This PR introduces the implementation of the inference pipeline for ControlNet with SDXL and inpainting.
Files Modified/Added:
src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint_sd_xl.py
tests/pipelines/controlnet/test_controlnet_inpaint_sdxl.py
Visualizations:
To better understand the impact and functionality of the implemented pipeline, the following visualizations are provided:
Example Usage
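A minimal usage sketch, condensed from the float32 snippet tested earlier in this thread (same checkpoints and image URLs):

import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline
from diffusers.utils import load_image

img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

# Openpose conditioning image computed from the init image.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
control_image = openpose(init_image)

controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float32)
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float32
).to("cuda")

image = pipe(
    prompt="hand",
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    controlnet_conditioning_scale=1.0,
    strength=0.5,
).images[0]
image.save("result.jpg")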
Features
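As discussed in the review thread above, the pipeline includes, among other things:
- MultiControlNet support (a list of ControlNetModels with per-controlnet conditioning images and scales)
- VaeImageProcessor-based image, mask, and control-image preprocessing, with PipelineImageInput inputs
- guess_mode support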