diffusers
StableDiffusionLatentUpscalePipeline - positive/negative prompt embeds support
#8947
Merged


rootonchair
rootonchair288 days ago❤ 2

What does this PR do?

Fixes #8895

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu

rootonchair make latent upscaler accept prompt embeds
59d220d5
rootonchair
rootonchair288 days ago

Test script

from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
upscaler.to("cuda")

prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
generator = torch.manual_seed(33)

# we stay in latent space! Let's make sure that Stable Diffusion returns the image
# in latent space
low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images

upscaled_image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
).images[0]

# Let's save the upscaled image as "astronaut_1024.png"
upscaled_image.save("astronaut_1024.png")

prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds = upscaler.encode_prompt(
    prompt=prompt,
    device=upscaler._execution_device,
    do_classifier_free_guidance=False,
)

upscaled_image = upscaler(
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
).images[0]

upscaled_image.save("embeds_astronaut_1024.png")

# as a comparison: Let's also save the low-res image
with torch.no_grad():
    image = pipeline.decode_latents(low_res_latents)
image = pipeline.numpy_to_pil(image)[0]

image.save("astronaut_512.png")
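As a further variant (not part of the script above), the snippet below sketches how the negative-embeds path could be exercised. It assumes encode_prompt accepts a negative_prompt argument when do_classifier_free_guidance=True, and that the pipeline's __call__ accepts negative_prompt_embeds / negative_pooled_prompt_embeds; the exact argument names are inferred from the PR title and may differ from the merged code.

# Hedged sketch, continuing from the script above; argument names are assumptions.
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = upscaler.encode_prompt(
    prompt=prompt,
    device=upscaler._execution_device,
    do_classifier_free_guidance=True,
    negative_prompt="blurry, low quality",
)

upscaled_image = upscaler(
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=7.5,
    generator=generator,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
).images[0]

upscaled_image.save("negative_embeds_astronaut_1024.png")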

rootonchair fix style
c1bbb169
DN6 Merge branch 'main' into latent_upscaler_prompt_embeds
7e1fd907
yiyixuxu
yiyixuxu approved these changes on 2024-07-25
yiyixuxu286 days ago

oh thanks! I think it looks really nice!

yiyixuxu requested a review from sayakpaul 286 days ago
sayakpaul
sayakpaul commented on 2024-07-25
sayakpaul286 days ago

LGTM! Could you also share some results stemming from this change?

Let's also add a fast test for this?

sayakpaul Merge branch 'main' into latent_upscaler_prompt_embeds
dc699d4b
rootonchair
rootonchair286 days ago👍 1

Could you also share some results stemming from this change?

Sure.
Here is the original result:
[image: org_astronaut_1024]

And the result using the prompt embeds:
[image: astronaut_1024]

The two are not much different.

Let's also add a fast test for this?

Should I open another PR for this @sayakpaul ?

sayakpaul Merge branch 'main' into latent_upscaler_prompt_embeds
cae358df
sayakpaul
sayakpaul286 days ago

Should I open another PR for this @sayakpaul ?

We can do this in this PR
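For reference, here is a rough sketch of what such a fast test could look like. It is not the test that was eventually merged; it assumes the existing tester class exposes get_dummy_components / get_dummy_inputs and pipeline_class helpers, as other diffusers pipeline test classes do.

# Hypothetical fast test, meant to live inside the latent-upscaler tester class:
# the output from pre-computed embeddings should match the output from the plain prompt.
import numpy as np

def test_stable_diffusion_latent_upscaler_prompt_embeds(self):
    device = "cpu"
    components = self.get_dummy_components()
    pipe = self.pipeline_class(**components).to(device)
    pipe.set_progress_bar_config(disable=None)

    # run once with a plain text prompt
    inputs = self.get_dummy_inputs(device)
    output_with_prompt = pipe(**inputs).images

    # run again with embeddings computed up front
    inputs = self.get_dummy_inputs(device)
    prompt = inputs.pop("prompt")
    (
        prompt_embeds,
        negative_prompt_embeds,
        pooled_prompt_embeds,
        negative_pooled_prompt_embeds,
    ) = pipe.encode_prompt(prompt=prompt, device=device, do_classifier_free_guidance=True)
    output_with_embeds = pipe(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        **inputs,
    ).images

    # the two paths should produce (nearly) identical images
    assert np.abs(output_with_prompt - output_with_embeds).max() < 1e-4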

rootonchair add base tests
a8593c59
rootonchair Merge branch 'latent_upscaler_prompt_embeds' of github.com:rootonchai…
a87c08f2
sayakpaul Merge branch 'main' into latent_upscaler_prompt_embeds
a33a6bcc
rootonchair remove false typing
543796a5
rootonchair Merge branch 'latent_upscaler_prompt_embeds' of github.com:rootonchai…
de5b3f50
rootonchair run make style
e181b3e8
rootonchair complete fix fast test
1ac963a2
rootonchair
rootonchair269 days ago

Hi @sayakpaul, could you help me review this PR?

sayakpaul Merge branch 'main' into latent_upscaler_prompt_embeds
fd20bb46
rootonchair fix copies and style
ceeb2f29
rootonchair Merge branch 'latent_upscaler_prompt_embeds' of github.com:rootonchai…
52c3c788
rootonchair requested a review from sayakpaul 268 days ago
yiyixuxu
yiyixuxu commented on 2024-08-18
tests/pipelines/stable_diffusion/test_stable_diffusion_latent_upscaler.py
# coding=utf-8
yiyixuxu262 days ago

oh actually I think the tests for the latent upscaler live here: https://github.com/huggingface/diffusers/blob/main/tests/pipelines/stable_diffusion_2/test_stable_diffusion_latent_upscale.py

maybe we should add the new tests there instead? Very sorry for all the additional work of creating a new test file from scratch.

rootonchair modify existing tests
e38c554f
rootonchair
rootonchair261 days ago👍 1

@yiyixuxu I have just updated the existing tests. Could you help me review? Btw, adding a new test was kinda interesting too, so no worries.

yiyixuxu
yiyixuxu commented on 2024-08-19
tests/pipelines/stable_diffusion_2/test_stable_diffusion_latent_upscale.py

        self.assertEqual(image.shape, (1, 256, 256, 3))
        expected_slice = np.array(
-            [0.47222412, 0.41921633, 0.44717434, 0.46874192, 0.42588258, 0.46150726, 0.4677534, 0.45583832, 0.48579055]
+            [0.3970313, 0.3768756, 0.41147298, 0.4716793, 0.5115408, 0.44601366, 0.43763855, 0.46781355, 0.46358708]
yiyixuxu261 days ago

why do we need to update the expected_slice here? the results of existing tests should not change, no?

rootonchair
rootonchair commented on 2024-08-20
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

        if image.shape[1] == 3:
            # encode image if not in latent-space yet
-            image = self.vae.encode(image).latent_dist.sample() * self.vae.config.scaling_factor
+            image = retrieve_latents(self.vae.encode(image), generator=generator) * self.vae.config.scaling_factor
rootonchair260 days ago👍 1

@yiyixuxu I think it's due to this line. The old code did not take in a generator.
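A minimal plain-torch illustration of that difference (standing in for the VAE's latent_dist.sample(); not the pipeline code itself):

# Sampling the VAE posterior without a generator uses the global RNG, so the
# encoded latents (and therefore the expected_slice) change between runs;
# retrieve_latents forwards the seeded generator, which makes the draw reproducible.
import torch

mean = torch.zeros(1, 4, 8, 8)
std = torch.ones(1, 4, 8, 8)

# old behaviour: sample with no generator -> global RNG, differs from run to run
latents_a = mean + std * torch.randn(mean.shape)
latents_b = mean + std * torch.randn(mean.shape)
assert not torch.allclose(latents_a, latents_b)

# new behaviour: a seeded generator is passed through, so the draw is reproducible
latents_c = mean + std * torch.randn(mean.shape, generator=torch.manual_seed(33))
latents_d = mean + std * torch.randn(mean.shape, generator=torch.manual_seed(33))
assert torch.allclose(latents_c, latents_d)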

yiyixuxu
yiyixuxu commented on 2024-08-20
yiyixuxu260 days ago❤ 1

I think we got the order of the negative_prompt_embeds and prompt_embeds reversed; that's why the previous test wasn't passing. Let's make the change, change the test back, and make sure it passes :)

Conversation is marked as resolved
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

        )

        if do_classifier_free_guidance:
            prompt_embeds = torch.cat([prompt_embeds, negative_prompt_embeds])
            pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, negative_pooled_prompt_embeds])

yiyixuxu260 days ago
Suggested change
-            prompt_embeds = torch.cat([prompt_embeds, negative_prompt_embeds])
-            pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, negative_pooled_prompt_embeds])
+            prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
+            pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds])
Conversation is marked as resolved
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

            **kwargs,
        )

        prompt_embeds = torch.cat([prompt_embeds, negative_prompt_embeds])
        pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, negative_pooled_prompt_embeds])

yiyixuxu260 days ago
Suggested change
-        prompt_embeds = torch.cat([prompt_embeds, negative_prompt_embeds])
-        pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, negative_pooled_prompt_embeds])
+        prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
+        pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds])
Conversation is marked as resolved
tests/pipelines/stable_diffusion_2/test_stable_diffusion_latent_upscale.py

        self.assertEqual(image.shape, (1, 256, 256, 3))
        expected_slice = np.array(
-            [0.47222412, 0.41921633, 0.44717434, 0.46874192, 0.42588258, 0.46150726, 0.4677534, 0.45583832, 0.48579055]
+            [0.3970313, 0.3768756, 0.41147298, 0.4716793, 0.5115408, 0.44601366, 0.43763855, 0.46781355, 0.46358708]

yiyixuxu260 days ago
Suggested change
-            [0.3970313, 0.3768756, 0.41147298, 0.4716793, 0.5115408, 0.44601366, 0.43763855, 0.46781355, 0.46358708]
+            [0.47222412, 0.41921633, 0.44717434, 0.46874192, 0.42588258, 0.46150726, 0.4677534, 0.45583832, 0.48579055]
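For context on why the [negative, positive] order matters: diffusers denoising loops conventionally split the batched prediction with chunk(2) and treat the first half as the unconditional (negative) branch, roughly as in the sketch below. The shapes are illustrative; this is not the upscaler's actual code.

# Illustrative only: classifier-free guidance assumes the unconditional
# (negative) embeddings come first in the concatenated batch.
import torch

negative_prompt_embeds = torch.randn(1, 77, 768)   # assumed shapes
prompt_embeds = torch.randn(1, 77, 768)

# correct order: negative first, positive second
text_embeddings = torch.cat([negative_prompt_embeds, prompt_embeds])

# downstream, the model output for the doubled batch is split the same way
noise_pred = torch.randn(2, 4, 64, 64)             # stand-in for the UNet output
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
guidance_scale = 7.5
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)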
rootonchair Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffu…
d47333eb
rootonchair Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffu…
d98a007f
rootonchair Update tests/pipelines/stable_diffusion_2/test_stable_diffusion_laten…
4bc27ce8
rootonchair update expected slice
964f6cd8
rootonchair
rootonchair259 days ago

@yiyixuxu thanks for the catch! Sorry I did not check it thoroughly

yiyixuxu
yiyixuxu approved these changes on 2024-08-21
yiyixuxu259 days ago

thanks!

yiyixuxu merged 867e0c91 into main 259 days ago
