cryptexis1 year ago (edited 1 year ago)❤ 6🚀 2

What does this PR do?

This functionality allows training/fine-tuning of the 9 channel inpainting models provided by

This is due to noticing that many inpainting models provided to the community e.g. on https://civitai.com/ have unets with 4 input channels. 4 channel models may lack capacity and eventually quality in the inpainting tasks. To support the community to develop fully fledged inpainting models I have modified the text_to_image training pipeline to do inpainting.

Additions:

Added random masking strategy (squares) during the training, center crop during validation
Take first 3 images of the pokemon dataset as validation set

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sayakpaul and @patrickvonplaten

Examples Out of Training Distribution Scenery:

Prompt: a drawing of a green pokemon with red eyes

Pre-trained

Fine-tuned

Prompt: a green and yellow toy with a red nose

Pre-trained

Fine-tuned

Prompt: a red and white ball with an angry look on its face

Pre-trained

Fine-tuned

wip: training script

2116de29

wip: update documentation

882cb67b

fix: README

89854ee9

fix: README title

969605f1

sayakpaul requested a review from

patil-suraj 1 year ago

cryptexis1 year ago

hi @patil-suraj @sayakpaul, was wondering if this is something interesting for you to look into ? Feedback is appreciated

yiyixuxu1 year ago

cool!
gentle pin @patil-suraj

drhead1 year ago👍 4

I've experimented with finetuning proper inpainting models before. I strongly urge you to read the LAMA paper (https://arxiv.org/pdf/2109.07161.pdf) and implement their masking strategy (which is what is used by the stable-diffusion-inpainting checkpoint). I used a very simple masking strategy like what you had for a long time and never got satisfactory results with my model until switching to the LAMA masking strategy. Training on simple white square masks will severely degrade the performance of the pretrained SD inpainting model.

sayakpaul commented on 2024-02-19

Conversation is marked as resolved

Show resolved

sayakpaul commented on 2024-02-19

Conversation is marked as resolved

Show resolved

sayakpaul commented on 2024-02-19

examples/inpainting/train_inpainting.py

	603
	604		if args.push_to_hub:
	605		repo_id = create_repo(
	606		repo_id=args.hub_model_id or Path(args.output_dir).name, exist_ok=True, token=args.hub_token

sayakpaul1 year ago

Let's make sure to follow:

diffusers/examples/text_to_image/train_text_to_image.py

Line 499 in 31de879

if args.report_to == "wandb" and args.hub_token is not None:

Otherwise, hub_token will be compromised on wandb run page.

cryptexis1 year ago

done

sayakpaul1 year ago

Seems like this comment wasn't addressed?

sayakpaul commented on 2024-02-19

Conversation is marked as resolved

Show resolved

HuggingFaceDocBuilderDev1 year ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul commented on 2024-02-19

Conversation is marked as resolved

Show resolved

sayakpaul commented on 2024-02-19

sayakpaul1 year ago

Left some initial comments. Looking quite nice.

I do think having an option to enable LAMA like making might be a very good reference point as our training scripts are quite widely referenced.

And I apologize for the delay.

Merge branch 'main' into sd_15_inpainting

18191cc5

cryptexis1 year ago👍 2

I've experimented with finetuning proper inpainting models before. I strongly urge you to read the LAMA paper (https://arxiv.org/pdf/2109.07161.pdf) and implement their masking strategy (which is what is used by the stable-diffusion-inpainting checkpoint). I used a very simple masking strategy like what you had for a long time and never got satisfactory results with my model until switching to the LAMA masking strategy. Training on simple white square masks will severely degrade the performance of the pretrained SD inpainting model.

@sayakpaul

I thought having the most simple implementation would do. And then the user can decide which masking strategy to use actually. Sure will add that, if that's a deal breaker

cryptexis1 year ago

@sayakpaul I have adapted masking strategy from LAMA paper on my local branch. I have a question, is it according to guidelines to have a config file properties for the masking separately, like here:
https://github.com/advimman/lama/blob/main/configs/training/data/abl-04-256-mh-dist-celeba.yaml#L10 ?

I feel it is a bit extensive and confusing to make all of those property values as part of CLI arguments, might clutter and confuse - which arguments are model specific and which ones are data specific.

wip: integrating LAMA masking

69d4494f

wip: merged commits

272dc875

sayakpaul1 year ago

I feel it is a bit extensive and confusing to make all of those property values as part of CLI arguments, might clutter and confuse - which arguments are model specific and which ones are data specific.

You are absolutely correct. What we can do is include a note about the masking strategy in the README and link to your implementation. Does that sound good?

yiyixuxu added training

wip: final fixes

07c8fd1a

wip: updating README

c1c3a0e3

Merge branch 'main' into sd_15_inpainting

cd619ffe

sayakpaul approved these changes on 2024-03-02

sayakpaul1 year ago

Looking really nice now. I will let @patil-suraj review this too.

examples/inpainting/train_inpainting.py

	350		prompt = batch["prompts"][0]
	351
	352		with torch.autocast("cuda"):
	353		#### UPDATE PIPELINE HERE

sayakpaul1 year ago

Does this command need to be removed?

cryptexis1 year ago

which one ?

sayakpaul1 year ago

"#### UPDATE PIPELINE HERE"

Conversation is marked as resolved

Show resolved

Conversation is marked as resolved

Show resolved

examples/inpainting/train_inpainting.py

	1336		init_image = image_transform(batch["pixel_values"][0])
	1337		prompt = batch["prompts"][0]
	1338
	1339		with torch.autocast("cuda"):

sayakpaul1 year ago

Let's make use of the log_validation() function here and log the results to wandb as well. You can refer to https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py for implementing this. But let me know if you need some more clarifications.

cryptexis1 year ago

done

sayakpaul1 year ago👍 1

I think we also need to add a test case here.

cryptexis1 year ago

@sayakpaul I think it's a github glitch. :) to the extent that I cannot reply you there.

https://github.com/cryptexis/diffusers/blob/sd_15_inpainting/examples/inpainting/train_inpainting.py#L771 - in my repo I do not have anything similar to it under those lines. And the piece of code you're referring to is here.

cryptexis1 year ago (edited 1 year ago)

I think we also need to add a test case here.

I see a lot of https://huggingface.co/hf-internal-testing is used in the testing. Are usual mortals able to add unit tests ?

cryptexis1 year ago

Examples Training with Random Masking

Inference with Square Mask (as before)

Prompt: a drawing of a green pokemon with red eyes

pre-trained stable-diffusion-inpainting

fine-tuned stable-diffusion-inpainting

pre-trained stable-diffusion-v1-5

fine-tuned stable-diffusion-v1-5 (no inpainting)

fine-tuned stable-diffusion-v1-5 (inpainting)

Inference with Random Mask

pre-trained stable-diffusion-inpainting

fine-tuned stable-diffusion-inpainting

pre-trained stable-diffusion-v1-5

fine-tuned stable-diffusion-v1-5 (no inpainting)

fine-tuned stable-diffusion-v1-5 (inpainting)

wip: last inference step with log_validation

94d877cb

Sanster1 year ago👍 1👀 1

@cryptexis Thank you for providing the scripts and test cases. I want to train a inpainting model specifically for object removal based on the sd1.5-inpainting model, The goal of this model is to be able to remove objects without using a prompt, just like the ldm-inpainting model. Although the sd1.5-inpainting model can achieve decent results with the appropriate prompts (#973), it is often not easy to find the appropriate prompts, and it's easy to add extra objects.

Here's my plan right now:

I will not modify the StableDiffusionInpaintPipeline code, all prompts used during training are blank strings
The mask generation strategy will use methods from CM-GAN-Inpainting which is better than LaMA for inpainting. First use a segmentation model to process the images to obtain object masks. Then, randomly generated masks will never completely cover an object (for example, using 50% IOU as a threshold).

The generated mask looks like this:

I have not trained diffusion models before, any suggestions would be very helpful to me, thank you.

sayakpaul commented on 2024-03-03

src/diffusers/loaders/single_file.py

51	51	torch_dtype=None,
52	52	**kwargs,
53	53	):
	54

sayakpaul1 year ago

Unrelated change?

cryptexis1 year ago

somehow came after ruff formatting ...hmmmm did not intent to commit

sayakpaul commented on 2024-03-03

Conversation is marked as resolved

Show resolved

sayakpaul commented on 2024-03-03

sayakpaul1 year ago

Looking good. I think the only that is pending now is the testing suite.

Merge branch 'main' into sd_15_inpainting

5532dea7

cryptexis1 year ago

Looking good. I think the only that is pending now is the testing suite.

@sayakpaul worked yesterday on the tests. Hit a wall. Then tried to run tests for the text_to_image and hit the same wall:

attaching the screenshot:

Was wondering if it is a systematic issue across all tests....

sayakpaul1 year ago❤ 1

@sayakpaul worked yesterday on the tests. Hit a wall. Then tried to run tests for the text_to_image and hit the same wall:

Had it been the case, it would have been caught in the CI. The CI doesn't indicate so. Feel free to push the tests and then we can work towards fixing them. WDYT?

BTW, for fixing the code quality issues, we need to run make style && make quality from the root of diffusers.

wip: fixing log_validation, tests

5dd28bd4

Merge branch 'sd_15_inpainting' of github.com:cryptexis/diffusers int…

235655fb

cryptexis1 year ago

@sayakpaul worked yesterday on the tests. Hit a wall. Then tried to run tests for the text_to_image and hit the same wall:

Had it been the case, it would have been caught in the CI. The CI doesn't indicate so. Feel free to push the tests and then we can work towards fixing them. WDYT?

BTW, for fixing the code quality issues, we need to run make style && make quality from the root of diffusers.

Done @sayakpaul , I think everything is addressed, tests are pushed. Thanks a lot for the patience, support and all the help!

run quality

5179539e

crapthings1 year ago👍 1

How to prepare dataset?

image
mask
prompt

Merge branch 'main' into sd_15_inpainting

2d075742

sayakpaul1 year ago

@cryptexis let's fix the example tests that are failing now.

Srinivasa-N7071 year ago👍 2

can anyone share script of sdxl inpainting fine tuning?

patil-suraj approved these changes on 2024-03-11

patil-suraj1 year ago

Thanks a lot for working on this, the script looks great! Just left some nits.

For the runwayml inpainting model, during training they mask the whole image 25% of the time. Have you experimented with that ?

Conversation is marked as resolved

Show resolved

examples/inpainting/requirements.txt

	5		ftfy
	6		tensorboard
	7		Jinja2
	8		peft==0.7.0

patil-suraj1 year ago

do we need peft for this example ?

Conversation is marked as resolved

Show resolved

Conversation is marked as resolved

Show resolved

examples/inpainting/train_inpainting.py

	99		return torch.from_numpy(mask[None, ...]).squeeze(0).byte()
	100
	101
	102		class RandomIrregularMaskGenerator:
	103		"""
	104		Initializes the RandomIrregularMaskGenerator with the provided parameters.
	105
	106		Parameters:
	107		max_angle (int): The maximum angle for the line segments, influencing the irregularity of the shapes.
	108		max_len (int): The maximum length for each line segment, affecting the size of the irregular shapes.
	109		max_width (int): The maximum width for each line segment, determining the thickness of the irregular shapes.
	110		min_times (int): The minimum number of irregular shapes to be generated on the mask.
	111		max_times (int): The maximum number of irregular shapes to be generated on the mask.
	112		"""
	113
	114		def __init__(self, max_angle, max_len, max_width, min_times, max_times):
	115		self.max_angle = max_angle
	116		self.max_len = max_len
	117		self.max_width = max_width
	118		self.min_times = min_times
	119		self.max_times = max_times
	120
	121		def __call__(self, img_shape):
	122		"""
	123		Generates a mask with random irregular shapes when called with an image.
	124
	125		Parameters:
	126		img (tuple): Tuple of image dimensions, excluding channels.
	127
	128		Returns:
	129		np.array: A mask array with the same height and width as the input image, containing random irregular shapes.
	130		"""
	131		cur_max_len = int(max(1, self.max_len))
	132		cur_max_width = int(max(1, self.max_width))
	133		cur_max_times = int(self.min_times + 1 + (self.max_times - self.min_times))
	134		return make_random_irregular_mask(
	135		img_shape,
	136		max_angle=self.max_angle,
	137		max_len=cur_max_len,
	138		max_width=cur_max_width,
	139		min_times=self.min_times,
	140		max_times=cur_max_times,
	141		)
	142
	143
	144		class RandomRectangleMaskGenerator:
	145		"""
	146		A generator class for creating masks with random rectangular shapes on images.
	147		The rectangles are defined within specified constraints for margins, size, and the number of times they appear.
	148
	149		Attributes:
	150		margin (int): The minimum distance between the rectangle edges and the image boundaries.
	151		bbox_min_size (int): The minimum size for the width and height of the rectangles.
	152		bbox_max_size (int): The maximum size for the width and height of the rectangles.
	153		min_times (int): The minimum number of rectangles to be generated on the mask.

patil-suraj1 year ago

Very cool!

Conversation is marked as resolved

Show resolved

Conversation is marked as resolved

Show resolved

github-actions1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions added stale

cs-mshah1 year ago

When is this getting merged?

yiyixuxu removed stale

Update examples/inpainting/README.md

8f33ed19

Update examples/inpainting/train_inpainting.py

f2b04e32

Update examples/inpainting/train_inpainting.py

7dc6bfb3

Update examples/inpainting/train_inpainting.py

d11619a2

yiyixuxu1 year ago (edited 1 year ago)

@cryptexis
can you

address the final comments here #6922 (comment) - if peft is not used we can remove it; otherwise we are all good
make sure the tests pass

will merge once the tests pass!

zijinY1 year ago

@Sanster Thanks for your plan, I also want to finetune an stable difffusion inpainting model for object removal. Have you tried this, how is the performance?

github-actions233 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions added stale

fire232384 days ago

Hi patil-suraj @patil-suraj , appreciated for the convenient script ! Is there any code example and dataset example to run the script: https://github.com/huggingface/diffusers/blob/inpainting-script/examples/inpainting/train_inpainting_sdxl.py ?

github-actions removed stale

github-actions58 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions added stale

	65		}
	66
	67
	68		def save_model_card(

	622
	623		return [deepspeed_plugin.zero3_init_context_manager(enable=False)]
	624
	625		# Currently Accelerate doesn't know how to handle multiple models under Deepspeed ZeRO stage 3.
	626		# For this to work properly all models must be run through `accelerate.prepare`. But accelerate
	627		# will try to assign the same optimizer with the same weights to all models during
	628		# `deepspeed.initialize`, which of course doesn't work.
	629		#
	630		# For now the following workaround will partially support Deepspeed ZeRO-3, by excluding the 2
	631		# frozen models from being partitioned during `zero.Init` which gets called during
	632		# `from_pretrained` So CLIPTextModel and AutoencoderKL will not enjoy the parameter sharding
	633		# across multiple gpus and only UNet2DConditionModel will get ZeRO sharded.
	634		with ContextManagers(deepspeed_zero_init_disabled_context_manager()):

	56		___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) 768x768 model.___
	57		<!-- accelerate_snippet_start -->
	58		```bash
	59		export MODEL_NAME="runwayml/stable-diffusion-inpainting"

	383
	384		def parse_args():
	385		parser = argparse.ArgumentParser(description="Simple example of a training script.")
	386		parser.add_argument(
	387		"--input_perturbation", type=float, default=0, help="The scale of input perturbation. Recommended 0.1."
	388		)

	804		unet.register_to_config(in_channels=in_channels)
	805
	806		with torch.no_grad():
	807		new_conv_in = torch.nn.Conv2d(
	808		in_channels, out_channels, unet.conv_in.kernel_size, unet.conv_in.stride, unet.conv_in.padding
	809		)
	810		new_conv_in.weight.zero_()
	811		new_conv_in.weight[:, :4, :, :].copy_(unet.conv_in.weight)
	812		unet.conv_in = new_conv_in

diffusers
Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2
#6922

Open

Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2 #6922

What does this PR do?

Before submitting

Who can review?

Examples Out of Training Distribution Scenery:

Prompt: a drawing of a green pokemon with red eyes

Pre-trained

Fine-tuned

Prompt: a green and yellow toy with a red nose

Pre-trained

Fine-tuned

Prompt: a red and white ball with an angry look on its face

Pre-trained

Fine-tuned

Examples Training with Random Masking

Inference with Square Mask (as before)

Prompt: a drawing of a green pokemon with red eyes

pre-trained stable-diffusion-inpainting

fine-tuned stable-diffusion-inpainting

pre-trained stable-diffusion-v1-5

fine-tuned stable-diffusion-v1-5 (no inpainting)

fine-tuned stable-diffusion-v1-5 (inpainting)

Inference with Random Mask

pre-trained stable-diffusion-inpainting

fine-tuned stable-diffusion-inpainting

pre-trained stable-diffusion-v1-5

fine-tuned stable-diffusion-v1-5 (no inpainting)

fine-tuned stable-diffusion-v1-5 (inpainting)

	1314		# Run a final round of inference.
	1315		if args.validation_size > 0:
	1316		logger.info("Running inference for collecting generated images...")
	1317		images, prompts, masks = log_validation(
	1318		val_dataloader,
	1319		vae,
	1320		text_encoder,
	1321		tokenizer,
	1322		unet,
	1323		args,
	1324		accelerator,
	1325		weight_dtype,
	1326		global_step,

	1		# Stable Diffusion Inpainting fine-tuning
	2
	3		The `train_inpainting.py` script shows how to fine-tune stable diffusion model on your own dataset.

	1		#!/usr/bin/env python
	2		# coding=utf-8
	3		# Copyright 2023 The HuggingFace Inc. team. All rights reserved.

	# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
	# Copyright 2024 The HuggingFace Inc. team. All rights reserved.

	801		args.pretrained_model_name_or_path, subfolder="unet", revision=args.non_ema_revision
	802		)
	803
	804		# InstructPix2Pix uses an additional image for conditioning. To accommodate that,
	805		# it uses 8 channels (instead of 4) in the first (conv) layer of the UNet. This UNet is
	806		# then fine-tuned on the custom InstructPix2Pix dataset. This modified UNet is initialized
	807		# from the pre-trained checkpoints. For the extra channels added to the first layer, they are
	808		# initialized to zero.

-    # InstructPix2Pix uses an additional image for conditioning. To accommodate that,
-    # it uses 8 channels (instead of 4) in the first (conv) layer of the UNet. This UNet is
-    # then fine-tuned on the custom InstructPix2Pix dataset. This modified UNet is initialized
-    # from the pre-trained checkpoints. For the extra channels added to the first layer, they are
-    # initialized to zero.
+    # For inpainting an additional image is used for conditioning. To accommodate that,
+    # it uses 8 channels (instead of 4) in the first (conv) layer of the UNet. This UNet is
+    # then fine-tuned on the custom inpainting dataset. This modified UNet is initialized
+    # from the pre-trained checkpoints. For the extra channels added to the first layer, they are
+    # initialized to zero.

	807		# from the pre-trained checkpoints. For the extra channels added to the first layer, they are
	808		# initialized to zero.
	809
	810		# when most likely a text2img pretrained model is used

	74		repo_folder=None,
	75		):
	76		img_str = ""
	77		if len(images) > 0:

diffusers Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2 #6922 Open

Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2 #6922

What does this PR do?

Before submitting

Who can review?

Examples Out of Training Distribution Scenery:

Prompt: a drawing of a green pokemon with red eyes

Pre-trained

Fine-tuned

Prompt: a green and yellow toy with a red nose

Pre-trained

Fine-tuned

Prompt: a red and white ball with an angry look on its face

Pre-trained

Fine-tuned

Examples Training with Random Masking

Inference with Square Mask (as before)

Prompt: a drawing of a green pokemon with red eyes

pre-trained stable-diffusion-inpainting

fine-tuned stable-diffusion-inpainting

pre-trained stable-diffusion-v1-5

fine-tuned stable-diffusion-v1-5 (no inpainting)

fine-tuned stable-diffusion-v1-5 (inpainting)

Inference with Random Mask

pre-trained stable-diffusion-inpainting

fine-tuned stable-diffusion-inpainting

pre-trained stable-diffusion-v1-5

fine-tuned stable-diffusion-v1-5 (no inpainting)

fine-tuned stable-diffusion-v1-5 (inpainting)

diffusers
Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2
#6922

Open