diffusers
[examples] add train flux-controlnet scripts in example.
#9324
Merged

PromeAIpro
PromeAIpro261 days ago❤ 12🚀 1

What does this PR do?

This PR adds a Flux ControlNet training script to the examples. It was tested on an A100-SXM4-80GB.

With this script, the number of transformer layers can be customized by setting --num_double_layers=4 --num_single_layers=0. With these settings, training needs about 60 GB of GPU memory at batch size 2 and 512 resolution.

discussed in #9085

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

PromeAIpro add train flux-controlnet scripts in example.
8ab9b5b0
PromeAIpro fix error
4a535737
Mason-McGough
Mason-McGough commented on 2024-08-31
Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
model = models.pop()

# load diffusers style into model
load_model = FluxControlNetModel.from_pretrained(input_dir, subfolder="controlnet")
Mason-McGough259 days ago

Should subfolder here be the same name used for sub_dir inside save_model_hook? Seems they are different.

PromeAIpro258 days ago👍 1

Thanks, fixed
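
The mismatch raised in this thread can be reproduced in miniature. The sketch below is not the script's actual hook code: `SUB_DIR`, `save_model_hook`, and `load_model_hook` are simplified, hypothetical stand-ins that write and read a JSON file. It assumes only that the save side writes into a subfolder and the load side must read from the same one.

```python
import json
import os
import tempfile

# Hypothetical shared constant: both hooks must agree on this name.
SUB_DIR = "flux_controlnet"


def save_model_hook(weights: dict, output_dir: str) -> str:
    """Write the (toy) weights into output_dir/SUB_DIR and return the file path."""
    target = os.path.join(output_dir, SUB_DIR)
    os.makedirs(target, exist_ok=True)
    path = os.path.join(target, "weights.json")
    with open(path, "w") as f:
        json.dump(weights, f)
    return path


def load_model_hook(input_dir: str, subfolder: str = SUB_DIR) -> dict:
    """Read the toy weights back from input_dir/subfolder."""
    with open(os.path.join(input_dir, subfolder, "weights.json")) as f:
        return json.load(f)


def roundtrip_ok() -> bool:
    """Save then load with the shared name; a different subfolder fails."""
    with tempfile.TemporaryDirectory() as tmp:
        save_model_hook({"w": [1.0, 2.0]}, tmp)
        same = load_model_hook(tmp) == {"w": [1.0, 2.0]}
        try:
            load_model_hook(tmp, subfolder="controlnet")
            mismatch_fails = False
        except FileNotFoundError:
            mismatch_fails = True
    return same and mismatch_fails
```

Deriving both paths from one constant keeps the two hooks from drifting apart.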

PromeAIpro fix subfolder error
14e9970e
yiyixuxu Merge branch 'main' into flux-controlnet-train
3bb431c4
yiyixuxu requested a review from sayakpaul 256 days ago
yiyixuxu
yiyixuxu256 days ago

@haofanwang @wangqixun
would you be willing to give this a review if you have time?

HuggingFaceDocBuilderDev
HuggingFaceDocBuilderDev256 days ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

PromeAIpro fix preprocess error
973c6fb1
PromeAIpro Merge branch 'flux-controlnet-train_x' into flux-controlnet-train
599c984f
PromeAIpro Merge branch 'main' into flux-controlnet-train
24b58f85
PromeAIpro Merge branch 'main' into flux-controlnet-train
22a3e101
linjiapro
linjiapro248 days ago (edited 248 days ago)

@PromeAIpro

Can we have some sample training results (such as images) from this script attached in the doc, or anywhere suitable?

PromeAIpro
PromeAIpro247 days ago (edited 241 days ago)👍 2

Here are some training results from a lineart ControlNet.

| input | output | prompt |
| --- | --- | --- |
| [image: ComfyUI_temp_egnkb_00001_] | [image: ComfyUI_00027_] | cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere |
| [image: ComfyUI_temp_znagh_00001_] | [image: ComfyUI_temp_cufps_00002_] | a busy urban intersection during daytime. The sky is partly cloudy with a mix of blue and white clouds. There are multiple traffic lights, and vehicles are seen waiting at the red signals. Several businesses and shops are visible on the side, with signboards and advertits. The road is wide, and there are pedestrian crossings. Overall, it appears to be a typical day in a bustling city. |

The model was first trained at 512 resolution and then fine-tuned at 1024 resolution.

PromeAIpro Merge branch 'main' into flux-controlnet-train
32eb1ef4
sayakpaul
sayakpaul commented on 2024-09-13
Conversation is marked as resolved
examples/controlnet/README_flux.md
The `train_controlnet_flux.py` script shows how to implement the ControlNet training procedure and adapt it for [FLUX](https://github.com/black-forest-labs/flux).

Training script provided by LibAI, which is an institution dedicated to the progress and achievement of artificial general intelligence.LibAI is the developer of [cutout.pro](https://www.cutout.pro/) and [promeai.pro](https://www.promeai.pro/).
sayakpaul247 days ago
Suggested change
- Training script provided by LibAI, which is an institution dedicated to the progress and achievement of artificial general intelligence.LibAI is the developer of [cutout.pro](https://www.cutout.pro/) and [promeai.pro](https://www.promeai.pro/).
+ Training script provided by LibAI, which is an institution dedicated to the progress and achievement of artificial general intelligence. LibAI is the developer of [cutout.pro](https://www.cutout.pro/) and [promeai.pro](https://www.promeai.pro/).
sayakpaul
sayakpaul commented on 2024-09-13
examples/controlnet/README_flux.md
* `report_to="tensorboard` will ensure the training runs are tracked on Weights and Biases.
* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.

Our experiments were conducted on a single 40GB A100 GPU.
sayakpaul247 days ago

Wow, 40GB A100 seems doable.

PromeAIpro246 days ago

I'm sorry, this should say the 80 GB A100 (I wrote it wrong). I did a lot of extra work to get it to train with DeepSpeed ZeRO-3 on the 40 GB A100, but I don't think that setup is suitable for everyone.

sayakpaul246 days ago

Not at all. I think it would still be nice to include the changes you had to make in the form of notes in the README. Does that work?

PromeAIpro246 days ago

I'll see if I can add it later.

PromeAIpro246 days ago❤ 1

@sayakpaul We added a tutorial on configuring deepspeed in the readme.

linjiapro246 days ago

There are some tricks to lower GPU memory usage:

  1. gradient_checkpointing
  2. bf16 or fp16
  3. batch size 1, combined with gradient_accumulation_steps above 1

With 1, 2, and 3, can training be kept under 40 GB?

PromeAIpro246 days ago (edited 246 days ago)👍 3

In my experience, DeepSpeed ZeRO-3 must be used. @linjiapro your settings will cost about 70 GB at 1024 resolution with batch size 1, or at 512 resolution with batch size 3.
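
For readers unfamiliar with the third trick, here is a toy sketch of gradient accumulation in plain Python. It is not taken from the training script. It uses a one-parameter model with a mean-squared-error loss, and the function names are made up for illustration. It shows that averaging micro-batch gradients reproduces the full-batch gradient.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average per-micro-batch gradients, as gradient accumulation does."""
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Each micro-batch contribution is divided by the number of steps.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)              # one batch of 4
accum = accumulated_grad(0.5, xs, ys, 1)  # 4 micro-batches of 1
```

Accumulation only shrinks the activations held per step. The memory taken by weights and optimizer state stays the same, which is consistent with the roughly 70 GB figure reported above.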

ghost242 days ago

Sorry to bother you, but have you tried caching the text-encoder outputs and VAE latents to run with less GPU memory? @PromeAIpro @linjiapro

PromeAIpro242 days ago👍 1

Caching the text-encoder outputs is already available in this script (it saves about 10 GB of GPU memory on T5). For caching VAE latents, you can check the DeepSpeed instructions in the README, which include VAE caching.

christopher-beckham239 days ago

FYI, you can also reduce memory usage by using optimum-quanto to quantise the weights of all modules except the ControlNet to qint8 (weights only, no activation quantisation). I ran some experiments on this with my own ControlNet training script and it seems to work just fine.
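
To illustrate what weight-only int8 quantisation means, here is a plain-Python sketch of symmetric per-tensor quantisation. It is only a conceptual illustration under that assumption. It does not use optimum-quanto, and the library's actual scheme may differ.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights for use in the forward pass."""
    return [v * scale for v in q]


weights = [0.02, -0.5, 0.31, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each stored weight takes one byte instead of two or four, at the cost of a rounding error of at most half a quantisation step.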

sayakpaul
sayakpaul commented on 2024-09-13
Conversation is marked as resolved
examples/controlnet/README_flux.md
from diffusers.pipelines.flux.pipeline_flux_controlnet import FluxControlNetPipeline
from diffusers.models.controlnet_flux import FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'path to controlnet'
controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(base_model,
                                              controlnet=controlnet,
                                              torch_dtype=torch.bfloat16)
pipe.to("cuda")
sayakpaul247 days ago
Suggested change
- base_model = 'black-forest-labs/FLUX.1-dev'
- controlnet_model = 'path to controlnet'
- controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
- pipe = FluxControlNetPipeline.from_pretrained(base_model,
-                                               controlnet=controlnet,
-                                               torch_dtype=torch.bfloat16)
- pipe.to("cuda")
+ base_model = 'black-forest-labs/FLUX.1-dev'
+ controlnet_model = 'path to controlnet'
+ controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
+ pipe = FluxControlNetPipeline.from_pretrained(
+     base_model,
+     controlnet=controlnet,
+     torch_dtype=torch.bfloat16
+ )
+ # enable memory optimizations
+ pipe.enable_model_cpu_offload()

Most people may not have the necessary VRAM to run it like this. So, better have it this way? WDYT?

PromeAIpro246 days ago

yes, you are right

sayakpaul
sayakpaul commented on 2024-09-13
Conversation is marked as resolved
examples/controlnet/README_flux.md
## Notes

### T5 dont support bf16 autocast and i dont know why, will cause black image.

```diff
if is_final_validation or torch.backends.mps.is_available():
    autocast_ctx = nullcontext()
else:
    # t5 seems not support autocast and i don't know why
+   autocast_ctx = nullcontext()
-   autocast_ctx = torch.autocast(accelerator.device.type)
```
sayakpaul247 days ago

Instead of this how about we directly incorporate the fix in the training script itself?

PromeAIpro246 days ago

done, plz check it.

sayakpaul
sayakpaul commented on 2024-09-13
Conversation is marked as resolved
examples/controlnet/README_flux.md
- autocast_ctx = torch.autocast(accelerator.device.type)

### TO Fix Error
sayakpaul247 days ago

Same, let's fix this in the pipeline implementation itself.

PromeAIpro246 days ago

done, plz check it.

sayakpaul
sayakpaul commented on 2024-09-13
sayakpaul247 days ago

Hi, thanks for your PR. I just left some initial comments. LMK what you think.

PromeAIpro Update examples/controlnet/README_flux.md
57d143bb
PromeAIpro Update examples/controlnet/README_flux.md
af1b7a50
fix readme
d19b101c
fix note error
64251ac5
add some Tutorial for deepspeed
c98d43f8
fix some Format Error
569e0de8
PromeAIpro Merge branch 'main' into flux-controlnet-train
916fd80a
sayakpaul
sayakpaul commented on 2024-09-14
sayakpaul246 days ago

Thanks! Appreciate your hard work here. Left some more comments.

Conversation is marked as resolved
examples/controlnet/README_flux.md
export MODEL_DIR="black-forest-labs/FLUX.1-dev"
export OUTPUT_DIR="path to save model"
export TRAIN_JSON_FILE="path to your jsonl file"
sayakpaul246 days ago

I think we would want to rather use dataset_path here like we do in the other examples. Do you think it is possible?

If this is not possible, I think we should at least provide an example link that users could download, unzip, and get started with training.

PromeAIpro245 days ago

Added in this commit. Please take a look.

sayakpaul245 days ago

We should then change this command accordingly to reflect it no? Specifically, we should remove export TRAIN_JSON_FILE="path to your jsonl file".

Conversation is marked as resolved
src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
  encoder_hidden_states=prompt_embeds,
- controlnet_block_samples=controlnet_block_samples,
- controlnet_single_block_samples=controlnet_single_block_samples,
+ controlnet_block_samples=[sample.to(dtype=latents.dtype) for sample in controlnet_block_samples]if controlnet_block_samples is not None else None,
+ controlnet_single_block_samples=[sample.to(dtype=latents.dtype) for sample in controlnet_single_block_samples] if controlnet_single_block_samples is not None else None,
sayakpaul246 days ago

We could probably tackle this before self.transformer call to keep it a bit more readable.

@yiyixuxu okay with you regarding the changes?

PromeAIpro245 days ago

Added in this commit. Please take a look.
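
One way to read the readability suggestion in this thread is to do the dtype cast in a small helper before the transformer call. The sketch below is an assumption about what that could look like, not the code that was merged. `cast_samples` is a hypothetical helper, and `FakeTensor` is a stand-in so the example runs without torch.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor: it only remembers its dtype."""

    def __init__(self, dtype: str):
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        return FakeTensor(dtype)


def cast_samples(samples, dtype):
    """Cast every sample to dtype, passing None through unchanged."""
    if samples is None:
        return None
    return [sample.to(dtype=dtype) for sample in samples]


# Do the casting once, before the (omitted) transformer call.
block_samples = cast_samples([FakeTensor("float32"), FakeTensor("float32")], "bfloat16")
single_block_samples = cast_samples(None, "bfloat16")
```

The call site then passes `block_samples` and `single_block_samples` directly, without inline conditionals.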

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
accelerator.wait_for_everyone()
if accelerator.is_main_process:
    flux_controlnet = unwrap_model(flux_controlnet)
    flux_controlnet.save_pretrained(args.output_dir)
sayakpaul246 days ago

Should we save it in the weight_dtype? Saving it in the FP32 precision could be huge. Of course, we could have a CLI option like upcast_before_saving should the users want to do it. WDYT?

PromeAIpro245 days ago

For ControlNet, the model should always be trained in fp32. Do you mean giving users a CLI option (like save_weight_dtype?) to choose the dtype of the saved weights? Sorry, I don't quite understand what you mean. Maybe you can provide some example code.

sayakpaul245 days ago

Yes. save_weight_dtype works for me.
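
A minimal sketch of such an option, with the arithmetic behind the size concern. The flag name `--save_weight_dtype` comes from this thread. The choices, the default, and the one-billion parameter count are assumptions made for illustration.

```python
import argparse

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # The flag name comes from the thread; choices and default are assumptions.
    parser.add_argument("--save_weight_dtype", choices=sorted(BYTES_PER_PARAM), default="fp32")
    return parser


def checkpoint_gib(num_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in GiB, ignoring file-format overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


args = build_parser().parse_args(["--save_weight_dtype", "bf16"])
size_fp32 = checkpoint_gib(1_000_000_000, "fp32")  # hypothetical parameter count
size_saved = checkpoint_gib(1_000_000_000, args.save_weight_dtype)
```

Saving in a 16-bit dtype halves the checkpoint size relative to fp32, while training itself can still run in fp32.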

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
) * noise

guidance_vec = torch.full(
    (noisy_latents.shape[0],), 3.5, device=noisy_latents.device, dtype=weight_dtype
sayakpaul246 days ago

Let's make 3.5 a CLI argument guidance_scale.

PromeAIpro245 days ago

Added in this commit. Please take a look.

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
noise = torch.randn_like(pixel_latents).to(accelerator.device).to(dtype=weight_dtype)
bsz = pixel_latents.shape[0]

# Sample a random timestep for each image
sayakpaul246 days ago

This seems like the sampling from ai-toolkit:
https://github.com/ostris/ai-toolkit/blob/9ee1ef2a0a2a9a02b92d114a95f21312e5906e54/toolkit/samplers/custom_flowmatch_sampler.py#L95

Perhaps we could give a citation here? Or am I mistaken?

sayakpaul246 days ago

Or better yet, we let the users configure this and we provide a reasonable default like we do here:
https://github.com/huggingface/diffusers/blob/48e36353d8cbf0322ec1ad0684b95d11f70af2de/examples/dreambooth/train_dreambooth_lora_flux.py#L1634C21-L1634C58

If we need to update

def compute_density_for_timestep_sampling(

to accommodate this weighting scheme, I am okay with that. Cc: @linoytsaban

PromeAIpro245 days ago

All of this was developed by ourselves, not copied from anyone else. I am not familiar with the solution you pointed to. I can read the code, or you can provide it.

sayakpaul245 days ago

Oh okay.

Usually, we let our users choose a weighting scheme for the timesteps as I showed above in https://github.com/huggingface/diffusers/blob/48e36353d8cbf0322ec1ad0684b95d11f70af2de/examples/dreambooth/train_dreambooth_lora_flux.py#L1634C21-L1634C58

LMK if that makes sense.

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
# apply flow matching
noisy_latents = (
    1 - t.unsqueeze(1).unsqueeze(2).repeat(1, pixel_latents.shape[1], pixel_latents.shape[2])
sayakpaul246 days ago

This feels a little too overwhelming. Could we match how we do it here?

noisy_model_input = (1.0 - sigmas) * model_input + sigmas * noise

PromeAIpro245 days ago

You mean removing the dim-expanding and repeating ops? I think it's better to write those ops out explicitly, so the code is readable and the reader knows what the tensor shape is.
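
Both forms in this thread compute the same per-sample blend, `(1 - sigma) * x + sigma * noise`. Here is a small plain-Python sketch of it. `add_flow_matching_noise` is a hypothetical name, and nested lists stand in for tensors.

```python
def add_flow_matching_noise(latents, noise, sigmas):
    """Blend each sample with noise: (1 - sigma) * x + sigma * noise.

    latents and noise are [batch][values] lists; sigmas has one value per
    sample, which is broadcast across that sample's values.
    """
    return [
        [(1.0 - s) * x + s * n for x, n in zip(sample, sample_noise)]
        for sample, sample_noise, s in zip(latents, noise, sigmas)
    ]


latents = [[1.0, 2.0], [3.0, 4.0]]
noise = [[0.0, 0.0], [10.0, 10.0]]
noisy = add_flow_matching_noise(latents, noise, sigmas=[0.0, 1.0])
```

A sigma of 0 leaves the sample unchanged and a sigma of 1 replaces it with noise. The explicit `unsqueeze`/`repeat` calls and the broadcasting form differ only in how the per-sample value is expanded.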

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
controlnet_block_samples, controlnet_single_block_samples = flux_controlnet(
    hidden_states=noisy_latents,
    controlnet_cond=control_image,
    timestep=t,
sayakpaul246 days ago

Should we not scale it like

?

PromeAIpro245 days ago

t was sampled through

t = torch.sigmoid(torch.randn((bsz,), device=accelerator.device, dtype=weight_dtype))

so it lies between 0 and 1 and there is no need to scale it.

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
noise_pred = flux_transformer(
    hidden_states=noisy_latents,
    timestep=t,
sayakpaul246 days ago

Scaling here as well?

PromeAIpro245 days ago

t was sampled through

t = torch.sigmoid(torch.randn((bsz,), device=accelerator.device, dtype=weight_dtype))

so it lies between 0 and 1 and there is no need to scale it.

sayakpaul
sayakpaul245 days ago

Can we fix the code quality issues? make quality && make style?

add dataset_path example
67deb7a6
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
76bcf5a1
remove print, add guidance_scale CLI, readable apply
32fbeac2
sayakpaul
sayakpaul commented on 2024-09-15
Conversation is marked as resolved
examples/controlnet/README_flux.md
{"image": "xxx", "text": "xxx", "conditioning_image": "xxx"}
{"image": "xxx", "text": "xxx", "conditioning_image": "xxx"}
59```
60
61
62
63
64
sayakpaul245 days ago

We can remove this whitespace.

Laidawang244 days ago

fixed
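
The README excerpt in this thread shows one JSON object per line with the keys `image`, `text`, and `conditioning_image`. The sketch below shows one way such a file could be parsed and validated. `parse_jsonl` and the sample file names are hypothetical, and this is not the script's own loading code.

```python
import json

REQUIRED_KEYS = {"image", "text", "conditioning_image"}


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL content, skipping blank lines and checking required keys."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        records.append(record)
    return records


sample = (
    '{"image": "a.png", "text": "red circle", "conditioning_image": "a_cond.png"}\n'
    "\n"
    '{"image": "b.png", "text": "cyan circle", "conditioning_image": "b_cond.png"}\n'
)
records = parse_jsonl(sample)
```

Skipping blank lines makes the parser tolerant of stray whitespace like that flagged in the review.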

sayakpaul
sayakpaul commented on 2024-09-15
sayakpaul245 days ago

Thank you! Left some more comments. Let me know if they make sense or are unclear.

Conversation is marked as resolved
examples/controlnet/README_flux.md
## Custom Datasets

We support dataset formats:

The original dataset is hosted in the [ControlNet repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip). We re-uploaded it to be compatible with `datasets` [here](https://huggingface.co/datasets/fusing/fill50k). Note that `datasets` handles dataloading within the training script, To use our example, add `--dataset_name=fusing/fill50k \` to the script and remove line `--jsonl_for_train` mentioned below.
sayakpaul245 days ago
Suggested change
- The original dataset is hosted in the [ControlNet repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip). We re-uploaded it to be compatible with `datasets` [here](https://huggingface.co/datasets/fusing/fill50k). Note that `datasets` handles dataloading within the training script, To use our example, add `--dataset_name=fusing/fill50k \` to the script and remove line `--jsonl_for_train` mentioned below.
+ The original dataset is hosted in the [ControlNet repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip). We re-uploaded it to be compatible with `datasets` [here](https://huggingface.co/datasets/fusing/fill50k). Note that `datasets` handles dataloading within the training script. To use our example, add `--dataset_name=fusing/fill50k \` to the script and remove line `--jsonl_for_train` mentioned below.
Conversation is marked as resolved
examples/controlnet/README_flux.md
    --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --report_to="tensorboard" \
sayakpaul245 days ago

We usually suggest logging to wandb.

Laidawang244 days ago

fixed

Conversation is marked as resolved
examples/controlnet/README_flux.md
    --report_to="tensorboard" \
    --num_double_layers=4 \
    --num_single_layers=0 \
    --seed=42 \
sayakpaul245 days ago
Suggested change
-     --seed=42 \
+     --seed=42 \
+     --push_to_hub

We usually default to push_to_hub in our examples.

Laidawang244 days ago

fixed

Conversation is marked as resolved
examples/controlnet/README_flux.md
To better track our training experiments, we're using the following flags in the command above:

* `report_to="tensorboard` will ensure the training runs are tracked on Weights and Biases.
sayakpaul245 days ago

No, report_to="wandb" will ensure that.

Laidawang244 days ago

fixed

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
        "number of `args.validation_image` and `args.validation_prompt` should be checked in `parse_args`"
    )

image_logs = []
if is_final_validation or torch.backends.mps.is_available():
    autocast_ctx = nullcontext()
else:
    # t5 seems not support autocast and i don't know why
    autocast_ctx = nullcontext()
    # autocast_ctx = torch.autocast(accelerator.device.type)
sayakpaul245 days ago

If it's always going to be nullcontext, we don't need to add it at all, no?

sayakpaul241 days ago

@PromeAIpro seems like this went unnoticed.

PromeAIpro241 days ago

Yes, I missed it. It has now been modified.

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
    else:
        logger.warning(f"image logging not implemented for {tracker.name}")

del pipeline
gc.collect()
torch.cuda.empty_cache()
sayakpaul241 days ago
PromeAIpro241 days ago

missed and fixed

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
def main(args):
    # if args.report_to == "wandb" and args.hub_token is not None:
    #     raise ValueError(
    #         "You cannot use both --report_to=wandb and --hub_token due to a security risk of exposing your token."
    #         " Please use `huggingface-cli login` to authenticate with the Hub."
    #     )
sayakpaul245 days ago

Why?

PromeAIpro241 days ago

Already added again

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
980 flux_transformer.to(accelerator.device, dtype=weight_dtype)
981 text_encoder_one.to(accelerator.device, dtype=weight_dtype)
982 text_encoder_two.to(accelerator.device, dtype=weight_dtype)
983 # flux_controlnet.to(accelerator.device, dtype=weight_dtype)
sayakpaul245 days ago

Could be removed.

sayakpaul241 days ago
PromeAIpro241 days ago

Line 980 can't be removed; removing it causes a dtype error. The three remaining lines can be removed.

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
    compute_embeddings_fn, batched=True, new_fingerprint=new_fingerprint, batch_size=100
)

del text_encoders, tokenizers
gc.collect()
torch.cuda.empty_cache()
Laidawang244 days ago

fixed

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
)

# copied from pipeline_flux_controlnet
def _prepare_latent_image_ids(batch_size, height, width, device, dtype):
    latent_image_ids = torch.zeros(height // 2, width // 2, 3)
    latent_image_ids[..., 1] = latent_image_ids[..., 1] + torch.arange(height // 2)[:, None]
    latent_image_ids[..., 2] = latent_image_ids[..., 2] + torch.arange(width // 2)[None, :]

    latent_image_id_height, latent_image_id_width, latent_image_id_channels = latent_image_ids.shape

    latent_image_ids = latent_image_ids[None, :].repeat(batch_size, 1, 1, 1)
    latent_image_ids = latent_image_ids.reshape(
        batch_size, latent_image_id_height * latent_image_id_width, latent_image_id_channels
    )

    return latent_image_ids.to(device=device, dtype=dtype)

def _pack_latents(latents, batch_size, num_channels_latents, height, width):
    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    latents = latents.permute(0, 2, 4, 1, 3, 5)
    latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)

    return latents
sayakpaul245 days ago

These are static methods so, we could directly do: FluxControlNetPipeline._prepare_latent_image_ids() and FluxControlNetPipeline._pack_latents(). WDYT?

Laidawang244 days ago

fixed
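
For readers who want to see what the two helpers in this thread produce, here is a plain-Python sketch of the same index and shape arithmetic. It omits the batch dimension of the ids and uses integers instead of tensors. The function names are illustrative, not the pipeline's API.

```python
def latent_image_ids(height: int, width: int) -> list[list[int]]:
    """One [0, row, col] id per 2x2 latent patch, flattened row-major."""
    return [[0, row, col] for row in range(height // 2) for col in range(width // 2)]


def packed_shape(batch_size, num_channels, height, width):
    """Shape after packing each 2x2 patch into the channel dimension."""
    return (batch_size, (height // 2) * (width // 2), num_channels * 4)


ids = latent_image_ids(4, 6)       # 2 rows x 3 columns of patches
shape = packed_shape(1, 16, 4, 6)  # 16 latent channels is an example value
```

Each packed token corresponds to one 2x2 patch, so the sequence length is a quarter of the latent area and the channel count is multiplied by four.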

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
        step=global_step,
        is_final_validation=True,
    )

accelerator.end_training()
sayakpaul245 days ago

Ah I see, we don't have a push_to_hub() option. That is a must in our scripts:

Laidawang244 days ago

Added save_weight_dtype and push_to_hub. As for the weighting scheme, let me test it. I will try to support it if it works well.

Laidawang243 days ago

supported

PromeAIpro Update examples/controlnet/README_flux.md
b03cb01c
PromeAIpro Merge branch 'main' into flux-controlnet-train
7b984595
update,push_to_hub,save_weight_dtype,static method,clear_objs_and_ret…
443f251f
add push to hub in readme
bc68f1a7
sayakpaul
sayakpaul commented on 2024-09-16
sayakpaul244 days ago

Left some additional minor comments but I see existing comments are yet to be addressed. Let me know when you would like another round of review.

Conversation is marked as resolved
examples/controlnet/README_flux.md
The `train_controlnet_flux.py` script shows how to implement the ControlNet training procedure and adapt it for [FLUX](https://github.com/black-forest-labs/flux).

Training script provided by LibAI, which is an institution dedicated to the progress and achievement of artificial general intelligence. LibAI is the developer of [cutout.pro](https://www.cutout.pro/) and [promeai.pro](https://www.promeai.pro/).
sayakpaul244 days ago
Suggested change
> [!NOTE]
> **Memory consumption**
>
> Flux can be quite expensive to run on consumer hardware devices and as a result, ControlNet training of it comes with higher memory requirements than usual.
> **Gated access**
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate. Use the command below to log in: `huggingface-cli login`
Conversation is marked as resolved
examples/controlnet/README_flux.md
from diffusers.models.controlnet_flux import FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'path to controlnet'
sayakpaul244 days ago

Let's provide the repo id of an already trained model (on our conditioning dataset) here if possible?

Laidawang244 days ago

I don't have such a model yet, maybe consider providing 'InstantX/FLUX.1-dev-Controlnet-Canny'

sayakpaul241 days ago

But you ran experiments with the script as I understand it right? Could we not provide something that worked in those experiments?

PromeAIpro241 days ago

I don't have enough GPU resources to train and test on our conditioning dataset. Our organization is open-sourcing a dev_lineart ControlNet model (the example we showed above) trained with this script. Is that OK?

sayakpaul241 days ago

Of course that would be okay!

sayakpaul241 days ago

@PromeAIpro let's update this then?

PromeAIpro240 days ago
Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
model_card = load_or_create_model_card(
    repo_id_or_path=repo_id,
    from_training=True,
    license="openrail++",
sayakpaul244 days ago
Suggested change
-     license="openrail++",
+     license="other",
Laidawang243 days ago

fixed

Conversation is marked as resolved
examples/controlnet/train_controlnet_flux.py
model_description = f"""
# controlnet-{repo_id}

These are controlnet weights trained on {base_model} with new type of conditioning.
{img_str}
sayakpaul244 days ago
Suggested change
- These are controlnet weights trained on {base_model} with new type of conditioning.
- {img_str}
+ These are controlnet weights trained on {base_model} with new type of conditioning.
+ {img_str}
+ ## License
+ Please adhere to the licensing terms as described [here](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md)
Laidawang243 days ago

fixed

apply weighting schemes
fe2a5871
add note
3dc16cac
Laidawang
Laidawang243 days ago

@sayakpaul hey, I think I have fixed all the issues, time to start a new review.

PromeAIpro Update examples/controlnet/README_flux.md
aff09514
PromeAIpro Merge branch 'main' into flux-controlnet-train
b8585071
sayakpaul
sayakpaul commented on 2024-09-18
examples/controlnet/train_controlnet_flux.py
bsz = pixel_latents.shape[0]
noise = torch.randn_like(pixel_latents).to(accelerator.device).to(dtype=weight_dtype)
# Sample a random timestep for each image
# for weighting schemes where we sample timesteps non-uniformly
u = compute_density_for_timestep_sampling(
    weighting_scheme=args.weighting_scheme,
    batch_size=bsz,
    logit_mean=args.logit_mean,
    logit_std=args.logit_std,
    mode_scale=args.mode_scale,
)
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long()
timesteps = noise_scheduler_copy.timesteps[indices].to(device=pixel_latents.device)

# Add noise according to flow matching.
sigmas = get_sigmas(timesteps, n_dim=pixel_latents.ndim, dtype=pixel_latents.dtype)
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise
sayakpaul241 days ago

I thought we were using a different timestep sampling procedure and I suggested to have that as a default. Are we not doing that anymore?

PromeAIpro241 days ago (edited 241 days ago)

Do you mean to set the original sampling scheme as default?
image
For the weighting scheme, I just copied it from here.

sayakpaul241 days ago

Yeah I meant to keep the sigmoid sampling as your default and let users configure it as we do in the other scripts.

PromeAIpro241 days ago

Could you please write it down briefly? I'm not sure how to edit it. It seems to me that if you use logit_normal, you should be using sigmoid?
image

PromeAIpro241 days ago👍 1

Just need to change weighting_scheme from the default value to logit_normal?
image

sayakpaul241 days ago

Okay. But it depends on an std and mean. IIRC your scheme did torch.randn() and applied sigmoid right?

PromeAIpro240 days ago (edited 240 days ago)👍 1

Yes, it used torch.randn() at first, but given the examples you provided, I think this may be a better solution for us?
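
The relationship discussed in this thread can be checked with a small plain-Python sketch. It assumes that logit-normal sampling means drawing from a normal distribution with a given mean and std and then applying a sigmoid. `sample_logit_normal` is a hypothetical helper, not the diffusers function.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_logit_normal(batch_size, mean=0.0, std=1.0, rng=None):
    """Draw timesteps in (0, 1) by squashing normal samples with a sigmoid."""
    rng = rng or random.Random()
    return [sigmoid(rng.gauss(mean, std)) for _ in range(batch_size)]


# With mean=0 and std=1 this matches sigmoid(randn): same seed, same values.
a = sample_logit_normal(1000, rng=random.Random(0))
rng = random.Random(0)
b = [sigmoid(rng.gauss(0.0, 1.0)) for _ in range(1000)]
```

Under that assumption, the original `sigmoid(randn)` scheme is the special case with mean 0 and std 1, so exposing the mean and std as options keeps the original behaviour reachable.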

sayakpaul
sayakpaul commented on 2024-09-18
sayakpaul241 days ago

Left some comments but my concerns:

  • Why remove the previous timesteps computing scheme?
  • Let's provide a reasonable ControlNet checkpoint derived from your experiments.

LMK if anything is unclear.

PromeAIpro PromeAIpro closed this 241 days ago
sayakpaul
sayakpaul241 days ago

@PromeAIpro we didn't have to close this PR. Is there anything we could do to revive this PR? We would very much like to do that. Please let us know.

sayakpaul sayakpaul reopened this 241 days ago
PromeAIpro
PromeAIpro241 days ago

> @PromeAIpro we didn't have to close this PR. Is there anything we could do to revive this PR? We would very much like to do that. Please let us know.

Sorry, I did it by mistake.

make code style and quality
7bdf9e3b
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
ba45495d
fix some unnoticed error
c862d393
make code style and quality
4b979e0b
sayakpaul
sayakpaul commented on 2024-09-19
sayakpaul241 days ago

Thanks. I think this is looking good. Some minor comments.

Also, we would need to add tests like in https://github.com/huggingface/diffusers/blob/main/examples/controlnet/test_controlnet.py.

@yiyixuxu could you review the changes made to the ControlNet pipeline?

sayakpaul Merge branch 'main' into flux-controlnet-train
0655a759
add example controlnet in readme
90badc29
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
47555579
add test controlnet
e3d10bc1
rm Remove duplicate notes
f9400a6f
PromeAIpro Merge branch 'main' into flux-controlnet-train
192bbeea
Fix formatting errors
de06965c
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
8ee2daf6
PromeAIpro
PromeAIpro240 days ago

> Thanks. I think this is looking good. Some minor comments.
>
> Also, we would need to add tests like in https://github.com/huggingface/diffusers/blob/main/examples/controlnet/test_controlnet.py.
>
> @yiyixuxu could you review the changes made to the ControlNet pipeline?

Added a test in test_controlnet.

sayakpaul
sayakpaul commented on 2024-09-20
Conversation is marked as resolved
examples/controlnet/README_flux.md
# enable memory optimizations
pipe.enable_model_cpu_offload()

control_image = load_image("./conditioning_image_1.png").resize((1024, 1024))
sayakpaul240 days ago

Should we not change the conditioning image because we're using "promeai/FLUX.1-controlnet-lineart-promeai"?

PromeAIpro240 days ago

In fact, each image can be used as a test, but for demonstration purposes, I have modified it.

add new control image
17fc1ee3
sayakpaul requested a review from yiyixuxu 240 days ago
sayakpaul
sayakpaul240 days ago

@yiyixuxu could you review the changes made to the ControlNet Flux pipeline once you have a moment?

PromeAIpro Merge branch 'main' into flux-controlnet-train
213faf93
Night1099
Night1099238 days ago

@PromeAIpro Hi great work, can this also train on Flux Schnell, or only dev rn.

Mason-McGough
Mason-McGough237 days ago👍 2

> @PromeAIpro Hi great work, can this also train on Flux Schnell, or only dev rn.

Training on Schnell seems to work but I had to set guidance=None during the forward pass.
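The workaround above amounts to passing a guidance value only when the transformer was trained with guidance embeddings. A minimal sketch, assuming the transformer config exposes a boolean `guidance_embeds` field (to my understanding it is enabled for FLUX.1-dev and disabled for Schnell). `build_guidance` is a hypothetical helper, and a plain list stands in for the tensor a real training loop would build:

```python
from types import SimpleNamespace


def build_guidance(config, guidance_scale, batch_size):
    """Return per-sample guidance values, or None for models without guidance embeddings."""
    # Assumption: `guidance_embeds` is the config flag that marks guidance-distilled models.
    if getattr(config, "guidance_embeds", False):
        return [float(guidance_scale)] * batch_size
    return None


# Stand-ins for transformer configs (hypothetical objects, not loaded from the Hub).
dev_like = SimpleNamespace(guidance_embeds=True)
schnell_like = SimpleNamespace(guidance_embeds=False)

print(build_guidance(dev_like, 3.5, 2))
print(build_guidance(schnell_like, 3.5, 2))
```

The result would then be passed as the `guidance` argument of the forward pass, so a Schnell-style model receives `None` without any manual edit to the script.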

ShunyuYao
ShunyuYao237 days ago (edited 237 days ago)

Excellent job, but I have a question. I tried the script at 512 resolution with bf16 and batch size 1, and it uses 76 GB of memory on an A800 (80 GB). Training at 1024 resolution fails because it runs out of memory. Any suggestions?
I used the following settings:

accelerate config:
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: '1'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

training script:
accelerate launch --main_process_port 29511 --config_file acc_config_singlegpu.yaml train_controlnet_flux.py \
    --pretrained_model_name_or_path="/home/export/base/ycsc_yaosy/yaosy/online1/models/black-forest-labs/FLUX.1-dev" \
    --jsonl_for_train="./controlnet_sdxl_train_5examples.jsonl" \
    --conditioning_image_column=conditioning_image \
    --image_column=image \
    --caption_column=text \
    --output_dir="./controlnet_example_512" \
    --mixed_precision="bf16" \
    --resolution=512 \
    --learning_rate=1e-5 \
    --max_train_steps=15000 \
    --validation_steps=5 \
    --checkpointing_steps=200 \
    --validation_image "test.jpg" \
    --validation_prompt "..." \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --report_to="tensorboard" \
    --num_double_layers=4 \
    --num_single_layers=0 \
    --seed=42
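For reference, the effective batch size implied by these flags can be worked out as below. This is a small sketch of the usual arithmetic, not code from the training script; `effective_batch_size` is a hypothetical helper. Gradient accumulation raises the effective batch size without raising per-step activation memory, so lowering it is unlikely to reduce the 76 GB figure.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Samples that contribute to one optimizer step."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# Values taken from the command and the accelerate config above.
settings = {
    "train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "num_processes": 1,
}

print(effective_batch_size(**settings))
```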

PromeAIpro
PromeAIpro237 days ago (edited 237 days ago)

@ShunyuYao Maybe try --use_adafactor as the optimizer? Also, with the latest code you can use --enable_model_cpu_offload to run at 1024 resolution with AdamW.

Here are my settings (about 66 GB for training). Please delete the # comments before you run it:

CUDA_VISIBLE_DEVICES=0 python ../train_controlnet_flux.py \
    --pretrained_model_name_or_path=$MODEL_DIR \
    --dataset_name=fusing/fill50k \
    --max_train_samples=100 \
    --conditioning_image_column=conditioning_image \
    --image_column=image \
    --caption_column=text \
    --output_dir=$OUTPUT_DIR \
    --mixed_precision="bf16" \
    --resolution=1024 \
    --learning_rate=1e-5 \
    --max_train_steps=10 \
    --checkpointing_steps=11 \
    --validation_steps=1 \
    --validation_image "./conditioning_image_1.png" \
    --validation_prompt "red circle with blue background" \
    --num_validation_images=1 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=2 \
    --report_to="wandb" \
    --num_double_layers=4 \
    --num_single_layers=0 \
    --seed=42 \
    --save_weight_dtype="bf16" \
    --push_to_hub \
    --enable_model_cpu_offload \ # will cause slower training
    --use_adafactor  # saves about 10 GB of memory
add model cpu offload
b533cae5
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
be965f0e
PromeAIpro Merge branch 'main' into flux-controlnet-train
a2daa9f2
update help for adafactor
4d7c1afb
PromeAIpro PromeAIpro requested a review from sayakpaul sayakpaul 237 days ago
Mason-McGough
Mason-McGough236 days ago

@ShunyuYao I would try to precompute the text embeddings (and maybe the VAE outputs too) if possible. Those will save you a few gigabytes.

yiyixuxu
yiyixuxu commented on 2024-09-23
src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
    joint_attention_kwargs=self.joint_attention_kwargs,
    return_dict=False,
)

# ensure dtype
yiyixuxu236 days ago

why is this needed?

PromeAIpro236 days ago (edited 236 days ago)

See the discussion in #9324 (comment)
and https://github.com/huggingface/diffusers/pull/9324/files/32eb1ef4897332954f3f0e967ff165e09e341ed8#r1758447457

We think it is better to put the conversion code in the pipeline than in the training script (it is currently written in both). It is only a safeguard and has no effect on inference.

yiyixuxu235 days ago

I looked at the comment, it is still not explained why it is needed
we have no issue running inference with the available controlnet checkpoint without this change.

PromeAIpro235 days ago

That is right. The dtype conversion change in the pipeline is not closely related to the ControlNet training script. We found the dtype inconsistency while writing the training script. It does not occur during inference at the moment, and we fixed it in passing. Maybe we could move this fix to a new issue in case the dtype inconsistency shows up in the future?
We are neutral on this. What do you think? @yiyixuxu @sayakpaul

yiyixuxu235 days ago

yes, a separate issue would be nice! and maybe a minimum reproducible script to help understand the issue

PromeAIpro235 days ago (edited 235 days ago)

This is because t5 does not support autocast (causing black images). However, during validation, our controlnet is fp32 and our transformer is bf16, so we need to explicitly convert the dtype in the pipeline.

PromeAIpro235 days ago

Opened a new issue: #9527

yiyixuxu234 days ago

But validation is only there to log outputs. Why can't we run the ControlNet in bf16 too? Anyway, I think this change should not be in the pipelines for now :)

PromeAIpro234 days ago

Yes. Is there a way in diffusers to clone the ControlNet? We are considering cloning a copy and converting it to bf16 for validation. If we convert the original weights directly, we will lose precision.

PromeAIpro234 days ago

The fundamental solution is to fix the T5 autocast problem tracked in #9527.

sayakpaul234 days ago👍 1

Okay something that would work is the following:

  1. We compute all the text embeddings beforehand in the validation loop and then delete the text encoders.
  2. We proceed with our regular validation with the precomputed text embeddings.

Would this work?
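The two steps above can be sketched as follows. This is a hedged illustration, not the code that landed in the training script: it assumes the pipeline object exposes `encode_prompt(prompt, prompt_2=...)` returning `(prompt_embeds, pooled_prompt_embeds, text_ids)` and holds its text encoders as `text_encoder` and `text_encoder_2`. The helper names are hypothetical.

```python
import gc


def precompute_prompt_embeddings(pipeline, prompts):
    """Encode every validation prompt once and return the cached embeddings."""
    cache = {}
    for prompt in prompts:
        # Assumption: encode_prompt returns (prompt_embeds, pooled_prompt_embeds, text_ids).
        prompt_embeds, pooled_prompt_embeds, _ = pipeline.encode_prompt(prompt, prompt_2=prompt)
        cache[prompt] = {
            "prompt_embeds": prompt_embeds,
            "pooled_prompt_embeds": pooled_prompt_embeds,
        }
    return cache


def drop_text_encoders(pipeline):
    """Release the text encoders so they no longer occupy memory during validation."""
    # Assumption: the encoders live under these attribute names.
    for name in ("text_encoder", "text_encoder_2"):
        if hasattr(pipeline, name):
            setattr(pipeline, name, None)
    gc.collect()
```

Validation would then call the pipeline with `prompt_embeds=...` and `pooled_prompt_embeds=...` from the cache instead of a raw prompt, so T5 never runs under autocast. On a GPU, emptying the CUDA cache after `drop_text_encoders` would likely be needed to actually return the memory.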

PromeAIpro234 days ago

That works. The pipeline change has now been removed.

christopher-beckham
christopher-beckham236 days ago👍 1

Hi,

Just shamelessly plugging my ControlNet repo here which I just made public: https://github.com/christopher-beckham/flux-controlnet

Feel free to pick and choose things from the code if you think it could help with your PR. I have explained some of it in the README. While there is no public dataset associated with this repo I have trained with qint8 quantisation + 8-bit ADAM on a fairly large internal dataset and gotten more or less decent images on a 40GB GPU.

Some of the tricks mentioned here may also be of use: https://github.com/bghira/SimpleTuner/blob/main/documentation/quickstart/FLUX.md

sayakpaul
sayakpaul236 days ago👍 1

https://github.com/christopher-beckham/flux-controlnet

@christopher-beckham thanks for sharing your work! Looks very cool!

The purpose of the scripts within examples (at least the ones we officially maintain at the moment) is to provide a barebones starting point. So, I think it's okay for the moment to skip the quantization-related bits and other things.

The simplest reasonable defaults that lead to okay results are fine, IMO. So, what we could do is provide mentions to the other popular ControlNet trainers like yours from the README in case users want to take things further. I hope that works.

sayakpaul Merge branch 'main' into flux-controlnet-train
a11219ce
sayakpaul
sayakpaul236 days ago (edited 236 days ago)

Just reviewed it!

I think it looks quite good, apart from @yiyixuxu's concerns here: #9324 (comment).

I would probably lean towards doing it from the training script because otherwise, it would add more maintenance. But I will let Yiyi comment further.

@PromeAIpro could you please follow the instructions from the CI and ensure the core quality checks pass?

make quality & style
49a14920
PromeAIpro
PromeAIpro236 days ago (edited 236 days ago)

Yes, I know. This is because T5 does not support autocast (it produces black images). However, during validation our ControlNet is fp32 and our transformer is bf16, so we need to explicitly convert the dtype in the pipeline.

@sayakpaul I have already run make style and make quality. Is there anything else that needs to be done?

PromeAIpro
PromeAIpro236 days ago

@sayakpaul not quite sure about this.
image

sayakpaul
sayakpaul236 days ago👍 1

Can you try the following?

  1. Create a new Python environment.
  2. Activate it.
  3. Go to your local clone of diffusers and run `pip install -e ".[quality]"`
  4. And then run make style && make quality?
sayakpaul Merge branch 'main' into flux-controlnet-train
6169b619
make quality and style
d895b8ff
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
395d2f7b
PromeAIpro
PromeAIpro236 days ago

@sayakpaul It works! Please check it again.

rename flux_controlnet_model_name_or_path
b6a90211
ShunyuYao
ShunyuYao235 days ago (edited 235 days ago)

@PromeAIpro Thanks for your advice. I tried different settings and found that --gradient_checkpointing is quite useful for saving memory at 1024 resolution.

PromeAIpro Merge branch 'main' into flux-controlnet-train
66dfdbeb
fix back src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
b097d0d6
vahidEttehadiAniml
vahidEttehadiAniml234 days ago

I am trying to run it on a multi-gpu machine. not working!

image

PromeAIpro
PromeAIpro234 days ago (edited 234 days ago)

@vahidEttehadiAniml Sorry, I haven't tested multi-GPU much, but you can definitely follow the process I describe in the README and train on a 40 GB A100 with DeepSpeed and Accelerate.

fix dtype error by pre calculate text emb
49787e30
PromeAIpro Merge branch 'main' into flux-controlnet-train
eb645575
PromeAIpro PromeAIpro requested a review from yiyixuxu yiyixuxu 234 days ago
rm image save
e9d3e049
sayakpaul
sayakpaul approved these changes on 2024-09-26
sayakpaul233 days ago

Thank you so much for your hard work!

sayakpaul Merge branch 'main' into flux-controlnet-train
7245c75f
quality fix
25fc313e
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
c2b44d34
PromeAIpro PromeAIpro requested a review from sayakpaul sayakpaul 233 days ago
kadirnar
kadirnar233 days ago

👀

sayakpaul Merge branch 'main' into flux-controlnet-train
bc2ea9eb
linjiapro
linjiapro233 days ago (edited 233 days ago)

I think one key factor to reduce the GPU load is the following:

--quantize: quantise everything (except ControlNet) into int8 via the optimum-quanto library. This is weight only quantisation, so params are stored in int8 and are de-quantised on the fly. You may be able to squeeze out even more savings with lower bits but this has not been tested.

Can we add the above option to this PR?
cc @PromeAIpro
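To make the idea of weight-only int8 quantisation concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantisation with on-the-fly de-quantisation. It illustrates the general technique only and is not how optimum-quanto implements it; `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights right before they are used."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(max_error <= scale / 2 + 1e-12)
```

Each stored weight takes one byte instead of two (bf16) or four (fp32), at the cost of a rounding error of at most half the scale per weight.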

christopher-beckham
christopher-beckham233 days ago (edited 233 days ago)

@linjiapro: @sayakpaul envisioned the script as being more on the barebones side (see his reply above), though I would also argue that most people are not going to have access to (or not going to want to pay for) an 80 GB GPU. Therefore, I would really argue for the addition of quantisation.

Edit: edited my original post, I originally mentioned 8-bit ADAM would be nice to have but I see that it's in :)

Edit no2: from the above discussion it looks like the controlnet is being trained in fp32 however, it would be trivial to add an option to also train it in bf16 and I had no issues with it. And maybe you'd avoid the autocast issue altogether for the validation logging.

linjiapro
linjiapro233 days ago

Oh, I missed sayakpaul's comments. I was hoping that quantisation would not change too much of the script. It is just a data format for the nets. I thought it would be a flag that, when turned on, casts the nets to int8. But it seems it is not that simple.

christopher-beckham
christopher-beckham233 days ago (edited 233 days ago)👍 1

It is that simple, as that's what optimum-quanto is designed to do. As to what extent one loses out on sample quality during training, I'm not sure (one just quantises the entire backbone, you can keep the controlnet in bf16). But in my own personal experience using it (with my repo) I never encountered any numerical instabilities and sample quality was on par with what I expected from other controlnets.

sayakpaul
sayakpaul233 days ago

Edit no2: from the above discussion it looks like the controlnet is being trained in fp32 however, it would be trivial to add an option to also train it in bf16 and I had no issues with it. And maybe you'd avoid the autocast issue altogether for the validation logging.

@christopher-beckham thank you! WDYT about a follow-up PR to:

  • Enable training and saving in BF16
  • Add your repository in the README so that people can explore other ways

Would that work for you?

sayakpaul
sayakpaul233 days ago👍 1

@PromeAIpro the test is failing:
https://github.com/huggingface/diffusers/actions/runs/11056617305/job/30718614922?pr=9324#step:9:356

We need to use a small checkpoint like:

pretrained_model_name_or_path = "hf-internal-testing/tiny-flux-pipe"

fix test
2ee67c47
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
7ab1b808
sayakpaul Merge branch 'main' into flux-controlnet-train
56cd9840
sayakpaul
sayakpaul233 days ago

Ah I see what is happening. First, we are using "https://github.com/huggingface/diffusers/actions/runs/11063243172/job/30739077215?pr=9324#step:9:268", which is a big model for a CI. Can we please follow what the rest of the ControlNet test follows i.e.,

  1. Use a small and tiny base model.
  2. Initialize ControlNet from the transformer?
PromeAIpro
PromeAIpro233 days ago (edited 233 days ago)

image
looks like a tokenizer_two error?

sayakpaul
sayakpaul233 days ago

Regardless of the tokenizer, we still need to address the usage of small checkpoints.

BTW, how can I call this function test_controlnet_flux?

pytest examples/controlnet -k "test_controlnet_flux"
sayakpaul
sayakpaul233 days ago

But you're using "--controlnet_model_name_or_path=promeai/FLUX.1-controlnet-lineart-promeai" in the test.

We don't use a pre-trained ControlNet model in the tests. We initialize it from the denoiser. For SD and SDXL, we initialize it from the UNet. We need to do something similar here.

PromeAIpro
PromeAIpro233 days ago

I tried using

  flux_controlnet = FluxControlNetModel.from_transformer(
        flux_transformer,
        num_layers=args.num_double_layers,
        num_single_layers=args.num_single_layers,
    )

but got an error:
image
BTW, the tokenizer loading problem is fixed by

    tokenizer_two = AutoTokenizer.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="tokenizer_2",
        revision=args.revision,
-      use_fast=False,
    )

PromeAIpro
PromeAIpro233 days ago

I thought I loaded tiny-flux-pipe correctly. Maybe the problem is caused by the ControlNet's from_transformer?
image

sayakpaul
sayakpaul233 days ago

Thanks for fixing the tokenizer issue. Regarding initializing from the transformer, I think the error occurs because we're using:

--num_double_layers=4
--num_single_layers=0

Could we try:

--num_double_layers=2
--num_single_layers=1
PromeAIpro
PromeAIpro233 days ago

I am just using --num_double_layers=1 --num_single_layers=0.
I see the problem: the config file seems to be loaded incorrectly.
image

PromeAIpro
PromeAIpro233 days ago

Why do we need to update the parameter here? Shouldn't it be passed in by the transformer?
image

PromeAIpro
PromeAIpro233 days ago (edited 233 days ago)

I explicitly passed them in, and it works:

flux_controlnet = FluxControlNetModel.from_transformer(
            flux_transformer,
+            attention_head_dim=flux_transformer.config["attention_head_dim"],
+            num_attention_heads=flux_transformer.config["num_attention_heads"],
            num_layers=args.num_double_layers,
            num_single_layers=args.num_single_layers,
        )
sayakpaul
sayakpaul233 days ago

I can replicate the error:

from diffusers import FluxTransformer2DModel, FluxControlNetModel

transformer = FluxTransformer2DModel.from_pretrained(
    "hf-internal-testing/tiny-flux-pipe", subfolder="transformer"
)
controlnet = FluxControlNetModel.from_transformer(
    transformer=transformer, num_layers=1, num_single_layers=1, attention_head_dim=16, num_attention_heads=1
)

Leads to:

RuntimeError: Error(s) in loading state_dict for CombinedTimestepTextProjEmbeddings:
        size mismatch for timestep_embedder.linear_1.weight: copying a param with shape torch.Size([32, 256]) from checkpoint, the shape in current model is torch.Size([16, 256]).
        size mismatch for timestep_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for timestep_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
        size mismatch for timestep_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for text_embedder.linear_1.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 32]).
        size mismatch for text_embedder.linear_1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
        size mismatch for text_embedder.linear_2.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([16, 16]).
        size mismatch for text_embedder.linear_2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).

Opened an issue here: #9540.
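The shapes in the error are consistent with the embedder width being `num_attention_heads * attention_head_dim`. The sketch below reproduces that arithmetic in plain Python. It assumes the tiny transformer's head configuration multiplies out to 32 (2 heads of size 16 is one combination that does; the actual config values are not shown in this thread), and a pooled projection size of 32 inferred from the reported `text_embedder.linear_1.weight` shape. `embedder_weight_shapes` is a hypothetical helper.

```python
def embedder_weight_shapes(num_attention_heads, attention_head_dim,
                           pooled_projection_dim=32, time_proj_dim=256):
    """Expected weight shapes of the combined timestep/text embedder."""
    inner_dim = num_attention_heads * attention_head_dim
    return {
        "timestep_embedder.linear_1.weight": (inner_dim, time_proj_dim),
        "timestep_embedder.linear_2.weight": (inner_dim, inner_dim),
        "text_embedder.linear_1.weight": (inner_dim, pooled_projection_dim),
        "text_embedder.linear_2.weight": (inner_dim, inner_dim),
    }


# Assumed transformer configuration (any pair whose product is 32 gives the same shapes).
checkpoint = embedder_weight_shapes(num_attention_heads=2, attention_head_dim=16)
# ControlNet built with the arguments from the reproduction above.
controlnet = embedder_weight_shapes(num_attention_heads=1, attention_head_dim=16)

for name in checkpoint:
    if checkpoint[name] != controlnet[name]:
        print(f"size mismatch for {name}: {checkpoint[name]} vs {controlnet[name]}")
```

Passing the transformer's own `attention_head_dim` and `num_attention_heads` through to `from_transformer` keeps both products equal, which is why the explicit arguments earlier in the thread avoid the error.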

sayakpaul
sayakpaul233 days ago (edited 233 days ago)🚀 1

@PromeAIpro could you make the changes accordingly then?

fix tiny flux train error
7cedfb1f
Merge branch 'flux-controlnet-train' of https://github.com/PromeAIpro…
ee6ca900
PromeAIpro
PromeAIpro233 days ago (edited 233 days ago)

I have tested it on my own machine and it works correctly.
BTW, I added guidance handling for flux_transformer variants that don't use guidance, such as tiny-flux-pipe.

PromeAIpro
PromeAIpro233 days ago

what about this
image

sayakpaul
sayakpaul commented on 2024-09-27
Conversation is marked as resolved
Show resolved
examples/controlnet/train_controlnet_flux.py
parser.add_argument(
    "--report_to",
    type=str,
    default="wandb",
sayakpaul233 days ago

We shouldn't make it default. See other scripts, as an example:

change report to to tensorboard
dcac1b00
PromeAIpro PromeAIpro requested a review from sayakpaul sayakpaul 233 days ago
fix save name error when test
89a1f353
Fix shrinking errors
6ccd3e46
PromeAIpro
PromeAIpro233 days ago (edited 233 days ago)

Looks good now, please try it again!

$ pytest examples/controlnet -k "test_controlnet_flux"
===================================================================== test session starts ======================================================================
platform linux -- Python 3.10.14, pytest-8.3.3, pluggy-1.5.0
rootdir: /data3/home/srchen/test_diffusers/diffusers
configfile: pyproject.toml
collected 5 items / 4 deselected / 1 selected                                                                                                                  

examples/controlnet/test_controlnet.py .                                                                                                                 [100%]

=============================================================== 1 passed, 4 deselected in 25.87s ===============================================================
sayakpaul sayakpaul merged 534848c3 into main 233 days ago
sayakpaul
sayakpaul233 days ago❤ 1

Thanks a lot for your contributions!

PromeAIpro
PromeAIpro233 days ago

Thank you for your guidance in my work!!

ScilenceForest
ScilenceForest219 days ago

Edit no2: from the above discussion it looks like the controlnet is being trained in fp32 however, it would be trivial to add an option to also train it in bf16 and I had no issues with it. And maybe you'd avoid the autocast issue altogether for the validation logging.

@christopher-beckham thank you! WDYT about a follow-up PR to:

  • Enable training and saving in BF16
  • Add your repository in the README so that people can explore other ways

Would that work for you?

Thank you all for your work! @sayakpaul Does this reply indicate that BF16 is not currently supported? I saw in a slightly earlier comment that the example parameters provided by @PromeAIpro included --mixed_precision="bf16" and --save_weight_dtype="bf16". What do they mean? Also, I understand that your design goal is to provide only simple and effective basic functionality, but I also found that SDXL's ControlNet training scripts have optimisation options such as --gradient_checkpointing, --use_8bit_adam, --set_grads_to_none, and --enable_xformers_memory_efficient_attention. Will similar performance optimisation options be added to this script later? Thank you very much for your answers!

bc129697
bc129697207 days ago

Here are some training results by lineart controlnet.

input output prompt
ComfyUI_temp_egnkb_00001_ ComfyUI_00027_ cute anime girl with massive fluffy fennec ears and a big fluffy tail blonde messy long hair blue eyes wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
ComfyUI_temp_znagh_00001_ ComfyUI_temp_cufps_00002_ a busy urban intersection during daytime. The sky is partly cloudy with a mix of blue and white clouds. There are multiple traffic lights, and vehicles are seen waiting at the red signals. Several businesses and shops are visible on the side, with signboards and advertits. The road is wide, and there are pedestrian crossings. Overall, it appears to be a typical day in a bustling city.
First train on 512res and then fine-tune with 1024res

Hello, where can I find the dataset for training ControlNet? Thanks.
