The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Awesome, but I think we will have to update this once the refactor PR (#9074) is in, since I combined the attention processors there.
100 percent right. I will repurpose this once your PR is in :)
PR looks good to me
Can we run an actual test to see the improvement before merging? Feel free to merge once that's done.
Check the PR description:
Batch size 1 (see footnote):
With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)
As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.
Footnote:
This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes. It will be made open in a day's time.
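For quick reference, the benchmark numbers above work out to roughly a 1.36x speedup and about 10 GB less peak memory. A minimal sanity check of that arithmetic (numbers copied from the comment above):

```python
# Benchmark figures reported in the PR comment above (A100, batch size 1).
t_fused, t_unfused = 8.456, 11.492    # seconds
m_fused, m_unfused = 25.25, 35.455    # GB

speedup = t_unfused / t_fused         # how many times faster with fusion
mem_saved = m_unfused - m_fused       # GB of memory saved by fusion

print(f"{speedup:.2f}x faster, {mem_saved:.2f} GB less memory")
```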
@sayakpaul ahh I missed it! sorry! very nice!
@sayakpaul This feature doesn't seem to work together with torchao's `quantize_(transformer, int8_weight_only())` quantization. Is that expected? I get an error from torchao:
```
File "/Users/sysperf/miniforge3/envs/flux/lib/python3.11/site-packages/torchao/utils.py", line 389, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.cat.default
```
Please redirect the issue to https://github.com/sayakpaul/diffusers-torchao
What does this PR do?
Adds `fuse_qkv_projection()` support to Flux. Will report the performance improvements soon.
Batch size 1 (see footnote):
With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)
As a reminder, refer to [this comment](https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834) to understand the scope of when fusion is ideal.
Footnote:
This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes. It will be made open in a day's time.
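For readers unfamiliar with QKV fusion, here is a minimal NumPy sketch of the idea behind `fuse_qkv_projection()`: the three attention projection weight matrices are concatenated so that one matmul replaces three. The shapes and names below are illustrative, not Flux's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Separate Q/K/V projection weights, as in an unfused attention layer.
w_q = rng.standard_normal((d, d))
w_k = rng.standard_normal((d, d))
w_v = rng.standard_normal((d, d))
x = rng.standard_normal((4, d))  # a batch of 4 token embeddings

# Unfused: three separate projections (three matmuls).
q, k, v = x @ w_q.T, x @ w_k.T, x @ w_v.T

# Fused: concatenate the weights once, do a single matmul, then split.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # shape (3d, d)
q2, k2, v2 = np.split(x @ w_qkv.T, 3, axis=-1)

# The fused path produces the same Q, K, V up to floating-point error.
assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

Note that this concatenation step is also why quantize-then-fuse fails in the torchao comment above: fusing after quantization requires concatenating quantized weight tensors, which hits the unimplemented `aten.cat.default` dispatch.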