diffusers
[Core] fuse_qkv_projection() to Flux
#9185
Merged


sayakpaul merged 11 commits into main from fuse-flux
sayakpaul · 292 days ago (edited 289 days ago)

What does this PR do?

Adds fuse_qkv_projection() support to Flux.

Will report the performance improvements soon.

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to this comment to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from torchao. We are working on a repository to show the full-blown recipes. It will be made open in a day's time.
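For readers unfamiliar with the optimization: QKV fusion replaces the three separate query/key/value projections in attention with a single matmul over the concatenated weights, which saves kernel launches and memory traffic. A minimal NumPy sketch of the equivalence (illustrative only, not the diffusers implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16)).astype(np.float32)  # (batch, hidden)
w_q, w_k, w_v = (
    rng.standard_normal((8, 16)).astype(np.float32) for _ in range(3)
)

# Unfused: three separate projections.
q, k, v = x @ w_q.T, x @ w_k.T, x @ w_v.T

# Fused: one projection with the concatenated weight, then a split.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # (24, 16)
q2, k2, v2 = np.split(x @ w_qkv.T, 3, axis=-1)

# The fused path produces the same Q, K, V.
assert np.allclose(q, q2, atol=1e-5)
assert np.allclose(k, k2, atol=1e-5)
assert np.allclose(v, v2, atol=1e-5)
```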

sayakpaul start fusing flux.
54c97976
sayakpaul test
b0544a19
sayakpaul finish fusion
709690ad
sayakpaul Merge branch 'main' into fuse-flux
b28f9491
HuggingFaceDocBuilderDev · 292 days ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul requested a review from DN6 · 291 days ago
sayakpaul marked this pull request as ready for review · 291 days ago
sayakpaul Merge branch 'main' into fuse-flux
0f1b17a0
sayakpaul requested a review from yiyixuxu · 289 days ago
sayakpaul Merge branch 'main' into fuse-flux
24e34556
yiyixuxu · 288 days ago

Awesome, but I think we will have to update this once the refactor PR (#9074) is in, since I combined the attention processors there.

sayakpaul · 287 days ago

100 percent right. I will repurpose this once your PR is in :)

sayakpaul Merge branch 'main' into fuse-flux
e3bb3f51
sayakpaul Merge branch 'main' into fuse-flux
c0ec9f3a
sayakpaul fix
fbf0e71f
sayakpaul fix-copues
2bdfdde1
sayakpaul · 285 days ago

@yiyixuxu could you give this a look? I have adjusted it accordingly with #9074.

sayakpaul resolve conflicts.
9d5bec7d
yiyixuxu approved these changes on 2024-08-23 · 284 days ago

The PR looks good to me.
Can we run an actual test to see the improvement before merging? Feel free to merge once that's done.

sayakpaul · 284 days ago

Check the PR description:

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes. It will be made open in a day's time.
yiyixuxu · 284 days ago

@sayakpaul ahh I missed it! sorry! very nice!

sayakpaul merged 2d9ccf39 into main · 284 days ago
sayakpaul deleted the fuse-flux branch · 284 days ago
ngaloppo · 231 days ago (edited 231 days ago)

@sayakpaul This feature doesn't seem to work together with torchao's `quantize_(transformer, int8_weight_only())` quantization. Is that expected? I get the following error from torchao:

```
File "/Users/sysperf/miniforge3/envs/flux/lib/python3.11/site-packages/torchao/utils.py", line 389, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.cat.default
```
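The failure above is consistent with fusion concatenating the Q/K/V weight tensors via `torch.cat`, an operator that torchao's `AffineQuantizedTensor` does not implement; a plausible workaround (an assumption, not verified against a specific diffusers/torchao release) is to fuse first and quantize afterwards, so the concatenation runs on plain tensors. A self-contained NumPy toy illustrating the ordering issue — `QuantizedWeight` and `fuse` are hypothetical stand-ins, not real library APIs:

```python
import numpy as np

class QuantizedWeight:
    """Toy int8 weight wrapper: stores quantized data plus a scale,
    loosely analogous to AffineQuantizedTensor, and (deliberately)
    provides no concatenation operator."""
    def __init__(self, w):
        self.scale = np.abs(w).max() / 127.0
        self.q = np.round(w / self.scale).astype(np.int8)
    def dequantize(self):
        return self.q.astype(np.float32) * self.scale

def fuse(weights):
    # Fusion concatenates raw arrays; a QuantizedWeight has no such op,
    # the analogue of the aten.cat.default failure reported above.
    for w in weights:
        if isinstance(w, QuantizedWeight):
            raise NotImplementedError("cat not implemented for quantized weights")
    return np.concatenate(weights, axis=0)

rng = np.random.default_rng(0)
w_q, w_k, w_v = (
    rng.standard_normal((4, 8)).astype(np.float32) for _ in range(3)
)

# Fuse first, then quantize: works.
fused = QuantizedWeight(fuse([w_q, w_k, w_v]))  # fused.q.shape == (12, 8)

# Quantize first, then fuse: fails, mirroring the reported error.
try:
    fuse([QuantizedWeight(w_q), QuantizedWeight(w_k), QuantizedWeight(w_v)])
    ok = False
except NotImplementedError:
    ok = True
```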
sayakpaul · 231 days ago
