The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Awesome, but I think we will have to update this once the refactor PR (#9074) is in, since I combined the attention processors there.
100 percent right. I will repurpose this once your PR is in :)
PR looks good to me
Can we run an actual test to see the improvement before merging? Feel free to merge once that's done.
Check the PR description:
Batch size 1 (see footnote):
With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)
As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.
Footnote:
This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes. It will be made open in a day's time.
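For quick reference, the benchmark numbers above work out to roughly a 1.36x speedup and about 10 GB less peak memory. A minimal sanity check of that arithmetic (numbers copied from the comment above):

```python
# Benchmark figures reported in the PR comment above (A100, batch size 1).
t_fused, t_unfused = 8.456, 11.492    # seconds
m_fused, m_unfused = 25.25, 35.455    # GB

speedup = t_unfused / t_fused         # how many times faster with fusion
mem_saved = m_unfused - m_fused       # GB of memory saved by fusion

print(f"{speedup:.2f}x faster, {mem_saved:.2f} GB less memory")
```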
@sayakpaul ahh I missed it! sorry! very nice!
@sayakpaul This feature doesn't seem to work together with torchao's `quantize_(transformer, int8_weight_only())` quantization. Is that expected? I get an error from torchao:
```
File "/Users/sysperf/miniforge3/envs/flux/lib/python3.11/site-packages/torchao/utils.py", line 389, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.cat.default
```
Please redirect the issue to https://github.com/sayakpaul/diffusers-torchao
What does this PR do?
Adds `fuse_qkv_projection()` support to Flux. Will report the performance improvements soon.
Batch size 1 (see footnote):
With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)
As a reminder, refer to [this comment](https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834) to understand the scope of when fusion is ideal.
Footnote:
This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes. It will be made open in a day's time.
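For readers unfamiliar with QKV fusion, here is a minimal NumPy sketch of the idea behind `fuse_qkv_projection()`: the three attention projection weight matrices are concatenated so that one matmul replaces three. The shapes and names below are illustrative, not Flux's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Separate Q/K/V projection weights, as in an unfused attention layer.
w_q = rng.standard_normal((d, d))
w_k = rng.standard_normal((d, d))
w_v = rng.standard_normal((d, d))
x = rng.standard_normal((4, d))  # a batch of 4 token embeddings

# Unfused: three separate projections (three matmuls).
q, k, v = x @ w_q.T, x @ w_k.T, x @ w_v.T

# Fused: concatenate the weights once, do a single matmul, then split.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # shape (3d, d)
q2, k2, v2 = np.split(x @ w_qkv.T, 3, axis=-1)

# The fused path produces the same Q, K, V up to floating-point error.
assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

Note that this concatenation step is also why quantize-then-fuse fails in the torchao comment above: fusing after quantization requires concatenating quantized weight tensors, which hits the unimplemented `aten.cat.default` dispatch.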