Stable Diffusion 3.x and Flux Optimization #22986
initial
6fb73696
sd3.x and flux
9b2dcc0d
tianleiwu
marked this pull request as draft 1 year ago
update FastGelu and RMSNorm fusions
7f925cef
support Reciprocal in RMSNorm fusion
cf259e1b
match_child_path interface change
b38f12eb
clean up
a58b68cd
MHA fusion for MMDit
c7317cbd
cuda layernorm support broadcast
2f5b9b9c
force fuse layernorm
699a64cf
refactoring
c1d01600
ACinfr
commented
on 2024-12-16
mha fusion for flux
1b9ea543
remove transpose for query
5528276b
t5 optimization and mixed precision conversion
89950d13
fix node name
c8691511
Add option to use bfloat16
84b1a515
fix attention
b7041d1e
update node block list of t5 encoder
455a3ea9
benchmark torch eager mode
dad0ac40
update comment
84005580
benchmark torch compile
9e43e206
refine benchmark_flux.sh
4bf9f252
Merge branch 'main' into tlwu/sd3_optimum
913c6eda
undo layer norm kernel
a47b6af5
CMAKE_CUDA_ARCHITECTURES=native
55178d67
Merge branch 'main' into tlwu/sd3_optimum
dac8ea7d
add tests
ebade480
tianleiwu
changed the title from "[WIP] Stable Diffusion 3.x and Flux Optimization" to "Stable Diffusion 3.x and Flux Optimization" 1 year ago
tianleiwu
marked this pull request as ready for review 1 year ago
update tests
fd227bb3
undo some change (move to another PR)
87bd3ecc
tianleiwu
merged
6550f4b3
into main 1 year ago
tianleiwu
deleted the tlwu/sd3_optimum branch 1 year ago