diffusers
[Feat] add I2VGenXL for image-to-video generation
#6665
Merged

yiyixuxu merged 184 commits into main from convert-i2vgen-xl
sayakpaul
sayakpaul let's see
bb7c4121
sayakpaul better conditioning for class_embed_type
d537b6c5
sayakpaul determine in_channels programmatically.
15f16071
sayakpaul worse condition
a329b73e
sayakpaul fix: sample_size.
5660ba1e
sayakpaul Merge branch 'main' into convert-i2vgen-xl
eb8ea72b
sayakpaul separate script for i2vgen
011329d2
sayakpaul changes
3e0015d0
sayakpaul fix: basic transformer block init.
f09c2dd7
sayakpaul check
7dd0cb02
sayakpaul revert block_out_channels.
d6f1e6d7
sayakpaul debug info
da5b83c0
sayakpaul debug info
0ecef35c
sayakpaul debug info
13ecc11a
sayakpaul debug
6778b3f8
sayakpaul correct ffn inner dim
20aeaf3b
sayakpaul debug info
ef85c84a
sayakpaul input channels should be 8.
7d031624
sayakpaul input channels corrected
a7366940
sayakpaul Revert "input channels corrected"
34e7349b
sayakpaul better input channels
896d626a
sayakpaul Revert "better input channels"
02b76b5e
sayakpaul rectify conversion script
15a6fbde
sayakpaul conversion script.
5a097226
sayakpaul conversion
bcccfdfd
sayakpaul push_to_hub
1c68e056
sayakpaul remove print
3b5940b8
sayakpaul let's see.
aaae0320
sayakpaul safeguard .
1c72370d
sayakpaul device placement
25527f86
sayakpaul comment to remind that writing good code is important
c38ef7af
sayakpaul device placement.
4f4d4e6a
sayakpaul correct layernorm condition.
e717630f
sayakpaul norm3 condition
292668ee
sayakpaul correct norm3
17a2418e
sayakpaul incorporate einops
d5b76930
sayakpaul image_embeddings
35b15f28
sayakpaul okay
693b2cee
sayakpaul dtype debug
642cbe4b
sayakpaul dtype fix
105ecc55
sayakpaul dtype fix.
d7e6b2cd
sayakpaul simplify code.
5de43480
sayakpaul remove print
2852de16
sayakpaul debug
76772c5b
sayakpaul debug
600ffd85
sayakpaul debug
ecf0070c
sayakpaul debug
b88d9a9f
sayakpaul debug
87eff5ed
sayakpaul debug
3178e742
sayakpaul remove print
32f6151d
sayakpaul add: dummy pipeline implementation too.
87e70abc
sayakpaul pipeline draft
5e7f17ff
sayakpaul complete conversion script.
28b9d57f
sayakpaul add new unet to modules
7943c91f
sayakpaul enable chunked decoding on vae.
7f3d5593
sayakpaul correct image latent behaviour
26d87c29
sayakpaul remove comment
5d03574f
sayakpaul correct dtype
989c707a
sayakpaul correct output type.
7b88ad37
sayakpaul Merge branch 'main' into convert-i2vgen-xl
eec8791c
sayakpaul init fix
6ff96068
sayakpaul fix-copies
b44e0532
HuggingFaceDocBuilderDev
sayakpaul Merge branch 'main' into convert-i2vgen-xl
51fdf304
sayakpaul chunked decoding should be optional
9bd5f16c
sayakpaul what happens if we take mode instead?
cc7e9754
sayakpaul fix: type
734274ad
sayakpaul back to sampling and clean up tensorification
b48f0945
sayakpaul better variable name
761c08ec
sayakpaul try to follow the original implementation closely.
a0c00c02
sayakpaul proper repetition
c6d35e2d
sayakpaul fix: fps condition check
0ebad2e9
sayakpaul fix: masking
da309d5b
sayakpaul fix: masking
5b0b5dfc
sayakpaul go back to negative_image_image_latents.
ef4dd348
sayakpaul make type casting for fps explicit
670488e1
sayakpaul original implementation image_latents.
80a6f1a7
sayakpaul Revert "original implementation image_latents."
85d364c2
sayakpaul sinusoidal embedding?
b0865dd6
sayakpaul simple bilinear resizing.
87742e92
sayakpaul remove the sinusoidal implementation from i2vgenxl
89024408
sayakpaul resolve conflicts
e9cd8397
sayakpaul harmonize with main
585a6b6e
sayakpaul fix: tensor2vid
90d91a8c
sayakpaul fix: tensor2vid
4a7d4aee
sayakpaul fix: tensor2vid
ab9569f8
sayakpaul fix: doc
58844fe1
sayakpaul fix model offload sequence.
11fd6469
sayakpaul
yiyixuxu added video
DN6 update
6778e6b9
DN6 update
9f737924
DN6 add docs
0ecd79b8
DN6 update
eefa6ccb
DN6 update
a9fecb33
DN6 update
27817919
DN6 update
0d1ea8c4
DN6 update
1d3846d4
DN6 update
db0213a1
sayakpaul improve docs.
f2964ba8
sayakpaul docstring to the pipeline
2d5071e5
sayakpaul licensing in the pipeline scripts.
4cd00836
sayakpaul clean up the docstring of the UNet.
6012362d
sayakpaul Merge branch 'main' into convert-i2vgen-xl
09519a1e
sayakpaul commented on 2024-01-30
sayakpaul make _resize_bilinear and _center_crop_wide accept torch tensors as w…
23935a97
sayakpaul data type fix
57b20ee1
sayakpaul unint8 > uint8
4d51fe8a
sayakpaul channels_last
14404b2a
sayakpaul debug
24d813e5
sayakpaul fix download path for the example image
bf1eb40f
sayakpaul fix: download path again
f35f3d8a
sayakpaul use cross_attention_dim to initialize
28804420
sayakpaul debug
698f9c11
sayakpaul debug
68cbe593
sayakpaul reduce hidden size of the vision encoder
45c682ec
sayakpaul go
3a701a21
sayakpaul debug more
0a4c6866
sayakpaul reduce more hidden dim
758acc0f
sayakpaul remove callback and callback_steps from required params check
0bfd0427
sayakpaul remove print
dd5a8f04
sayakpaul assertions for the default case..
50d46062
sayakpaul skip test_attention_slicing_forward_pass as it's deprecated.
2a4c7272
sayakpaul feature_extractor.
66034c52
sayakpaul feature_extractor.
b1819cda
sayakpaul relax precision
48e7694d
sayakpaul relax more.
0f230f9c
sayakpaul torch.manual_seed(0)
836fb678
sayakpaul relax precision
a5cb5b1a
sayakpaul uncomment batching tests
947e63ad
sayakpaul debug
216b9ddc
sayakpaul debug more
7c810525
sayakpaul debug more
31409afd
sayakpaul make the pt to pil utilities better
2faaffed
sayakpaul debug
a29c2011
sayakpaul format string
4adc8519
sayakpaul okay
9cb0b846
sayakpaul force_feature_extractor_resize
0b9a9ef7
sayakpaul debug
7cffe74a
sayakpaul expand to samples's shape
3d0ef8b4
sayakpaul check
ca422ef8
sayakpaul fix: batching behaviour for fps
cfafe51e
sayakpaul test_inference_batch_single_identical
d85bd2d2
sayakpaul relax test_inference_batch_single_identical
73242917
sayakpaul relax a bit more.
bb103025
sayakpaul test_num_videos_per_prompt
5e79f3d1
sayakpaul let's go.
f3c58a24
sayakpaul remove extra prints.
7cb384c8
sayakpaul remove force_feature_extractor_resize
f64c3d25
sayakpaul remove force_feature_extractor_resize
1675b075
sayakpaul fix: test_num_videos_per_prompt
8c20445d
sayakpaul fix: test_num_videos_per_prompt
2221bbc8
sayakpaul fix: test_num_videos_per_prompt
65849883
sayakpaul fix a bit more
7fad5851
sayakpaul style
61091789
sayakpaul add: slow test
7b281b53
sayakpaul flattened image slice
24cfc1d9
sayakpaul variant
edd6cc5e
sayakpaul assertion
6ebb2653
sayakpaul add: note about memory optimization
a03649fa
sayakpaul marked this pull request as ready for review 2 years ago
sayakpaul requested a review from DN6 2 years ago
sayakpaul requested a review from patrickvonplaten 2 years ago
sayakpaul requested a review from yiyixuxu 2 years ago
sayakpaul bring to cpu before calling numpy()
c78617ee
sayakpaul finish slow test fixes
76a42b38
sayakpaul Merge branch 'main' into convert-i2vgen-xl
ca4a977a
sayakpaul Empty-Commit
182447af
sayakpaul pin peft dependencies.
a54facc0
sayakpaul commented on 2024-01-30
sayakpaul
patrickvonplaten commented on 2024-01-30
sayakpaul commented on 2024-01-30
sayakpaul remove attention slicing and unload_lora
72e466e2
sayakpaul remove attention mask
5ba9b4ad
sayakpaul timesteps.
693d8270
sayakpaul add missing entries in the unet docstring
88f03a39
sayakpaul Apply suggestions from code review
9bf706e4
sayakpaul remove textual inversion
a682dca0
sayakpaul remove _to_tensor on fps.
a9c23e8b
sayakpaul leverage VaeImageProcessor.
ec5694ad
DN6
sayakpaul remove unnecessary config vars.
e6c07b56
sayakpaul use num_attention_heads
8c980df1
sayakpaul clean up conv_out layer creation
5cbaf2e7
sayakpaul refactor attention logic for cleaning up norm handling
be518c87
sayakpaul
sayakpaul requested a review from patrickvonplaten 2 years ago
yiyixuxu commented on 2024-01-30
sayakpaul Apply suggestions from code review
b8fde102
sayakpaul simplify norm_type checks in the forwards.
a0e6db10
sayakpaul add copied from statement where missing
c6c0b310
sayakpaul move _center_crop and _resize_bilinear out of the encode image function
5038c214
sayakpaul Merge branch 'main' into convert-i2vgen-xl
6ac5ec59
sayakpaul
yiyixuxu approved these changes on 2024-01-31
DN6 update
6d7fb879
DN6 Merge branch 'main' into convert-i2vgen-xl
2c1caeaa
DN6
sayakpaul
patrickvonplaten
sayakpaul
DN6 clean up
513ab1f6
yiyixuxu
DN6 update
13fcc20b
sayakpaul
DN6 update
7b7f0751
DN6
sayakpaul
sayakpaul change checkpoints.
fe50995d
patrickvonplaten commented on 2024-01-31
patrickvonplaten commented on 2024-01-31
patrickvonplaten commented on 2024-01-31
patrickvonplaten commented on 2024-01-31
patrickvonplaten commented on 2024-01-31
patrickvonplaten approved these changes on 2024-01-31
yiyixuxu merged 04cd6adf into main 2 years ago
yiyixuxu deleted the convert-i2vgen-xl branch 2 years ago
yiyixuxu commented on 2024-02-01
vladmandic
sayakpaul
vladmandic
yiyixuxu commented on 2024-02-05
