PR #6665 [Feat] add I2VGenXL for image-to-video generation

let's see

bb7c4121

better conditioning for class_embed_type

d537b6c5

determine in_channels programmatically.

15f16071

worse condition

a329b73e

fix: sample_size.

5660ba1e

Merge branch 'main' into convert-i2vgen-xl

eb8ea72b

separate script for i2vgen

011329d2

changes

3e0015d0

fix: basic transformer block init.

f09c2dd7

check

7dd0cb02

revert block_out_channels.

d6f1e6d7

debug info

da5b83c0

debug info

0ecef35c

debug info

13ecc11a

debug

6778b3f8

correct ffn inner dim

20aeaf3b

debug info

ef85c84a

input channels should be 8.

7d031624

input channels corrected

a7366940

Revert "input channels corrected"

34e7349b

better input channels

896d626a

Revert "better input channels"

02b76b5e

rectify conversion script

15a6fbde

conversion script.

5a097226

conversion

bcccfdfd

push_to_hub

1c68e056

remove print

3b5940b8

let's see.

aaae0320

safeguard.

1c72370d

device placement

25527f86

comment to remind that writing good code is important

c38ef7af

device placement.

4f4d4e6a

correct layernorm condition.

e717630f

norm3 condition

292668ee

correct norm3

17a2418e

incorporate einops

d5b76930

image_embeddings

35b15f28

okay

693b2cee

dtype debug

642cbe4b

dtype fix

105ecc55

dtype fix.

d7e6b2cd

simplify code.

5de43480

remove print

2852de16

debug

76772c5b

debug

600ffd85

debug

ecf0070c

debug

b88d9a9f

debug

87eff5ed

debug

3178e742

remove print

32f6151d

add: dummy pipeline implementation too.

87e70abc

pipeline draft

5e7f17ff

complete conversion script.

28b9d57f

add new unet to modules

7943c91f

enable chunked decoding on vae.

7f3d5593

correct image latent behaviour

26d87c29

remove comment

5d03574f

correct dtype

989c707a

correct output type.

7b88ad37

Merge branch 'main' into convert-i2vgen-xl

eec8791c

init fix

6ff96068

fix-copies

b44e0532

Merge branch 'main' into convert-i2vgen-xl

51fdf304

chunked decoding should be optional

9bd5f16c

what happens if we take mode instead?

cc7e9754

fix: type

734274ad

back to sampling and clean up tensorification

b48f0945

better variable name

761c08ec

try to follow the original implementation closely.

a0c00c02

proper repetition

c6d35e2d

fix: fps condition check

0ebad2e9

fix: masking

da309d5b

fix: masking

5b0b5dfc

go back to negative_image_image_latents.

ef4dd348

make type casting for fps explicit

670488e1

original implementation image_latents.

80a6f1a7

Revert "original implementation image_latents."

85d364c2

sinusoidal embedding?

b0865dd6

simple bilinear resizing.

87742e92

remove the sinusoidal implementation from i2vgenxl

89024408

resolve conflicts

e9cd8397

harmonize with main

585a6b6e

fix: tensor2vid

90d91a8c

fix: tensor2vid

4a7d4aee

fix: tensor2vid

ab9569f8

fix: doc

58844fe1

fix model offload sequence.

11fd6469

yiyixuxu added video

update

6778e6b9

update

9f737924

add docs

0ecd79b8

update

eefa6ccb

update

a9fecb33

update

27817919

update

0d1ea8c4

update

1d3846d4

update

db0213a1

improve docs.

f2964ba8

docstring to the pipeline

2d5071e5

licensing in the pipeline scripts.

4cd00836

clean up the docstring of the UNet.

6012362d

Merge branch 'main' into convert-i2vgen-xl

09519a1e

sayakpaul commented on 2024-01-30

make _resize_bilinear and _center_crop_wide accept torch tensors as w…

23935a97

data type fix

57b20ee1

unint8 > uint8

4d51fe8a

channels_last

14404b2a

debug

24d813e5

fix download path for the example image

bf1eb40f

fix: download path again

f35f3d8a

use cross_attention_dim to initialize

28804420

debug

698f9c11

debug

68cbe593

reduce hidden size of the vision encoder

45c682ec

go

3a701a21

debug more

0a4c6866

reduce more hidden dim

758acc0f

remove callback and callback_steps from required params check

0bfd0427

remove print

dd5a8f04

assertions for the default case..

50d46062

skip test_attention_slicing_forward_pass as it's deprecated.

2a4c7272

feature_extractor.

66034c52

feature_extractor.

b1819cda

relax precision

48e7694d

relax more.

0f230f9c

torch.manual_seed(0)

836fb678

relax precision

a5cb5b1a

uncomment batching tests

947e63ad

debug

216b9ddc

debug more

7c810525

debug more

31409afd

make the pt to pil utilities better

2faaffed

debug

a29c2011

format string

4adc8519

okay

9cb0b846

force_feature_extractor_resize

0b9a9ef7

debug

7cffe74a

expand to samples's shape

3d0ef8b4

check

ca422ef8

fix: batching behaviour for fps

cfafe51e

test_inference_batch_single_identical

d85bd2d2

relax test_inference_batch_single_identical

73242917

relax a bit more.

bb103025

test_num_videos_per_prompt

5e79f3d1

let's go.

f3c58a24

remove extra prints.

7cb384c8

remove force_feature_extractor_resize

f64c3d25

remove force_feature_extractor_resize

1675b075

fix: test_num_videos_per_prompt

8c20445d

fix: test_num_videos_per_prompt

2221bbc8

fix: test_num_videos_per_prompt

65849883

fix a bit more

7fad5851

style

61091789

add: slow test

7b281b53

flattened image slice

24cfc1d9

variant

edd6cc5e

assertion

6ebb2653

add: note about memory optimization

a03649fa

sayakpaul marked this pull request as ready for review 2 years ago

sayakpaul requested a review from

DN6 2 years ago

sayakpaul requested a review from

patrickvonplaten 2 years ago

sayakpaul requested a review from

yiyixuxu 2 years ago

bring to cpu before calling numpy()

c78617ee

finish slow test fixes

76a42b38

Merge branch 'main' into convert-i2vgen-xl

ca4a977a

Empty-Commit

182447af

pin peft dependencies.

a54facc0

sayakpaul commented on 2024-01-30

patrickvonplaten commented on 2024-01-30

sayakpaul commented on 2024-01-30

remove attention slicing and unload_lora

72e466e2

remove attention mask

5ba9b4ad

timesteps.

693d8270

add missing entries in the unet docstring

88f03a39

Apply suggestions from code review

9bf706e4

remove textual inversion

a682dca0

remove _to_tensor on fps.

a9c23e8b

leverage VaeImageProcessor.

ec5694ad

remove unnecessary config vars.

e6c07b56

use num_attention_heads

8c980df1

clean up conv_out layer creation

5cbaf2e7

refactor attention logic for cleaning up norm handling

be518c87

sayakpaul requested a review from

patrickvonplaten 2 years ago

yiyixuxu commented on 2024-01-30

Apply suggestions from code review

b8fde102

simplify norm_type checks in the forwards.

a0e6db10

add copied from statement where missing

c6c0b310

move _center_crop and _resize_bilinear out of the encode image function

5038c214

Merge branch 'main' into convert-i2vgen-xl

6ac5ec59

yiyixuxu approved these changes on 2024-01-31

update

6d7fb879

Merge branch 'main' into convert-i2vgen-xl

2c1caeaa

clean up

513ab1f6

update

13fcc20b

update

7b7f0751

change checkpoints.

fe50995d

patrickvonplaten commented on 2024-01-31

patrickvonplaten approved these changes on 2024-01-31

yiyixuxu merged 04cd6adf into main 2 years ago

yiyixuxu deleted the convert-i2vgen-xl branch 2 years ago

yiyixuxu commented on 2024-02-01

yiyixuxu commented on 2024-02-05

diffusers
[Feat] add I2VGenXL for image-to-video generation
#6665

Merged
