Add `LongT5` model #16792
stancld changed the title from "[WIP] Add `LongT5` model" to "Add `LongT5` model" 3 years ago
stancld marked this pull request as ready for review 3 years ago
sgugger approved these changes on 2022-04-22
stancld force-pushed from b025187a to 785fb064 3 years ago
Initial commit
e4078c29
Make some fixes
52ef7a96
Make PT model full forward pass
87840bdb
Drop TF & Flax implementation, fix copies etc
c5346957
Add Flax model and update some corresponding stuff
478505d4
Drop some TF things
978a5164
Update config and flax local attn
37c44945
Add encoder_attention_type to config
79669f08
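In the released library, `LongT5Config` exposes an `encoder_attention_type` field to choose between the two encoder variants. A brief configuration sketch (a config fragment, not runnable without `transformers` with LongT5 support; the defaults shown are my reading of the released config, not confirmed by this PR page):

```python
from transformers import LongT5Config

# Hedged sketch: select the encoder attention variant via the config.
config = LongT5Config(
    encoder_attention_type="transient-global",  # or "local"
    local_radius=127,       # window radius for local attention
    global_block_size=16,   # block size for transient-global summary tokens
)
```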
.
96bfb6b0
Update docs
7e38092a
Do some cleansing
93378287
Fix some issues -> make style; add some docs
d0d4043c
Fix position_bias + mask addition + Update tests
23f115b2
Fix repo consistency
a407e846
Fix model consistency by removing flax operation over attn_mask
a8c7940d
[WIP] Add PT TGlobal LongT5
48b85cf2
.
592590cd
[WIP] Add flax tglobal model
7b8332c7
[WIP] Update flax model to use the right attention type in the encoder
7c1f3786
Fix flax tglobal model forward pass
dec32c6f
Make use of global_relative_attention_bias
9dc07a1a
Add test suites for TGlobal model
fddb3268
Fix minor bugs, clean code
7488044e
Fix PT-Flax equivalence, though not convinced of correctness
e991707f
Fix LocalAttn implementation to match the original impl. + update REA…
efd24451
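The local attention these commits implement restricts each token to a fixed neighbourhood defined by sequence blocks. As a rough illustration of the block-local masking idea (a minimal NumPy sketch under my own assumptions, not the repository's batched, radius-based implementation):

```python
import numpy as np

def local_attention_mask(seq_len: int, block_len: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j iff j lies in
    i's block or one of the two neighbouring blocks."""
    pos = np.arange(seq_len)
    block_ids = pos // block_len
    # |block(i) - block(j)| <= 1  -> within the three-block local window
    return np.abs(block_ids[:, None] - block_ids[None, :]) <= 1

mask = local_attention_mask(seq_len=8, block_len=2)
# Token 0 (block 0) sees blocks 0 and 1 -> positions 0..3
print(mask[0])
```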
Few updates
619595ba
Update: [Flax] improve large model init and loading #16148
47dc3906
Add ckpt conversion script according to #16853 + handle torch device p…
6ba02815
Minor updates to conversion script.
c430df4d
Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
4598a28e
gpu support + dtype fix
93d3982f
Apply some suggestions from code review
82a99c28
* Remove (de)parallelize stuff
8b98746a
Remove caching logic for local & tglobal attention
877c51ca
Apply another batch of suggestions from code review
7cd41051
Fix converting script + revert config file change
d19c593f
Revert "Remove caching logic for local & tglobal attention"
eff2511d
Stash caching logic in Flax model
d41e10ae
Move test files to the proper place
e0b3e7b5
fix _make_global_fixed_block_ids and masked neg value
f33298b3
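`_make_global_fixed_block_ids` assigns each input token to a fixed global block whose aggregate becomes a "transient global" token, and padding positions need a sentinel id (the "masked neg value" the commit above fixes). A simplified, self-contained sketch of that idea (my own illustration, not the library's exact implementation):

```python
import numpy as np

def make_global_fixed_block_ids(attention_mask: np.ndarray, global_block_size: int):
    """For each token, the id of the global block it belongs to; padding
    tokens get the sentinel -1 so they map to no global token."""
    seq_len = attention_mask.shape[-1]
    block_ids = np.arange(seq_len) // global_block_size
    # masked (padding) positions are pushed to the negative sentinel id
    block_ids = np.where(attention_mask > 0, block_ids, -1)
    num_globals = seq_len // global_block_size
    return block_ids, num_globals

mask = np.array([1, 1, 1, 1, 1, 0, 0, 0])  # last three tokens are padding
ids, n = make_global_fixed_block_ids(mask, global_block_size=2)
print(ids, n)
```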
update flax model
ee2e08ea
style and quality
e29b7b6f
fix imports
b2f6c809
remove load_tf_weights_in_longt5 from init and fix copies
05b15968
add slow test for TGlobal model
e9696dd3
typo fix
ca92e712
Merge branch 'main' into new_model/LongT5
085da427
Drop obsolete is_parallelizable and one warning
70276d96
Update __init__ files to fix repo-consistency
6a903e32
fix pipeline test
b7c68d09
Fix some device placements
90857ce0
Merge branch 'main' into new_model/LongT5
bdef4d87
Merge branch 'main' into new_model/LongT5
b2a6ae2a
[wip]: Update tests -- need to generate summaries to update expected_…
9a043798
Fix quality
ac8ac232
Update LongT5 model card
a3717489
Update (slow) summarization tests
7c812266
make style
9a3b2818
rename checkpoints
eb15125e
Merge branch 'main' of https://github.com/huggingface/transformers in…
1163d5d8
finish
832b3d8c
fix flax tests
7aac4313
Merge branch 'main' into new_model/LongT5
b6b38bde