Add `LongT5` model #16792
stancld changed the title from "[WIP] Add `LongT5` model" to "Add `LongT5` model" 3 years ago
stancld marked this pull request as ready for review 3 years ago
sgugger approved these changes on 2022-04-22
stancld force-pushed from b025187a to 785fb064 3 years ago
Initial commit
e4078c29
Make some fixes
52ef7a96
Make PT model full forward pass
87840bdb
Drop TF & Flax implementation, fix copies etc
c5346957
Add Flax model and update some corresponding stuff
478505d4
Drop some TF things
978a5164
Update config and flax local attn
37c44945
Add encoder_attention_type to config
79669f08
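In the released library, `LongT5Config` exposes an `encoder_attention_type` field to choose between the two encoder variants. A brief configuration sketch (a config fragment, not runnable without `transformers` with LongT5 support; the defaults shown are my reading of the released config, not confirmed by this PR page):

```python
from transformers import LongT5Config

# Hedged sketch: select the encoder attention variant via the config.
config = LongT5Config(
    encoder_attention_type="transient-global",  # or "local"
    local_radius=127,       # window radius for local attention
    global_block_size=16,   # block size for transient-global summary tokens
)
```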
.
96bfb6b0
Update docs
7e38092a
Do some cleansing
93378287
Fix some issues -> make style; add some docs
d0d4043c
Fix position_bias + mask addition + Update tests
23f115b2
Fix repo consistency
a407e846
Fix model consistency by removing flax operation over attn_mask
a8c7940d
[WIP] Add PT TGlobal LongT5
48b85cf2
.
592590cd
[WIP] Add flax tglobal model
7b8332c7
[WIP] Update flax model to use the right attention type in the encoder
7c1f3786
Fix flax tglobal model forward pass
dec32c6f
Make use of global_relative_attention_bias
9dc07a1a
Add test suites for TGlobal model
fddb3268
Fix minor bugs, clean code
7488044e
Fix PT-Flax equivalence, though not convinced of correctness
e991707f
Fix LocalAttn implementation to match the original impl. + update REA…
efd24451
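The local attention these commits implement restricts each token to a fixed neighbourhood defined by sequence blocks. As a rough illustration of the block-local masking idea (a minimal NumPy sketch under my own assumptions, not the repository's batched, radius-based implementation):

```python
import numpy as np

def local_attention_mask(seq_len: int, block_len: int) -> np.ndarray:
    """Boolean mask: position i may attend to position j iff j lies in
    i's block or one of the two neighbouring blocks."""
    pos = np.arange(seq_len)
    block_ids = pos // block_len
    # |block(i) - block(j)| <= 1  -> within the three-block local window
    return np.abs(block_ids[:, None] - block_ids[None, :]) <= 1

mask = local_attention_mask(seq_len=8, block_len=2)
# Token 0 (block 0) sees blocks 0 and 1 -> positions 0..3
print(mask[0])
```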
Few updates
619595ba
Update: [Flax] improve large model init and loading #16148
47dc3906
Add ckpt conversion script according to #16853 + handle torch device p…
6ba02815
Minor updates to conversion script.
c430df4d
Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
4598a28e
gpu support + dtype fix
93d3982f
Apply some suggestions from code review
82a99c28
* Remove (de)parallelize stuff
8b98746a
Remove caching logic for local & tglobal attention
877c51ca
Apply another batch of suggestions from code review
7cd41051
Fix converting script + revert config file change
d19c593f
Revert "Remove caching logic for local & tglobal attention"
eff2511d
Stash caching logic in Flax model
d41e10ae
Move test files to the proper place
e0b3e7b5
fix _make_global_fixed_block_ids and masked neg value
f33298b3
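`_make_global_fixed_block_ids` assigns each input token to a fixed global block whose aggregate becomes a "transient global" token, and padding positions need a sentinel id (the "masked neg value" the commit above fixes). A simplified, self-contained sketch of that idea (my own illustration, not the library's exact implementation):

```python
import numpy as np

def make_global_fixed_block_ids(attention_mask: np.ndarray, global_block_size: int):
    """For each token, the id of the global block it belongs to; padding
    tokens get the sentinel -1 so they map to no global token."""
    seq_len = attention_mask.shape[-1]
    block_ids = np.arange(seq_len) // global_block_size
    # masked (padding) positions are pushed to the negative sentinel id
    block_ids = np.where(attention_mask > 0, block_ids, -1)
    num_globals = seq_len // global_block_size
    return block_ids, num_globals

mask = np.array([1, 1, 1, 1, 1, 0, 0, 0])  # last three tokens are padding
ids, n = make_global_fixed_block_ids(mask, global_block_size=2)
print(ids, n)
```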
update flax model
ee2e08ea
style and quality
e29b7b6f
fix imports
b2f6c809
remove load_tf_weights_in_longt5 from init and fix copies
05b15968
add slow test for TGlobal model
e9696dd3
typo fix
ca92e712
Merge branch 'main' into new_model/LongT5
085da427
Drop obsolete is_parallelizable and one warning
70276d96
Update __init__ files to fix repo-consistency
6a903e32
fix pipeline test
b7c68d09
Fix some device placements
90857ce0
Merge branch 'main' into new_model/LongT5
bdef4d87
Merge branch 'main' into new_model/LongT5
b2a6ae2a
[wip]: Update tests -- need to generate summaries to update expected_…
9a043798
Fix quality
ac8ac232
Update LongT5 model card
a3717489
Update (slow) summarization tests
7c812266
make style
9a3b2818
rename checkpoints
eb15125e
Merge branch 'main' of https://github.com/huggingface/transformers in…
1163d5d8
finish
832b3d8c
fix flax tests
7aac4313
Merge branch 'main' into new_model/LongT5
b6b38bde