[core] Allegro T2V (#9736)
* update
* refactor transformer part 1
* refactor part 2
* refactor part 3
* make style
* refactor part 4; modeling tests
* make style
* refactor part 5
* refactor part 6
* gradient checkpointing
* pipeline tests (currently broken)
* update
* add coauthor
Co-authored-by: Huan Yang <hyang@fastmail.com>
* refactor part 7
* add docs
* make style
* add coauthor
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* make fix-copies
* undo unrelated change
* revert changes to embeddings, normalization, transformer
* refactor part 8
* make style
* refactor part 9
* make style
* fix
* apply suggestions from review
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update example
* remove attention mask for self-attention
* update
* copied from
* update
* update
---------
Co-authored-by: Huan Yang <hyang@fastmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>