Better __repr__ for ModuleList (#90452)
## Problem
When models contain many complex repeated layers, the output of `print(module)` becomes unwieldy to work with. For example, the current `__repr__` output for `t5-small` is `715` lines long.
## Solution
With the better `__repr__` it becomes `135` lines. For `t5-large`, the current `__repr__` prints `1411` lines; the better `__repr__` prints `135` — the same number as for `t5-small`, because most of the layers are simply repeated. For `EleutherAI/gpt-j-6B`, the number of lines drops from `483` to just `24`.
Here's how it works: when consecutive `ModuleList` items have exactly the same `__repr__`, instead of printing each of them, it prints `N x {repr(item)}`. The code also supports multiple repeating groups within the same `ModuleList`, which is especially useful when the first/last layer of a block differs from the rest.
The better `__repr__` should make model printouts smaller, cleaner, and significantly more useful by highlighting the differences between repeated blocks instead of losing them in a wall of text.
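Conceptually, the grouping works like the sketch below (hypothetical helper names for illustration only, not the actual code in this PR): consecutive children whose `repr` strings are identical are collapsed into a `start-end` index range with an `N x` prefix.
```python
from itertools import groupby

def _group_repeated_reprs(reprs):
    """Collapse consecutive identical child reprs into (start, end, count, repr) groups.

    Illustrative sketch only; the real logic lives in ModuleList.__repr__.
    """
    groups = []
    index = 0
    for child_repr, run in groupby(reprs):
        count = len(list(run))
        groups.append((index, index + count - 1, count, child_repr))
        index += count
    return groups

def render_module_list(reprs):
    """Render a ModuleList-style repr with repeated children collapsed."""
    lines = ["ModuleList("]
    for start, end, count, child_repr in _group_repeated_reprs(reprs):
        key = str(start) if count == 1 else f"{start}-{end}"
        prefix = f"({key}): " + (f"{count} x " if count > 1 else "")
        # Indent the (possibly multi-line) child repr by two spaces.
        for line in (prefix + child_repr).splitlines():
            lines.append("  " + line)
    lines.append(")")
    return "\n".join(lines)
```
For example, `render_module_list(["Linear(in_features=16, out_features=16, bias=False)"] * 4)` would collapse the four identical children into a single `(0-3): 4 x Linear(...)` line.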
## Motivating real-life example
You can try it out in this [colab notebook](https://colab.research.google.com/drive/1PscpX_K1UemIDotl2raC4QMy_pTqDq7p?usp=sharing).
The current `__repr__` output for gpt-j-6b is too big to include in full in this PR description:
```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (1): GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (2): GPTJBlock(
    ...
```
The better `__repr__` output looks like this:
```
GPTJModel(
  (wte): Embedding(50400, 4096)
  (drop): Dropout(p=0.0, inplace=False)
  (h): ModuleList(
    (0-27): 28 x GPTJBlock(
      (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (attn): GPTJAttention(
        (attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_dropout): Dropout(p=0.0, inplace=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (out_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): GPTJMLP(
        (fc_in): Linear(in_features=4096, out_features=16384, bias=True)
        (fc_out): Linear(in_features=16384, out_features=4096, bias=True)
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
  )
  (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
)
```
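The same collapsing applies to plain PyTorch containers. A minimal usage example, assuming a torch build that includes this change (exact output may differ slightly between versions):
```python
import torch.nn as nn

# Four identical layers followed by one different layer.
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)] + [nn.Dropout(0.1)])
print(layers)
# Expected (approximate) output with the grouped repr:
# ModuleList(
#   (0-3): 4 x Linear(in_features=16, out_features=16, bias=True)
#   (4): Dropout(p=0.1, inplace=False)
# )
```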
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90452
Approved by: https://github.com/albanD