Refactor ViT-like models (#39816)
* refactor vit
* fix
* fixup
* turn off FX tests
* AST
* deit
* dinov2
* dinov2_with_registers
* dpt
* depth anything (nit)
* depth pro (nit)
* ijepa
* ijepa (modular)
* prompt_depth_anything (nit)
* vilt (nit)
* zoedepth (nit)
* videomae
* vit_mae
* vit_msn
* vivit
* yolos
* eomt
* vitpose
* update auto backbone
* disable `fx` and export tests (dinov2, dpt, ijepa, vit, vitpose)
* fix kwargs for backbone
* fix
* convnext
* fixup
* update convnext layernorm
* fix-copies layer_norm
* convnextv2
* explicit output_hidden_states for models with backbones
* explicit hidden states collection for dinov2
* tests fixed
* fix DPT as well
* fix dinov2 with registers
* add comment