🚨 Refactor DETR to updated standards (#41549)
* refactor attention and many other components
* remove return_dict interface
* improve variable names
* use _can_record_outputs and add real support for pixel and query masks, vision input embeds and query embeds
* split self attention and cross attention
* nits
* standardize mask handling
* update DetrMHAttentionMap
* refactor mlp detr
* make style
* remove outdated tests which used ".bin" checkpoints
* Updates modeling_detr to newest standards
* Review + fix detr weight conversion
* replace einsum, reorder
* refactor rt_detr rt_detr_v2 d_fine to updated library standards
* fix repo
* Fix test_reverse_loading_mapping test
* use modular for RT-DETR
* refactor conditional and deformable detr
* use modular for deformable_detr
* use modular for conditional_detr
* fix repo
* refactor DetrMaskHeadSmallConv
* fix modular
* Temporarily remove outdated "Copied from" comments
* fix consistency
* Improve DetrMHAttentionMap
* Fix torch functional import aliases
* fix after merge with main
* Fix missing copyrights
* Refactor HybridEncoder rt_detr
* fix weight tying in deformable detr
* Refactor pp docs layout + fix fp16 overflow
* fix modular
* fix modular
* fix deformable detr tests