Adding grounding dino (#26087)
* Copied deformable detr
* First commit
* Added bert to model
* Bert validated
* Created Text and Fusion layers for Encoder
* Adapted Encoder layer
* Fixed typos
* Adjusted Encoder
* Converted encoder to hf
* Modified Decoder Layer
* Modified main decoder class
* Removed copy comments
* Fixed forward from GroundingDINOModel and GroundingDINODecoder
* Added all necessary layers, configurations and forward logic up to GroundingDINOModel
* Added all layers to conversion
* Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection
* Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output
* Fixed forward from GroundingDINOTextEnhancerLayer
* Fixed output bug with GroundingDINODeformableLayer
* Fixed bugs that prevented GroundingDINOForObjectDetection from running its forward method
* Fixed attentions to be passed correctly
* Passing temperature arg when creating Sine position embedding
* Removed copy comments
* Added temperature argument for position embedding
* Fixed typo when converting weights to GroundingDINO vision backbone
* Final modifications on modeling
* Removed unnecessary class
* Fixed convert structure
* Added image processing
* make fixup partially completed
* Now text_backbone_config has its own class
* Modified convert script
* Removed unnecessary config attribute
* Added new function to generate sub sentence mask
* Renamed parameters with gamma in the name as it's currently not allowed
* Removed tokenization and image_processing scripts since we'll map from existing models
* Fixed some issues with configuration
* Just some modifications on conversion script
* Other modifications
* Fix style
* Improve fixup
* Improve conversion script
* Improve conversion script
* Add GroundingDINOProcessor
* More improvements
* Return token type ids
* something
* Fix more tests
* More improvements
* More cleanup
* More improvements
* Fixed tests, improved modeling and config
* More improvements and fixing tests
* Improved tests and modeling
* Improved tests and added image processor
* Improved tests inference
* More improvements
* More test improvements
* Fixed last test
* Improved docstrings and comments
* Fix style
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Better naming
* Better naming
* Added Copied statement
* Added Copied statement
* Moved param init from GroundingDINOBiMultiHeadAttention
* Better naming
* Fixing clamp style
* Better naming
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/configuration_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Improving conversion script
* Improved config
* Improved naming
* Improved naming again
* Improved grounding-dino.md
* Moved grounding dino to multimodal
* Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
* Fixed docstrings and style
* Fix docstrings
* Remove timm attributes
* Reorder imports
* More improvements
* Add Grounding DINO to pipeline
* Remove model from check_repo
* Added grounded post_process to GroundingDINOProcessor
* Fixed style
* Fixed GroundingDINOTextPrenetConfig docstrings
* Aligned inputs.keys() when both image and text are passed with model_input_names
* Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor
* Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head
* Fixed order
* Marked test with require_torch
* Temporarily changed repo_id
* More improvements
* Fix style
* Final improvements
* Improve annotators
* Fix style
* Add is_torch_available
* Remove type hints
* vocab_tokens as one liner
* Removed print statements
* Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig
* remove unnecessary comments
* Removed unnecessary tests on conversion script
* Renamed GroundingDINO to camel case GroundingDino
* Fixed GroundingDinoProcessor docstrings
* loading MSDA kernels in the modeling file
* Fix copies
* Replace nn.MultiheadAttention
* Replace nn.MultiheadAttention
* Fixed inputs for GroundingDinoMultiheadAttention & order of modules
* Fixed processing to avoid messing with inputs
* Added more tips for GroundingDino
* Make style
* Changing name to align with SAM
* Replace final nn.MultiheadAttention
* Fix model tests
* Update year, remove GenerationTesterMixin
* Address comments
* Address more comments
* Rename TextPrenet to TextModel
* Rename hidden_states
* Address more comments
* Address more comments
* Address comment
* Address more comments
* Address merge
* Address comment
* Address comment
* Address comment
* Make style
* Added layer norm eps to layer norms
* Address more comments
* More fixes
* Fixed equivalence
* Make fixup
* Remove print statements
* Address comments
* Address comments
* Address comments
* Address comments
* Address comments
* Address comments
* Add comment
* Address comment
* Remove overwriting of test
* Fix bbox_embed
* Improve decoder_bbox_embed_share
* Simplify outputs
* Updated post_process_grounded_object_detection
* Renamed sources to feature_maps
* Improved tests for Grounding Dino ImageProcessor and Processor
* Fixed test requirements and imports
* Fixed image_processing
* Fixed processor tests
* Fixed imports for image processing tests
* Fix copies
* Updated modeling
* Fix style
* Moved functions to correct position
* Fixed copy issues
* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
* Keeping custom CUDA kernels for MSDA consistent
* Make GroundingDinoProcessor logic clearer
* Updated Grounding DINO checkpoints
* Changed tests to correct structure
* Updated gpu-cpu equivalence test
* fix copies
* Update src/transformers/models/grounding_dino/processing_grounding_dino.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/processing_grounding_dino.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/grounding_dino/configuration_grounding_dino.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixed errors and style
* Fix copies
* Removed inheritance from PreTrainedModel from GroundingDinoTextModel
* Fixed GroundingDinoTextModel
* Fixed type of default backbone config
* Fixed missing methods for GroundingDinoTextModel and added timm support for GroundingDinoConvEncoder
* Addressed comments
* Addressed batched image processing tests
* Addressed zero shot test comment
* Addressed tip comment
* Removed GroundingDinoTextModel from check_repo
* Removed inplace masking
* Addressed comments
* Addressed comments
* Addressed comments
* Fix copies
* Fixing timm test
* Fixed batching equivalence test
* Update docs/source/en/model_doc/grounding-dino.md
Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
* Update docs/source/en/model_doc/grounding-dino.md
Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
* Update docs/source/en/model_doc/grounding-dino.md
Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
* Addressed more comments
* Added a new comment
* Reduced image size
* Addressed more comments
* Nits
* Nits
* Changed the way text_config is initialized
* Update src/transformers/models/grounding_dino/processing_grounding_dino.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>