owlvit/2 dynamic input resolution (#34764)
* owlvit/2 dynamic input resolution.
* adapt box grid to patch_dim_h and patch_dim_w
* fix ci
* clarify variable naming
* compute box_bias dynamically inside box_predictor
* fix code style
* [run-slow] owlvit, owlv2
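
For illustration only: a minimal NumPy sketch of what "adapt box grid to patch_dim_h and patch_dim_w" and "compute box_bias dynamically" could look like for a non-square patch grid. This is not the code from the PR. The function names `grid_corner_coordinates` and `box_bias`, the `eps` value, and the use of NumPy instead of the model's tensor library are all assumptions made for this example.

```python
import numpy as np


def grid_corner_coordinates(num_patches_h, num_patches_w):
    """Hypothetical helper: normalized (x, y) corner of every patch, row-major."""
    # Corners run from 1/n to 1 along each axis, so each axis is normalized
    # by its own patch count rather than a single shared grid size.
    xs = np.arange(1, num_patches_w + 1, dtype=np.float32) / num_patches_w
    ys = np.arange(1, num_patches_h + 1, dtype=np.float32) / num_patches_h
    xx, yy = np.meshgrid(xs, ys, indexing="xy")  # both have shape (h, w)
    return np.stack((xx, yy), axis=-1).reshape(-1, 2)


def box_bias(num_patches_h, num_patches_w, eps=1e-4):
    """Hypothetical helper: per-patch bias (cx, cy, w, h) in logit space."""
    coords = np.clip(grid_corner_coordinates(num_patches_h, num_patches_w), 0.0, 1.0)
    # Inverse sigmoid of the grid position; eps keeps the logs finite at 1.0.
    coord_bias = np.log(coords + eps) - np.log1p(-coords + eps)

    # Default box size is one patch, again scaled per axis.
    size = np.ones_like(coords)
    size[..., 0] /= num_patches_w
    size[..., 1] /= num_patches_h
    size_bias = np.log(size + eps) - np.log1p(-size + eps)

    return np.concatenate([coord_bias, size_bias], axis=-1)


# Example: a 2 x 3 patch grid yields one 4-value bias per patch.
bias = box_bias(2, 3)
```

Computing the bias from the patch counts at call time, rather than caching it for one fixed grid, is what lets the same predictor handle inputs of different resolutions in this sketch.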