Enable dynamic resolution input for Swin Transformer and variants (#30656)
* add interpolation of positional encoding support to swin
* add style changes
* use default image processor and make size a dictionary
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove logits testing
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor image size validation logic when interpolation is disabled
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove asserts in modeling
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add dynamic resolution input support to swinv2
* change size to ensure interpolation encoding path is triggered
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* set interpolate_pos_encoding default value to False
* add dynamic resolution input to donut swin
* add dynamic resolution input to maskformer swin
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>