mtmd: support "frame merge" for qwen-vl-based models (#21858)
* feat: add video support for Qwen3.5
* various cleanups
* revise the design
* fix llava-uhd case
* nits
* nits 2
---------
Co-authored-by: andrewmd5 <1297077+andrewmd5@users.noreply.github.com>