🚨 Modeling changes for export, compile, and hybrid-attention standardization (#46738)
* model fixes for export compatibility
* add minicpmv vision utils
* can compile
* make SSMs fully compilable
* more SSMs
* address new comments
* fast CI fixes
* fix minimax
* fix the last Falcon fast CI failure
* cleanup
* use can_return_tuple in patchmixer
* avoid 5D attention
* granite export fix
* address modeling comments
* update
* address more comments, unify linear attention, and make more models support full-graph compile
* style
* style
* modeling changes for attn impl compatibility
* fix
* fixes
* fix lfm2 conv errors
* address most of Anton's review comments
* address Cyril's comments
* revert conv rename
* simpler modular
* keep max_batch_size check
* fixes
* fix Moshi CUDA graphs
* add layer type class attribute in mixers
* fix repo
* address comments
* address comments
* Merge branch 'main' into hf-exporters-models
* fix merge
* Claude review findings
* docs comments
* address comments
* style
* Claude review
* small fix caught by Copilot