Optimization: Qwen3 Next autoregressive pass (#17996)
* It's Qwen3 Next, the lean mean token generation machine!
* Apply patches from thread
* Remove recurrent version, only keep chunked and autoregressive
* Remove unnecessary conts and asserts
* Remove more extra conts and asserts
* Clean up masking