[Qwen3VL] fix device mismatch error for FSDP2 training (#41536)
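With FSDP2, parameters may be on the meta device, or otherwise not on the device used for compute. The `weight.device` attribute therefore does not necessarily reflect where the forward pass actually runs. A tensor created on `weight.device`, such as the index tensor built in `fast_pos_embed_interpolate`, can end up on the wrong device. In the traceback below, the index is on CPU while the embedding weight is on `cuda:0`: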
```log
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 776, in forward
    pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 745, in fast_pos_embed_interpolate
    pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "torch/nn/modules/module.py", line 1827, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
```
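The general pattern behind this kind of fix can be sketched as follows. The sketch takes the device from an input tensor that is already on the compute device instead of from a parameter's `.device`. It is not necessarily the exact change in this commit. `PosEmbedSketch` and `lookup` are hypothetical names, and the sketch assumes that `grid_thw` has been moved to the GPU along with the other model inputs.

```python
import torch
import torch.nn as nn


class PosEmbedSketch(nn.Module):
    """Hypothetical stand-in for a vision tower's position-embedding helper."""

    def __init__(self, num_positions: int, dim: int):
        super().__init__()
        self.pos_embed = nn.Embedding(num_positions, dim)

    def lookup(self, grid_thw: torch.Tensor, positions: list[int]) -> torch.Tensor:
        # Fragile under FSDP2: the parameter's reported device may differ from
        # the device the gathered weight is on when forward actually runs.
        #   device = self.pos_embed.weight.device
        # Assumption: grid_thw is an input already on the compute device,
        # so tensors built here follow it.
        device = grid_thw.device
        idx_tensor = torch.tensor(positions, dtype=torch.long, device=device)
        # nn.Embedding -> F.embedding requires index and weight on one device.
        return self.pos_embed(idx_tensor)


if __name__ == "__main__":
    m = PosEmbedSketch(num_positions=16, dim=8)
    out = m.lookup(torch.tensor([[1, 2, 2]]), [0, 3, 5])
    print(out.shape, out.device)
```

On a CPU-only run both tensors are on CPU, so this only shows the shape of the pattern. The design choice is to let data that is already on the compute device decide placement, rather than parameter metadata that a sharding or offloading wrapper may manage.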
https://github.com/volcengine/verl/pull/3686#issuecomment-3380981817
Signed-off-by: Hollow Man <hollowman@opensuse.org>