transformers
b3e3c3dc - [Qwen3VL] fix device mismatch error for FSDP2 training (#41536)

Commit

193 days ago

[Qwen3VL] fix device mismatch error for FSDP2 training (#41536)

For FSDP2, parameters might be on a meta device, and the weight.device attribute may not accurately reflect where the actual computation will happen during forward passes.

```log
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 776, in forward
    pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 745, in fast_pos_embed_interpolate
    pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "torch/nn/modules/module.py", line 1827, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
```

https://github.com/volcengine/verl/pull/3686#issuecomment-3380981817

Signed-off-by: Hollow Man <hollowman@opensuse.org>
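The commit page does not include the patch itself, so the following is only a minimal sketch of the general idea described in the message, not the actual change. It assumes that index tensors can take their device from an input tensor (here a stand-in for `grid_thw`) instead of from `weight.device`. The helper `build_indices` and the toy embedding sizes are hypothetical; a `meta` embedding is used so the example runs on a CPU-only machine.

```python
import torch
from torch import nn


def build_indices(values, reference: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: create index tensors on the device of an input
    tensor rather than on the device reported by a parameter."""
    return torch.tensor(values, dtype=torch.long, device=reference.device)


# A parameter that lives on the meta device, as the commit message says can
# happen under FSDP2. Its .device says nothing about where compute will run.
meta_embed = nn.Embedding(16, 4, device="meta")
print(meta_embed.weight.device)  # meta

# Stand-in for the grid_thw input; it is on CPU here, and would be on the
# compute device (e.g. cuda:0) in a real forward pass.
grid_thw = torch.tensor([[1, 2, 2]])

# Assumption: the input tensor's device is the one the forward pass uses.
idx_tensor = build_indices([0, 1, 2, 3], grid_thw)
print(idx_tensor.device == grid_thw.device)  # True
```

With this pattern the index tensor follows the input onto whatever device the forward pass runs on, which is the condition the `RuntimeError` in the log above reports as violated.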

References

#41536 - [Qwen3VL] fix device mismatch error for FSDP2 training

Author

HollowMan6

Parents

b84c0b31
