feat: abstract weight transfer backend (gdr/http) for VLLMRollout

- Rename nccl -> gdr (GPU-direct) so the backend name is not tied to NCCL specifically.
- Add weight_transfer_backend to RolloutConfig: auto | gdr | http.
- auto mode selects gdr when NCCL is available and falls back to http otherwise.
- HTTP path sends base64-encoded tensor data per parameter.
- GDR path uses StatelessProcessGroup + NCCL broadcast (unchanged).
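
Illustrative sketch of the selection and HTTP encoding described above, not the code in this change. The helper names (nccl_available, resolve_backend, encode_param, decode_param) and the payload fields (name, dtype, shape, data) are assumptions made for the example; only the config values auto | gdr | http come from this commit.

```python
import base64
from typing import Optional, Sequence

VALID_BACKENDS = ("auto", "gdr", "http")


def nccl_available() -> bool:
    """Best-effort probe (assumption: NCCL is reached through torch.distributed)."""
    try:
        import torch.distributed as dist
    except ImportError:
        return False
    return dist.is_available() and dist.is_nccl_available()


def resolve_backend(requested: str, nccl_ok: Optional[bool] = None) -> str:
    """Map a weight_transfer_backend value to a concrete backend: 'gdr' or 'http'."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown weight_transfer_backend: {requested!r}")
    if requested != "auto":
        return requested  # an explicit choice is respected as-is
    if nccl_ok is None:
        nccl_ok = nccl_available()
    return "gdr" if nccl_ok else "http"


def encode_param(name: str, raw: bytes, dtype: str, shape: Sequence[int]) -> dict:
    """Build a JSON-serializable payload for one parameter (hypothetical field names)."""
    return {
        "name": name,
        "dtype": dtype,
        "shape": list(shape),
        "data": base64.b64encode(raw).decode("ascii"),
    }


def decode_param(payload: dict) -> bytes:
    """Recover the raw tensor bytes on the receiving side."""
    return base64.b64decode(payload["data"])
```

Passing nccl_ok explicitly keeps the auto-detection testable on machines without torch or NCCL.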

Signed-off-by: Guokai Ma <guokai.ma@intel.com>