Refactoring code related to WARP_SIZE. (#3623)
1. Centralize its definition in common.cuh.
2. Rename it to GPU_WARP_SIZE which can be extended to AMD GPU later.
3. Centralize warp shuffle functions.
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>