[Build] Downgrade default CUDA version to 12.8 for PyTorch cu128 compatibility
The official PyTorch wheels on PyPI are built against CUDA 12.8, so
installing a cu129 vllm wheel with pip install vllm still pins
nvidia-cublas-cu12 to 12.8, which causes a runtime conflict
(pytorch/pytorch#174949). Downgrade the default CUDA version so the CI
image, test image, and PyPI wheel all use cu128 consistently.
- Dockerfile: CUDA_VERSION 12.9.1 → 12.8.1
- requirements/test.txt: recompile lockfile with --torch-backend cu128
- .pre-commit-config.yaml: match pip-compile hook to cu128
- docker/versions.json: regenerated
Release builds (cu129, cu130) are unaffected since they pass explicit
--build-arg CUDA_VERSION overrides.
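For reference, the relationship between a CUDA_VERSION value and the
matching wheel tag can be sketched as below. This is a minimal
illustration, not code from this change: cuda_backend_tag and
is_consistent are hypothetical helper names, and 13.0.0 is only an
assumed example version for a cu130 build.

```python
def cuda_backend_tag(cuda_version: str) -> str:
    """Map a CUDA version such as "12.8.1" to a wheel tag such as "cu128".

    Hypothetical helper for illustration; only MAJOR and MINOR are used.
    """
    parts = cuda_version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected MAJOR.MINOR[.PATCH], got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"cu{major}{minor}"


def is_consistent(cuda_version: str, torch_backend: str) -> bool:
    """Return True when the image CUDA version matches the torch backend tag."""
    return cuda_backend_tag(cuda_version) == torch_backend


if __name__ == "__main__":
    # New default from this change: image and lockfile agree on cu128.
    print(cuda_backend_tag("12.8.1"), is_consistent("12.8.1", "cu128"))
    # Old default: a 12.9.1 image paired with a cu128 torch backend.
    print(cuda_backend_tag("12.9.1"), is_consistent("12.9.1", "cu128"))
    # Assumed example version for a cu130 release build.
    print(cuda_backend_tag("13.0.0"))
```

A check like this could guard against the Dockerfile default and the
--torch-backend value in the lockfile drifting apart again.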
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>