[Docker] Add BUILDER_CUDA_VERSION to decouple build and runtime CUDA versions

Allow compiling csrc/ and extensions (DeepGEMM, EP kernels) with a
different CUDA toolkit than the one shipped in the final runtime image.
BUILDER_CUDA_VERSION controls the devel base image used for compilation,
while CUDA_VERSION selects the runtime base image and PyTorch wheel index.
The NVIDIA devel base image sets its own CUDA_VERSION env var, which
would otherwise reflect the builder toolkit. Override it in the build
stages so PyTorch index URLs still resolve to the runtime version.
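A minimal sketch of how the two build args might be passed together.
The version values and the Dockerfile path (docker/Dockerfile) are
assumptions for illustration; only the arg names CUDA_VERSION and
BUILDER_CUDA_VERSION come from this change.

```shell
#!/bin/sh
# Hypothetical values: pick whatever runtime/builder pair you need.
CUDA_VERSION="12.9.1"          # runtime base image and PyTorch wheel index
BUILDER_CUDA_VERSION="13.0.0"  # devel base image used for compilation

# Assumed Dockerfile location; adjust to the actual path in the repo.
BUILD_CMD="docker build --build-arg CUDA_VERSION=${CUDA_VERSION} --build-arg BUILDER_CUDA_VERSION=${BUILDER_CUDA_VERSION} -f docker/Dockerfile ."

# Print the command instead of running it, so it can be reviewed first.
echo "${BUILD_CMD}"
```

Omitting BUILDER_CUDA_VERSION presumably falls back to whatever default
the Dockerfile declares; check the ARG line before relying on that.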
Update the CUDA 13.0 release pipeline entries to pass the new arg.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>