[CIR][CUDA] Support device-side printf for NVPTX (#196573)
Implement device-side printf lowering for NVPTX targets in CIR codegen.
The variadic arguments are packed into a stack-allocated struct and
passed to vprintf, matching the classic codegen behavior in
CGGPUBuiltin.cpp.
When the target triple is NVPTX and the builtin is
printf/__builtin_printf, we route to emitNVPTXDevicePrintfCallExpr.
The no-varargs case passes a null pointer directly.
AMDGCN device printf remains NYI.
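
For illustration only, here is a rough host-side C++ sketch of the shape
of the lowering described above. It is not the code in this patch:
PackedArgs, fakeVprintf, loweredWithArgs, and loweredNoArgs are
hypothetical names, and fakeVprintf merely stands in for the device
vprintf(const char *, void *) entry point. The struct layout assumes the
usual default argument promotions (float is widened to double).

```cpp
#include <cstdint>

// Hypothetical buffer layout for printf("%d %f\n", i, f).
// Assumption: variadic arguments undergo default argument promotion,
// so the float argument is stored as a double.
struct PackedArgs {
  int32_t a0;
  double a1;
};

// Hypothetical stand-in for the device-side vprintf(fmt, args).
// It only reports whether an argument buffer was supplied.
inline int fakeVprintf(const char *fmt, void *args) {
  (void)fmt;
  return args == nullptr ? 0 : 1;
}

// Conceptual result of lowering a printf call that has varargs:
// the arguments are stored into a stack-allocated struct, and a
// pointer to that struct is passed as the second operand.
inline int loweredWithArgs(int i, float f) {
  PackedArgs buf{i, static_cast<double>(f)};
  return fakeVprintf("%d %f\n", &buf);
}

// Conceptual result of lowering printf("hello\n"): with no varargs,
// no buffer is allocated and a null pointer is passed directly.
inline int loweredNoArgs() {
  return fakeVprintf("hello\n", nullptr);
}
```

In this sketch the stub returns 1 when it receives a buffer and 0 when it
receives a null pointer, which makes the two paths easy to tell apart.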
Part of https://github.com/llvm/llvm-project/issues/179278