[flang][cuda] Widen stream argument to i64 in stream intrinsic lowering (#196650)
`genCUDASetDefaultStream` and `genCUDAStreamDestroy` build their runtime
call with an `i64` stream parameter but pass the actual argument
straight through, so a smaller-kind actual (e.g. the literal `0` in
`cudaforSetDefaultStream(0)`) produces an ill-typed `fir.call`, which
fails verification once it is lowered to `llvm.call`:
```
error: 'llvm.call' op operand type mismatch for operand 0: 'i32' != 'i64'
```
Insert a `fir.convert` to `i64` before the call, matching what
`genCUDASetDefaultStreamArray` already does.