Manually call lazyInitCUDA in structured CUDA calls (#61882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61882
If you directly call the native implementation, you bypass the lazy CUDA
initialization, which is bad! So call lazyInitCUDA manually in the structured
CUDA calls. This probably slows things down a little though...
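
A minimal sketch of the failure mode and the fix, not the actual PyTorch code.
Every name below (cuda_ready_sketch, lazy_init_cuda_sketch,
native_kernel_sketch, structured_kernel_sketch) is a hypothetical stand-in, and
std::call_once is only assumed here as one way to get run-once behavior:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration; not PyTorch's real symbols.
static std::once_flag cuda_init_flag_sketch;
static bool cuda_ready_sketch = false;
static int cuda_init_count_sketch = 0;

// One-time setup: the body runs at most once, however often this is called.
void lazy_init_cuda_sketch() {
  std::call_once(cuda_init_flag_sketch, [] {
    cuda_ready_sketch = true;
    ++cuda_init_count_sketch;
  });
}

// Stand-in for a native implementation. It does not initialize anything
// itself; it only reports whether the state it relies on was ready.
bool native_kernel_sketch() {
  return cuda_ready_sketch;
}

// Stand-in for a structured wrapper: initialize manually, then call the
// native implementation.
bool structured_kernel_sketch() {
  lazy_init_cuda_sketch();
  assert(cuda_ready_sketch);  // holds after the manual init call
  return native_kernel_sketch();
}
```

The extra cost on the hot path is the already-initialized check on every call,
which is where the small slowdown would come from.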
Fixes problem uncovered by #61642
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D29783856
Pulled By: ezyang
fbshipit-source-id: 16857569a049e09c6ebd96ef04b0025403b254af