Improve error checking of CUDALoops. (#38810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38810
This applies the same change that was made to the CPU loops: the inputs and the outputs are now checked separately.
Test Plan: Imported from OSS
Differential Revision: D21670339
Pulled By: gchanan
fbshipit-source-id: 42f208538dce1a5598d14948d8d02a1c91ba152a