check error status of CUDA launch after Magma kernels (#29003)
Summary:
While working on https://github.com/pytorch/hub/issues/62, I found that the stack trace of a failed kernel launch pointed somewhere other than the launch that actually failed, even with CUDA_LAUNCH_BLOCKING=1.
So I started debugging and found that magma kernel launches don't do any error checking.
I eventually traced the failure, which occurred on `x.inverse()`, to the magma binary having been compiled without sm37 SASS. That is somehow a problem for magma 2.5.1 (but not 2.5.0).
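
For illustration only, here is a minimal sketch of the general pattern of checking the launch status right after a kernel launch, so that a failure is reported at the call site that caused it. It is not the code from this change: `scale_kernel` and `launch_and_check` are hypothetical names, and the kernel is a stand-in rather than anything from magma. Only `cudaGetLastError` and `cudaGetErrorString` are real CUDA runtime calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a library kernel; not part of magma or PyTorch.
__global__ void scale_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Launch the kernel and query the launch status immediately afterwards.
// Without this query, a launch error stays pending and is only seen by
// whichever later CUDA call happens to check for errors.
cudaError_t launch_and_check(float* data, int n, int threads_per_block) {
    int blocks = (threads_per_block > 0)
                     ? (n + threads_per_block - 1) / threads_per_block
                     : 1;
    if (blocks < 1) blocks = 1;
    scale_kernel<<<blocks, threads_per_block>>>(data, n);

    // cudaGetLastError returns the last error and resets it to cudaSuccess.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel launch failed: %s\n",
                     cudaGetErrorString(err));
    }
    return err;
}
```

Calling `launch_and_check(nullptr, 0, 0)` requests zero threads per block, which is an invalid launch configuration, so the function should return something other than `cudaSuccess` right at the launch site instead of leaving the error to be picked up later.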
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29003
Differential Revision: D18397358
Pulled By: soumith
fbshipit-source-id: 04baca68eac209d7af773daddd0193697d4ab0d9