Make DeviceCachingAllocator's error handling more defensive and a bit easier to read (#51158)
Summary:
Currently, `alloc_block`'s error handling has a couple of (imo minor) flaws. It might clear the error state even if the error had nothing to do with memory allocation. It might also clear the error state even if it never attempted a cudaMalloc, in which case the error it clears may have come from some completely unrelated earlier CUDA call.
The diffs and comments are the best explanation of my preferred (new) error-checking policy.
The diffs add very little work to the common hot path (a successful allocation satisfied by an existing block). Most of the additional logic lives in `alloc_block`, which is a slow path anyway because it attempts a cudaMalloc.
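For illustration only, here is a minimal sketch of the kind of policy described above. It is not the code from the diff. It runs against a mock runtime rather than CUDA, and `FakeRuntime`, `FakeError`, and `try_alloc_block` are hypothetical names invented for this sketch. The mock assumes a failed allocation records a "last error" that stays set until it is explicitly read and reset.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the runtime's error codes.
enum class FakeError { Success, OutOfMemory, Other };

// Hypothetical mock of a runtime that keeps a "last error" until it is read.
struct FakeRuntime {
  FakeError last_error = FakeError::Success;
  std::size_t free_bytes = 0;
  bool fail_with_other = false;  // test hook: force a non-OOM failure

  // Mock allocation: on failure, returns the error and records it as last_error.
  FakeError malloc(std::size_t size) {
    if (fail_with_other) {
      last_error = FakeError::Other;
      return last_error;
    }
    if (size > free_bytes) {
      last_error = FakeError::OutOfMemory;
      return last_error;
    }
    free_bytes -= size;
    return FakeError::Success;
  }

  // Mock of "get last error": returns the recorded error and resets it.
  FakeError get_last_error() {
    FakeError e = last_error;
    last_error = FakeError::Success;
    return e;
  }
};

// Returns true on success and false on out-of-memory, so a caller could free
// cached blocks and retry. Any other failure is surfaced instead of swallowed.
bool try_alloc_block(FakeRuntime& rt, std::size_t size) {
  FakeError err = rt.malloc(size);
  if (err == FakeError::Success) {
    // Success: do not touch the error state, so an unrelated earlier
    // error is left in place for whoever checks next.
    return true;
  }
  if (err == FakeError::OutOfMemory) {
    // Clear the error only because this call just caused it.
    FakeError cleared = rt.get_last_error();
    assert(cleared == FakeError::OutOfMemory);
    (void)cleared;
    return false;
  }
  // Not an allocation failure: leave the error state alone and report it.
  throw std::runtime_error("allocation failed with a non-OOM error");
}
```

In this sketch the error state is reset in exactly one place: right after an allocation attempt that itself reported out-of-memory. A path that never calls `malloc` never touches the error state at all.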
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51158
Reviewed By: malfet, heitorschueroff
Differential Revision: D26101515
Pulled By: ezyang
fbshipit-source-id: 6b447f1770974a04450376afd9726be87af83c48