pytorch
06d978a9 - [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)

Commit

4 years ago

[c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249

Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.

Basic logic:

| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning | throw exception with ASAN message |

Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.

Other clean up changes:
* cache device_count() always in a static variable
* move all asan macros in c10

Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):

```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```

Reviewed By: ngimel

Differential Revision: D22824329

fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
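The split described in the table (a lenient, cached device_count() versus a strict init_cuda) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the c10 implementation: `FakeCudaRuntime`, `FakeDriverError`, `broken_driver`, the `probe` callable, and all message strings are hypothetical names invented for illustration, and the real change lives in C++.

```python
import warnings


class FakeDriverError(Exception):
    """Hypothetical stand-in for a failure reported by the CUDA driver."""


class FakeCudaRuntime:
    """Toy model of the device_count() / init_cuda split; not PyTorch code."""

    def __init__(self, probe):
        self._probe = probe    # hypothetical callable: returns a GPU count or raises
        self._count = None     # cached count, filled on the first query
        self._error = None     # message saved from a failed probe
        self.probe_calls = 0   # bookkeeping so the caching is observable

    def device_count(self):
        # Query once, cache the result, and never raise: a failed probe
        # becomes a warning plus a count of 0.
        if self._count is None:
            self.probe_calls += 1
            try:
                self._count = self._probe()
            except FakeDriverError as exc:
                self._count = 0
                self._error = str(exc)
                warnings.warn(f"CUDA initialization: {self._error}")
        return self._count

    def init_cuda(self):
        # Initialization is strict: without a usable GPU it raises, and any
        # saved driver or ASAN message is surfaced in the exception itself.
        if self.device_count() > 0:
            return
        if self._error is not None:
            raise RuntimeError(self._error)
        raise RuntimeError("No CUDA GPUs are available (hypothetical message)")


def broken_driver():
    # Hypothetical probe that models the "driver issues" row of the table.
    raise FakeDriverError("driver issue (hypothetical message)")
```

With `FakeCudaRuntime(lambda: 0)`, `device_count()` returns 0 silently and `init_cuda()` raises; with `FakeCudaRuntime(broken_driver)`, the first `device_count()` call also emits a warning, and the same message reappears in the exception from `init_cuda()` rather than staying buried in the logs.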

Author

Dmytro Dzhulgakov

Committer

facebook-github-bot

Parents

27e8dc78
