[CUDA graphs] Changes batchnorm to increment num_batches_tracked in place for improved graph safety (#70444)
Summary:
This PR was not my worst debugging annoyance, nor my smallest in lines changed, but it has the highest `debugging annoyance/lines changed` ratio.
The current pattern:
```
self.num_batches_tracked = self.num_batches_tracked + 1
```
If this pattern is captured in a CUDA graph, the out-of-place add deletes the eagerly-allocated tensor and rebinds `self.num_batches_tracked` to a fresh tensor allocated during capture. Replays then read from the (deallocated) original tensor's address.
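For illustration, here's a minimal standalone sketch of the same hazard outside of batchnorm (assumes a CUDA-capable build; `counter` is a hypothetical stand-in for `num_batches_tracked`):
```
import torch

counter = torch.zeros(1, device="cuda")

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    # Rebinds `counter` to an output allocated in the graph's private
    # memory pool; the recorded add kernel still reads the *original*
    # tensor's address, and the original loses its last reference here.
    counter = counter + 1

# Each replay re-reads the original (freed) address and writes
# original + 1 into the pool tensor, so the count never advances;
# depending on what reuses that memory, the result may instead be
# garbage or an illegal memory access.
g.replay()
g.replay()
print(counter)
```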
This can cause:
1. an IMA (illegal memory access) on graph replay
2. failure to actually increment `num_batches_tracked` across replays, because every replay re-reads the stale value from the old location instead of accumulating
3. numerical corruption if the allocator reassigns the original tensor's memory to some unrelated tensor
4. combinations of 1, 2, and 3, depending on global allocation patterns and on if/when the BN module is sometimes called eagerly between replays
(ask me how I know).
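The fix is the pattern in the title: mutate the tensor in place so the address baked into the graph remains the live tensor. Same sketch, graph-safe (again assuming a CUDA build):
```
import torch

counter = torch.zeros(1, device="cuda")

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    # In place: no new allocation, no rebinding. The recorded kernel
    # reads and writes the original tensor's address, which stays alive.
    counter.add_(1)

# Capture records but does not execute, so two replays leave the
# counter at exactly 2.
g.replay()
g.replay()
print(counter)  # tensor([2.], device='cuda:0')
```
In batchnorm terms, the rebinding assignment becomes an in-place increment along the lines of `self.num_batches_tracked.add_(1)`.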
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70444
Reviewed By: albanD
Differential Revision: D33342203
Pulled By: ngimel
fbshipit-source-id: 5f201cc25030517e75af010bbaa88c452155df21