Make autocast cache and buffer stealing aware of cudagraph static output tensors (#99368)
In this stack of PRs we add caching of output tensors for cudagraph trees after the initial recording. On initial recording we do not cache tensor outputs because doing so would prevent memory from being reclaimed. On subsequent executions we do cache them to avoid overhead. However, the extra reference this keeps around caused divergent recording & execution behavior in both autocast caching and autograd gradient stealing. The divergence would trigger repeated re-recording before eventually stabilizing, which is not the behavior we want.
This PR makes the autocast cache and buffer stealing aware of the cudagraph static output tensors.
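A minimal sketch of the idea (the helper names below are hypothetical, not the exact API introduced by this PR): cudagraph trees register the output tensors they cache, and the autocast cache / gradient buffer stealing paths consult that registry instead of assuming their reference is the only one.

```python
import torch

# Hypothetical registry of cudagraph-cached ("static") output tensors,
# keyed by storage data pointer.
_cudagraph_cached_outputs = set()

def register_cudagraph_output(t: torch.Tensor) -> None:
    # Called by cudagraph trees once it starts caching an output tensor.
    _cudagraph_cached_outputs.add(t.untyped_storage().data_ptr())

def is_cudagraph_output(t: torch.Tensor) -> bool:
    return t.untyped_storage().data_ptr() in _cudagraph_cached_outputs

def eligible_for_buffer_steal(t: torch.Tensor, use_count: int) -> bool:
    # Autograd steals a gradient buffer only when it holds the sole
    # reference; a cudagraph-cached output carries an extra reference,
    # so it is excluded to keep recording and execution on the same path.
    return use_count == 1 and not is_cudagraph_output(t)
```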
I will add this to the other cudagraph implementation in another PR.
Not sure if this should live in autograd or in autocast, since it affects both, or somewhere else.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99368
Approved by: https://github.com/albanD, https://github.com/ezyang