
pytorch
6ca5421a - Enable non-synchronizing cub scan for cum* operations (#42036)


Commit

4 years ago

Summary: This uses cub for cum* operations because, unlike thrust, cub is non-synchronizing. Cub does not support tensors with more than `2**31` elements out of the box (in fact, due to cub bugs, the cutoff point is even smaller). To support larger tensors, I split the tensor into `2**30`-element chunks and modify the first value of the second and subsequent chunks so that it contains the cumsum result of the previous chunks. Because this modification is done in place on the source tensor, the source tensor will be corrupted if something goes wrong and we error out before it is reverted to its original state. In most cases, however, such errors invalidate the entire CUDA context anyway.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42036
Reviewed By: ajtulloch
Differential Revision: D22749945
Pulled By: ngimel
fbshipit-source-id: 9fc9b54d466df9c8885e79c4f4f8af81e3f224ef
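To make the chunking scheme concrete, here is a minimal pure-Python sketch of the idea. It is not the CUDA code from the PR. `limited_inclusive_scan` is a hypothetical stand-in for a single cub scan call, and `MAX_CHUNK` is a hypothetical per-call limit set to 4 (instead of `2**30`) so the chunk boundaries show up on small inputs. `itertools.accumulate` plays the role of the per-chunk scan, and the `op` parameter lets the same carry trick cover cumsum (`operator.add`) and cumprod (`operator.mul`).

```python
import operator
from itertools import accumulate
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Hypothetical per-call limit standing in for the 2**30 chunk size;
# kept tiny so the chunking path is exercised on small inputs.
MAX_CHUNK = 4


def limited_inclusive_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Hypothetical stand-in for one scan call that only accepts small inputs."""
    if len(values) > MAX_CHUNK:
        raise ValueError("input exceeds per-call scan limit")
    return list(accumulate(values, op))


def chunked_inclusive_scan(src: List[T], op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Inclusive scan of an arbitrarily long list, one MAX_CHUNK-sized chunk at a time.

    For every chunk after the first, the chunk's first source element is
    temporarily replaced (in place) by op(last result of previous chunk, element),
    so the chunk's scan continues from the running result. The original element
    is saved beforehand and written back after the chunk is scanned.
    """
    out: List[T] = []
    for start in range(0, len(src), MAX_CHUNK):
        end = min(start + MAX_CHUNK, len(src))
        if start == 0:
            out.extend(limited_inclusive_scan(src[start:end], op))
            continue
        original_first = src[start]
        # In-place modification of the source: carry in the previous prefix result.
        src[start] = op(out[start - 1], original_first)
        out.extend(limited_inclusive_scan(src[start:end], op))
        # If the scan above raised, src would be left modified here, which is
        # the corruption risk described in the commit message.
        src[start] = original_first
    return out


if __name__ == "__main__":
    data = list(range(1, 11))
    print(chunked_inclusive_scan(data))  # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
    print(data == list(range(1, 11)))  # True: the source was restored
    print(chunked_inclusive_scan([1, 2, 3, 4, 5], operator.mul))  # [1, 2, 6, 24, 120]
```

Each chunk touches only its own slice plus one carried value, which is why a scan primitive with a per-call size limit can still process inputs of any length. Saving the original first element, rather than undoing the carry with an inverse operation, also keeps the restore exact for operations like cumprod, which have no safe inverse.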

Author

ngimel

Committer

facebook-github-bot
