Use non-blocking copy for creation of lazy tensors in TS backend impl (#69397)
- Blocking here makes the CPU tracing thread wait for the transfer and the
  CUDA sync before continuing to trace more ops; we want to overlap CPU
  tracing with running the copy in the background.
- Blocking is not required if the tensor is already on the CUDA device, but
  it is required if the tensor is on the CPU device, since the CPU thread
  could modify the tensor while it is being copied asynchronously.
- We make an exception for numel()==1 tensors: a non-blocking .to() from CPU
  to CUDA is potentially dangerous even for single-element tensors, but
  fill_ on a CUDA tensor is an async operation, and .item() on a
  single-element CPU tensor is fast, so scalars can take the item()/fill_
  path instead of a blocking copy.
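The decision rules above can be sketched as a small policy function. This is
a hypothetical illustration, not the actual TS backend code; the helper name
`should_block_copy` and its string-based device argument are assumptions for
the sketch.

```python
def should_block_copy(src_device: str, numel: int) -> bool:
    """Hypothetical sketch of the blocking policy described above.

    Returns True when the copy into the lazy tensor must block the
    CPU tracing thread, False when it can run in the background.
    """
    if src_device == "cuda":
        # Source already lives on the device; no host-side mutation
        # race, so the copy can be non-blocking.
        return False
    if numel == 1:
        # Single-element CPU tensors take the .item()/fill_ path:
        # .item() is cheap on the host, and fill_ is async on CUDA,
        # so no blocking wait is needed.
        return False
    # A multi-element CPU source could be mutated by the CPU thread
    # while the async copy is in flight; block to stay safe.
    return True
```

Usage: the tracing thread would consult this before issuing the copy, e.g.
`should_block_copy("cpu", t.numel())`.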