SemanticDiff

pytorch
3e44880d - Modify TileOp GPU implementation to expose more concurrency and better utilize GPU memory bandwidth (#17275)

Commit

5 years ago

Modify TileOp GPU implementation to expose more concurrency and better utilize GPU memory bandwidth (#17275)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17275

Previous implementation used a memcpy inside the kernel. It is more efficient to reduce the data fetched per thread to a single word from memory. This exposes more concurrency and takes advantage of GPU memory coalescing support.

Reviewed By: takatosp1

Differential Revision: D14120147

fbshipit-source-id: c4734003d4342e55147c5b858f232a006af60b68
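The change described in the summary can be pictured with a small sequential model. The sketch below is not the commit's CUDA code. It is a plain-Python illustration that assumes the input is viewed as `outer_dim` rows of `inner_dim` elements, with each row repeated `tiles` times in the output. The function names `tile_per_block` and `tile_per_element` are hypothetical, and each loop iteration stands in for one GPU thread.

```python
def tile_per_block(src, outer_dim, inner_dim, tiles):
    """Model of the older scheme: one work item per (row, tile) pair,
    each copying a whole block of inner_dim elements (the in-kernel memcpy)."""
    out = [None] * (outer_dim * tiles * inner_dim)
    for item in range(outer_dim * tiles):      # one "thread" per block
        row = item // tiles                    # source row for this block
        dst = item * inner_dim                 # start of the destination block
        out[dst:dst + inner_dim] = src[row * inner_dim:(row + 1) * inner_dim]
    return out


def tile_per_element(src, outer_dim, inner_dim, tiles):
    """Model of the newer scheme: one work item per output element,
    each fetching a single word from the input."""
    total = outer_dim * tiles * inner_dim
    out = [None] * total
    for index in range(total):                 # one "thread" per element
        row = index // (inner_dim * tiles)     # which input row this element comes from
        col = index % inner_dim                # position within that row
        out[index] = src[row * inner_dim + col]
    return out


if __name__ == "__main__":
    src = [1, 2, 3, 4, 5, 6]                   # 2 rows of 3 elements
    print(tile_per_element(src, outer_dim=2, inner_dim=3, tiles=2))
```

Both functions produce the same output. The per-element version has `outer_dim * tiles * inner_dim` independent work items instead of `outer_dim * tiles`, and consecutive work items touch consecutive output addresses, which is the access pattern that GPU memory coalescing favors.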

Author

wesolwsk


Committer

facebook-github-bot


Parents
