Tensor Iterator loop unrolling (#17667)
Summary:
Modified the TensorIterator GPU reduction kernel.
Creating multiple accumulators during the thread reduce removes the data
dependency between unrolled loop iterations, exposing instruction-level
parallelism that benefits latency-bound kernels (e.g. the Welford reduction
used by `torch.std`).
This approach increases register usage, so the unrolling factor has to be
tuned to prevent register spilling.
The current implementation tunes the unrolling factor down to 2 for Welford
(a register-heavy kernel) while keeping it unchanged (4) for the other
reduction kernels.
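To make the idea concrete, here is a minimal CUDA sketch of a thread-level
sum reduce using multiple independent accumulators. This is not the actual
TensorIterator code; the function name `thread_reduce_sketch` and the fixed
`UNROLL` template parameter are illustrative assumptions.

```cuda
// Sketch only: multiple independent accumulators in a thread-level reduce.
// With a single accumulator, every unrolled iteration must wait for the
// previous add to retire; UNROLL independent accumulators break that
// dependency chain, so the adds can issue back to back (instruction-level
// parallelism).
template <int UNROLL>  // e.g. 2 for register-heavy Welford, 4 otherwise
__device__ float thread_reduce_sketch(const float* data, int n) {
  float acc[UNROLL];
  #pragma unroll
  for (int i = 0; i < UNROLL; ++i) acc[i] = 0.0f;

  const int tid = threadIdx.x;
  const int nthreads = blockDim.x;
  int base = tid;
  // Main loop: the UNROLL adds in one pass target different accumulators,
  // so they carry no data dependency on each other.
  for (; base + (UNROLL - 1) * nthreads < n; base += UNROLL * nthreads) {
    #pragma unroll
    for (int i = 0; i < UNROLL; ++i) acc[i] += data[base + i * nthreads];
  }
  // Tail: fold any leftover strided elements into acc[0].
  for (; base < n; base += nthreads) acc[0] += data[base];

  // Combine the independent accumulators at the end of the thread reduce.
  // These extra accumulators are what raise register pressure, which is
  // why a register-heavy kernel like Welford (whose accumulator already
  // carries mean / m2 / count) gets the smaller unrolling factor.
  #pragma unroll
  for (int i = 1; i < UNROLL; ++i) acc[0] += acc[i];
  return acc[0];
}
```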
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17667
Differential Revision: D14368325
Pulled By: umanwizard
fbshipit-source-id: 9d64c0dccabdb1b7c3922a6557224af704a1974e