Target 4096 blocks instead of splitting to a large grid for large reductions (#35997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35997
When the number of blocks is large enough, we already achieve
balanced SM allocation. But we should still keep the number of inputs
per thread large, because per-thread reduction is cheap.
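A minimal sketch of this launch heuristic, assuming a simple 1D reduction: cap the grid at roughly 4096 blocks, then fold the remaining work into serial per-thread reduction. The names `kTargetBlocks`, `LaunchConfig`, and `pick_config` are hypothetical illustrations, not the actual TensorIterator reduction code:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical constant: enough blocks to keep all SMs busy on a V100,
// beyond which adding more blocks no longer improves occupancy balance.
constexpr int64_t kTargetBlocks = 4096;

struct LaunchConfig {
  int64_t grid;               // number of blocks to launch
  int64_t inputs_per_thread;  // serial reduction work per thread
};

LaunchConfig pick_config(int64_t num_inputs, int64_t block_size) {
  // Naive split: one input per thread, grid grows with input size.
  int64_t grid = (num_inputs + block_size - 1) / block_size;
  // Once the grid is large enough for balanced SM allocation, stop
  // growing it; give each thread more inputs instead, since the
  // per-thread serial reduction is cheap.
  grid = std::min(grid, kTargetBlocks);
  int64_t inputs_per_thread =
      (num_inputs + grid * block_size - 1) / (grid * block_size);
  return {grid, inputs_per_thread};
}
```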
Benchmark for Half on V100:
https://github.com/zasdfgbnm/things/blob/master/2020Q2/reduction-benchmark.ipynb
On a large tensor, it is 1.37ms vs 1.25ms.
Test Plan: Imported from OSS
Differential Revision: D20927533
Pulled By: ngimel
fbshipit-source-id: 40df52e439cc1c01cda66c6195b600f301c5e984