SemanticDiff

pytorch
bfba5c5c - Fix sync placement in some cases where it was less than optimal or wrong. (#1600)

Commit

2 years ago

Fix sync placement in some cases where it was less than optimal or wrong. (#1600)

* Fix placement of added test.
* Place RAW sync at computeAt position rather than unroll position

When a shared-mem tensor is unrolled, which shouldn't be common as unroll is meant to allocate enough registers for loop unrolling, its RAW sync is needed at the computeAt loop as there are consumers sharing the computeAt loop.

Co-authored-by: Naoya Maruyama <nmaruyama@nvidia.com>
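To illustrate why a read-after-write (RAW) sync belongs inside the loop that the producer and its consumers share, here is a minimal sketch. It is not the code from this commit. It is a CPU simulation in Python, under the assumption that `threading.Barrier` can stand in for a block-wide barrier such as `__syncthreads()` and that a plain list can stand in for a shared-memory tensor. The names `run_block`, `smem`, and `worker` are hypothetical.

```python
import threading


def run_block(num_threads=4, num_iters=3):
    """Simulate one thread block; return each thread's accumulated sum."""
    smem = [0] * num_threads  # hypothetical stand-in for a shared-memory tensor
    out = [0] * num_threads
    # Assumption: a Barrier models a block-wide sync such as __syncthreads().
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        acc = 0
        # This loop plays the role of the loop shared by producer and consumer.
        for i in range(num_iters):
            smem[tid] = i * 10 + tid  # producer: write to shared memory
            barrier.wait()  # RAW sync: every write lands before any read
            acc += smem[(tid + 1) % num_threads]  # consumer: read a neighbor's value
            barrier.wait()  # WAR sync: every read finishes before the next overwrite
        out[tid] = acc

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_block())
```

If the first barrier were hoisted out of the shared loop, a consumer could read a neighbor's slot before that neighbor had written the value for the current iteration, which is the hazard the RAW sync is there to prevent.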

Author

csarofeen
