SemanticDiff

pytorch
7ffdd765 - [TensorExpr] more convenient outer Rfactor output (#40050)

Commit View On GitHub

Login via GitHub
Home
Pricing
FAQ
Install

Login via GitHub

Commit

4 years ago

[TensorExpr] more convenient outer Rfactor output (#40050) Summary: Auto fuse the output loops of outer Rfactors, so it is in a more convenient format for binding GPU axes. An example: ``` Tensor* c = Reduce("sum", {}, Sum(), b, {{m, "m"}, {n, "n"}, {k, "k"}}); LoopNest loop({c}); std::vector<For*> loops = loop.getLoopStmtsFor(c); auto v = loops.at(0)->var(); loop.rfactor(c->body(), v); ``` Before: ``` { Allocate(tmp_buf, float, {m}); sum[0] = 0.f; for (int m_1 = 0; m_1 < m; m_1++) { tmp_buf[m_1] = 0.f; } for (int m_1 = 0; m_1 < m; m_1++) { for (int n = 0; n < n_1; n++) { for (int k = 0; k < k_1; k++) { tmp_buf[m_1] = (tmp_buf[m_1]) + (b[((n_1 * m_1) * k_1 + k) + k_1 * n]); } } } for (int m_1 = 0; m_1 < m; m_1++) { sum[0] = (sum[0]) + (tmp_buf[m_1]); } Free(tmp_buf); } ``` After: ``` { sum[0] = 0.f; for (int m = 0; m < m_1; m++) { Allocate(tmp_buf, float, {m_1}); tmp_buf[m] = 0.f; for (int n = 0; n < n_1; n++) { for (int k = 0; k < k_1; k++) { tmp_buf[m] = (tmp_buf[m]) + (b[((n_1 * m) * k_1 + k) + k_1 * n]); } } sum[0] = (sum[0]) + (tmp_buf[m]); Free(tmp_buf); } } ``` The existing Rfactor tests cover this case, although I did rename a few for clarity. This change broke the LLVMRFactorVectorizedReduction test because it now does what its intending to (vectorize a loop with a reduction in it) rather than nothing, and since that doesn't work it correctly fails. I've disabled it for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40050 Reviewed By: ZolotukhinM Differential Revision: D22605639 Pulled By: nickgg fbshipit-source-id: e359be53ea62d9106901cfbbc42d55d0e300e8e0

Author

nickgg

nickgg

Committer

facebook-github-bot

facebook-github-bot

Parents

FAQ Terms Privacy Refunds Impressum

Loading