pytorch
ce6e6812 - use legacy unrolled kernel for non-trivial offset calc cases (#71710)

Commit
2 years ago
Summary:
This leads to across-the-board improvements on Pascal GPUs and large performance improvements for some broadcasting patterns and datatypes on V100, along with 3-5% regressions for some other patterns. The most common pattern that improves on V100 is half-precision x + bias, which gets ~5% faster.

Full V100 results: https://docs.google.com/spreadsheets/d/1K67x-6_TPT9Yt6533NfECEhUyfbqBxLH9M5Z3gymzXE/edit#gid=1218963246
Benchmarking script: https://gist.github.com/ngimel/986ee84a1dd234a0485e99544e0fc8b6

Most importantly, it reduces context size by 40 MB.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71710
Reviewed By: mruberry
Differential Revision: D33769330
Pulled By: ngimel
fbshipit-source-id: 5a7942261e06003ca79bfa3b071106aab1a8a4bc
(cherry picked from commit f9b51b48112b25353c928711974537a0792516c8)
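The "offset calc" in the title is the work of mapping each output element's linear index to a memory offset in every input. When all inputs are contiguous and have the output's shape, that offset is just the linear index. Broadcasting patterns such as x + bias instead need a div/mod per dimension. The following is a minimal Python sketch that illustrates this distinction. It assumes row-major layout and strides counted in elements. The helper names (`contiguous_strides`, `broadcast_strides`, `element_offset`, `pick_kernel_path`) are hypothetical and do not come from PyTorch's CUDA code. The dispatch rule is a simplification of the idea in the commit title, not the actual selection logic.

```python
from typing import List, Sequence


def contiguous_strides(sizes: Sequence[int]) -> List[int]:
    """Row-major element strides for a contiguous tensor of the given sizes."""
    strides = [0] * len(sizes)
    acc = 1
    for d in reversed(range(len(sizes))):
        strides[d] = acc
        acc *= sizes[d]
    return strides


def broadcast_strides(in_sizes: Sequence[int], out_sizes: Sequence[int]) -> List[int]:
    """Strides of a contiguous input viewed with the output's shape.

    Dimensions are right-aligned; missing or size-1 dimensions that get
    broadcast receive stride 0, so every output index along them reads
    the same input element.
    """
    in_strides = contiguous_strides(in_sizes)
    pad = len(out_sizes) - len(in_sizes)
    result = [0] * pad
    for size, stride, out_size in zip(in_sizes, in_strides, out_sizes[pad:]):
        assert size == out_size or size == 1, "shapes are not broadcastable"
        result.append(stride if size == out_size else 0)
    return result


def element_offset(linear_index: int, out_sizes: Sequence[int],
                   in_strides: Sequence[int]) -> int:
    """General offset calculation: one div/mod per dimension."""
    offset = 0
    for d in reversed(range(len(out_sizes))):
        idx = linear_index % out_sizes[d]
        linear_index //= out_sizes[d]
        offset += idx * in_strides[d]
    return offset


def is_trivial(out_sizes: Sequence[int], inputs_strides: Sequence[Sequence[int]]) -> bool:
    """True if every input's offset equals the linear index (size-1 dims ignored)."""
    expected = contiguous_strides(out_sizes)
    return all(
        strides[d] == expected[d]
        for strides in inputs_strides
        for d in range(len(out_sizes))
        if out_sizes[d] > 1
    )


def pick_kernel_path(out_sizes: Sequence[int],
                     inputs_strides: Sequence[Sequence[int]]) -> str:
    """Hypothetical dispatch: non-trivial offset calc goes to the legacy unrolled kernel."""
    if is_trivial(out_sizes, inputs_strides):
        return "trivial-offset path"
    return "legacy unrolled kernel"


if __name__ == "__main__":
    out = [4, 3]                            # x: (4, 3), bias: (3,)
    x_strides = broadcast_strides([4, 3], out)
    bias_strides = broadcast_strides([3], out)
    print(x_strides, bias_strides)          # [3, 1] [0, 1]
    print([element_offset(i, out, bias_strides) for i in range(12)])
    print(pick_kernel_path(out, [x_strides, bias_strides]))
```

For the x + bias case, `bias_strides` is `[0, 1]`, so the bias offsets cycle through 0, 1, 2 across the rows. That access pattern is why this case counts as non-trivial, and in this sketch it is routed to the legacy unrolled kernel. Two contiguous inputs of the same shape take the trivial path, where each offset is simply the linear index.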
Author
Natalia Gimelshein