SemanticDiff

pytorch
8b49efe8 - tune elementwise for AMD uarch (#16217)


Commit

5 years ago

tune elementwise for AMD uarch (#16217)

Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.

No functional/performance change for CUDA - just shifting numbers into constexpr.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217

Differential Revision: D13776684

Pulled By: bddppq

fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
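The idea of moving launch parameters into compile-time constants that differ per platform can be sketched roughly as follows. This is an illustration only, not the code from this commit: the macro `SKETCH_AMD_BUILD`, the constant names, the helper `blocks_for`, and all numeric values are hypothetical. It is written as plain host C++ so it compiles without a GPU toolchain.

```cpp
#include <cstdint>

// Hypothetical build switch and values, chosen only for illustration.
// The numbers used by the actual commit are not shown on this page.
#ifdef SKETCH_AMD_BUILD
constexpr int kThreadsPerBlock = 256;  // larger work group on the AMD path (assumed)
constexpr int kMinBlocksPerSM  = 1;    // looser occupancy hint (assumed)
#else
constexpr int kThreadsPerBlock = 128;  // default path (assumed)
constexpr int kMinBlocksPerSM  = 4;
#endif

// Elements each thread processes; also an assumed value.
constexpr int kItemsPerThread = 4;

// Elements covered by one block, computed at compile time.
constexpr std::int64_t kItemsPerBlock =
    static_cast<std::int64_t>(kThreadsPerBlock) * kItemsPerThread;

// Number of blocks needed to cover n elements (ceiling division).
constexpr std::int64_t blocks_for(std::int64_t n) {
    return (n + kItemsPerBlock - 1) / kItemsPerBlock;
}

// Because everything is constexpr, the configuration can be checked at compile time.
static_assert(blocks_for(kItemsPerBlock + 1) == 2,
              "one extra element should require a second block");
```

In a real CUDA or HIP kernel, constants like these would typically feed the `__launch_bounds__(kThreadsPerBlock, kMinBlocksPerSM)` qualifier and the launch configuration, so tuning one platform only means changing the values inside its `#ifdef` branch.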

Author

iotamudelta

Committer

facebook-github-bot
