pytorch
8ede828d - [te] Speed up relu on cpu

Commit
4 years ago
[te] Speed up relu on cpu

Summary:
We were implementing it using ifThenElse, which creates conditional branches that complicate llvm's vectorization. Using CompareSelect directly yields clean vectorized code with nothing but vmovups and vmaxps.

Test Plan:
Trivial benchmark shows 33% speedup on large tensors (256k elements).

Reviewed By: eellison

Differential Revision: D25986637

fbshipit-source-id: 72dd7776924f73c036d46dca30dff22404d86b82
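The distinction described in the summary can be illustrated outside the tensor expression compiler. The sketch below is plain C++, not the NNC IR or the code from this commit; the names `relu_branch` and `relu_select` are hypothetical. The first function writes relu with control flow per element, loosely analogous to ifThenElse. The second computes a comparison and selects between two values, loosely analogous to CompareSelect. Whether a given compiler emits different machine code for the two forms depends on the compiler and flags, so treat this as an illustration of the idea rather than a reproduction of the 33% result.

```cpp
#include <cstddef>

// Branchy form: each element goes through an if/else.
// Hypothetical stand-in for the ifThenElse-style lowering.
void relu_branch(const float* in, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (in[i] < 0.0f) {
      out[i] = 0.0f;
    } else {
      out[i] = in[i];
    }
  }
}

// Select form: compare, then pick one of two values with no control flow
// in the loop body. Hypothetical stand-in for the CompareSelect-style
// lowering; a compare-and-select against zero is the pattern a max
// instruction can implement.
void relu_select(const float* in, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const float x = in[i];
    out[i] = (x < 0.0f) ? 0.0f : x;
  }
}
```

Both functions compute the same result for every input; only the shape of the loop body differs, and that shape is what the vectorizer sees.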