Optimize relu on cpu using clamp_min (#50924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50924
`clamp_min` seems slightly faster than `threshold` (on AVX2 CPUs)
because it compiles down to `vmaxps` rather than `vcmpps` + `vblendv`.
I see the biggest perf difference (about 20% faster) with float
tensors of 32k-64k elements. Bigger tensors are more memory-bound,
though it still looks like a small win (~2%).
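The two formulations are mathematically equivalent for ReLU. A minimal pure-Python sketch of the idea (not the ATen kernel itself; the actual speedup comes from the vectorized instructions each form lowers to):

```python
def relu_threshold(x, threshold=0.0, value=0.0):
    # threshold-style ReLU: compare, then select the original element
    # or the replacement value (lowers to vcmpps + vblendv on AVX2)
    return [xi if xi > threshold else value for xi in x]

def relu_clamp_min(x, min_val=0.0):
    # clamp_min-style ReLU: a single elementwise max against min_val
    # (lowers to a single vmaxps on AVX2)
    return [max(xi, min_val) for xi in x]

data = [-2.0, -0.5, 0.0, 0.5, 3.0]
assert relu_threshold(data) == relu_clamp_min(data)  # same result either way
```

With the default threshold/value of 0.0, both produce identical outputs, so swapping the kernel is behavior-preserving.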
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26009829
Pulled By: bertmaher
fbshipit-source-id: 7bb1583ffb3ee242e347f59be82e0712c7631f7e