pytorch
95c00cf0 - speed up quantized relu6 inplace kernel (#68404)

Committed 3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68404

The qclamp kernel is equal to (non-inplace) or faster than (inplace) the qrelu6 kernel, so this change removes the qrelu6 kernel and routes qrelu6 to the qclamp kernel instead.

Test Plan:

Correctness:
```
python test/test_quantization.py TestQuantizedOps.test_qrelu6
```

Benchmark:
```python
import time

import torch
import torch.nn.functional as F

toq = torch.ops.quantized

N_WARMUP = 5
N_ITER = 1000

data = torch.randn(32, 32, 64, 64)
data = torch.quantize_per_tensor(data, 0.05, 0, torch.quint8)

# quantized hardtanh(0, 6), i.e. the qclamp kernel
for _ in range(N_WARMUP):
    F.hardtanh(data, 0., 6., inplace=True)
t1 = time.time()
for _ in range(N_ITER):
    F.hardtanh(data, 0., 6., inplace=True)
t2 = time.time()

# quantized relu6, i.e. the qrelu6 kernel
for _ in range(N_WARMUP):
    toq.relu6(data, inplace=True)
t3 = time.time()
for _ in range(N_ITER):
    toq.relu6(data, inplace=True)
t4 = time.time()

t_hardtanh = t2 - t1
t_qrelu6 = t4 - t3
print(t_hardtanh, t_qrelu6)
```

Results (seconds, hardtanh vs. relu6):
```
# before
0.7156341075897217 1.4007949829101562
# after
0.6825599670410156 0.6571671962738037
```

Reviewed By: jerryzh168
Differential Revision: D32463754
Pulled By: vkuzo
fbshipit-source-id: a87fe5907d7b71d87eb1d5f6588cd509a88f2969
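The routing is valid because relu6 is, by definition, clamp with fixed bounds [0, 6] (equivalently, hardtanh(x, 0, 6)). A minimal pure-Python sketch of the scalar relationship (the function names here are illustrative, not PyTorch APIs):

```python
def clamp(x, lo, hi):
    # saturate x to the closed interval [lo, hi]
    return max(lo, min(hi, x))

def relu6(x):
    # relu6 is just clamp with fixed bounds [0, 6],
    # which is why one clamp kernel can serve both ops
    return clamp(x, 0.0, 6.0)
```

Because the two ops are pointwise-identical, dispatching qrelu6 to the qclamp kernel changes nothing about the results, only which kernel runs.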
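One reason a quantized clamp can be fast is that it never needs to dequantize: with affine quantization (q = round(x / scale) + zero_point), clamping real values [lo, hi] is equivalent to quantizing the two bounds once and clamping the raw integer values. The sketch below illustrates that idea in scalar pure Python with hypothetical helper names; the actual qclamp kernel is vectorized C++, and this is not its source:

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # affine quantization to quint8: q = round(x / scale) + zero_point,
    # saturated to the representable range [qmin, qmax]
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

def qclamp_scalar(q, scale, zero_point, lo, hi):
    # quantize the clamp bounds once, then clamp in the integer domain --
    # no per-element dequantize/requantize round trip
    qlo = quantize(lo, scale, zero_point)
    qhi = quantize(hi, scale, zero_point)
    return max(qlo, min(qhi, q))
```

With the benchmark's parameters (scale=0.05, zero_point=0), the real bound 6.0 maps to the integer bound 120, so clamping a stored value of 140 yields 120, which dequantizes back to 6.0.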