0cd6ebd7 - optimize replication padding performance on CPU (#102255)

The major difference from the previous PR on ReflectionPad is the padding indexing struct, `ReplicationPad::index()`; the rest is pretty much the same.

The following benchmark results were gathered on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, with 20 cores per socket.

### single core inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.265 ms;
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 52.336 ms;

(after)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.048 ms;
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 17.199 ms;
```

### single socket inference
```
(before)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.111 ms;
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.885 ms;

(after)
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([1, 3, 224, 224]) , NCHW: 0.011 ms;
ReplicationPad2d((2, 2, 2, 2)) size: torch.Size([128, 64, 56, 56]) , NCHW: 3.148 ms;
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102255
Approved by: https://github.com/cpuhrsch
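For readers unfamiliar with the operator, replication padding fills the padded border by repeating the nearest edge element, which reduces to clamping each output index into the valid input range. The sketch below is a minimal, dependency-free illustration of that index rule in 1-D; the function names are hypothetical and this is not the PR's actual `ReplicationPad::index()` implementation, only the equivalent index mapping it computes.

```python
def replication_index(i, pad, size):
    # Map output index i to an input index by clamping into [0, size - 1];
    # positions in the padded region replicate the nearest edge element.
    return min(max(i - pad, 0), size - 1)

def replication_pad1d(row, pad):
    # Pad a 1-D sequence on both sides by `pad` using edge replication.
    n = len(row)
    return [row[replication_index(i, pad, n)] for i in range(n + 2 * pad)]

# Example: padding [1, 2, 3] by 2 on each side repeats the edge values.
print(replication_pad1d([1, 2, 3], 2))  # [1, 1, 1, 2, 3, 3, 3]
```

The 2-D case applies the same clamp independently per spatial dimension, which is why the kernel structure can stay close to the ReflectionPad one: only the index mapping differs (clamp here, versus mirroring for reflection).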