Optimize torch zeros (#45636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45636
After creating an empty tensor, 'memset' is used to zero out its elements.
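The allocate-then-memset idea can be illustrated outside of PyTorch. The following is a minimal sketch, not the code from this PR: it uses the standard-library ctypes.memset on a bytearray that stands in for uninitialized tensor storage. The helper name zero_with_memset and the sample buffer are hypothetical.

```python
import ctypes


def zero_with_memset(buf: bytearray) -> None:
    """Zero every byte of buf in place with a single memset call."""
    n = len(buf)
    if n == 0:
        return  # nothing to clear
    # Expose the bytearray's memory to ctypes without copying it.
    view = (ctypes.c_char * n).from_buffer(buf)
    # Same as C memset(dst, 0, n): fill n bytes with the value 0.
    ctypes.memset(view, 0, n)


# Stand-in for freshly allocated, uninitialized storage (hypothetical data).
storage = bytearray(b"\xff" * 16)
zero_with_memset(storage)
print(storage == bytearray(16))  # prints True
```

A single bulk fill like this avoids a per-element loop, which is presumably where the savings reported in the test plan come from.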
Test Plan:
PyTorch benchmark tool results:
timer = benchmark_utils.Timer(stmt="torch.zeros((1024, 4096))")
Before: 1007 us
After: 841.26 us
1 measurement, 10000 runs, 1 thread
timer = benchmark_utils.Timer(stmt="torch.zeros((128))")
Before: 4 - 7.6 us
After: 2.4 - 2.8 us
1 measurement, 10000 runs, 1 thread
torch.int8            |     1 |  4096 |  8192 | 16384 | 32768 |
1 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   500 |   600 |   700 |  2000 |
(Reference) x.zero_() |   800 |  1000 |  1000 |  2000 |  2000 |
2 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   500 |   600 |   700 |  2000 |
(Reference) x.zero_() |   800 |  1000 |  1000 |  2000 |  3000 |
4 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   500 |   600 |   700 |  2000 |
(Reference) x.zero_() |   800 |  1000 |  1000 |  2000 |  3000 |
torch.int32           |     1 |  4096 |  8192 | 16384 | 32768 |
1 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   400 |   700 |  2000 |  2900 |  5500 |
(Reference) x.zero_() |   800 |  2000 |  3000 |  4400 |  7300 |
2 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   700 |  2000 |  3000 |  5600 |
(Reference) x.zero_() |   900 |  2000 |  2000 |  3600 |  7200 |
4 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   400 |   700 |  2000 |  3000 |  5700 |
(Reference) x.zero_() |   800 |  2000 |  3100 |  4300 |  9000 |
torch.float16         |     1 |  4096 |  8192 | 16384 | 32768 |
1 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   500 |   700 |  2000 |  3000 |
(Reference) x.zero_() |   800 |  1000 |  2000 |  2000 |  3300 |
2 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   600 |   700 |  2000 |  3000 |
(Reference) x.zero_() |   800 |  1000 |  2000 |  2000 |  4300 |
4 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   600 |   700 |  2000 |  3300 |
(Reference) x.zero_() |   900 |  1000 |  2000 |  2000 |  4400 |
torch.float32         |     1 |  4096 |  8192 | 16384 | 32768 |
1 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   700 |  2000 |  3200 |  6100 |
(Reference) x.zero_() |   800 |  2000 |  2000 |  3500 |  6100 |
2 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   700 |  2000 |  3100 |  5600 |
(Reference) x.zero_() |   800 |  2000 |  2000 |  3300 |  7000 |
4 threads: ----------------------------------------------------
(PR #45636) x.zero_() |   500 |   700 |  2000 |  3000 |  5600 |
(Reference) x.zero_() |   900 |  2000 |  2000 |  3600 |  7500 |
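A byte-wise memset can serve all four dtypes benchmarked above because each of them encodes zero as all-zero bytes. The check below is a small sketch using only Python's struct module; it is not part of this PR, and the FORMATS mapping from dtype names to struct format codes is an illustrative assumption.

```python
import struct

# struct format codes assumed to match the dtypes in the tables above.
FORMATS = {
    "torch.int8": "<b",     # 1-byte signed integer
    "torch.int32": "<i",    # 4-byte signed integer
    "torch.float16": "<e",  # IEEE 754 half precision
    "torch.float32": "<f",  # IEEE 754 single precision
}

for name, fmt in FORMATS.items():
    size = struct.calcsize(fmt)
    # Decode a buffer of `size` zero bytes as one element of this dtype.
    (value,) = struct.unpack(fmt, bytes(size))
    print(name, size, value == 0)
```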
Reviewed By: ngimel
Differential Revision: D23925113
fbshipit-source-id: 04e97ff6d67c52a8e7a21449113e1a0a7443098f