pytorch
c30048fc - add BFloat16 support for topk on CPU (#59547)

Commit
3 years ago
Summary:
Added BFloat16 support for topk on CPU, and collected benchmark data for topk with the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.

Input: 512x512, 512x1024, 1024x512, 1024x1024
K: 5
Number of cores: 1 core, 28 cores (1 socket)

Results are from PyTorch/Caffe2 Operator Micro-benchmarks (Tag: all, Mode: Eager, device: cpu) and show forward execution time in microseconds (us). Test names follow the pattern topk_H{H}_W{W}_k5_dtypetorch.{float32|bfloat16}_cpu.

For 1 core:

Input (H x W)   k   float32    bfloat16
512 x 512       5   911.401    911.700
512 x 1024      5   1506.927   1492.036
1024 x 512      5   1825.634   1819.872
1024 x 1024     5   3001.459   2970.718

For 28 cores (1 socket):

Input (H x W)   k   float32    bfloat16
512 x 512       5   146.995    123.423
512 x 1024      5   105.967    101.498
1024 x 512      5   128.023    125.172
1024 x 1024     5   129.855    124.556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59547

Reviewed By: mrshenli

Differential Revision: D29763916

Pulled By: ezyang

fbshipit-source-id: 706c7d4349ac9ebd5d63f4844fca70febcb67023
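A minimal sketch of exercising this change from Python, assuming a PyTorch build that already includes BFloat16 topk on CPU. The helper names `topk_bf16_matches_fp32` and `median_topk_us` are hypothetical. The correctness check compares bfloat16 results against float32 topk on the same rounded data; the timing helper uses `torch.utils.benchmark.Timer` rather than the operator_benchmark tool used in the commit, so its numbers are not directly comparable to the tables above.

```python
import torch
import torch.utils.benchmark as benchmark

# Shapes and k mirror the benchmark configuration in the commit message.
SHAPES = [(512, 512), (512, 1024), (1024, 512), (1024, 1024)]
K = 5


def topk_bf16_matches_fp32(h: int, w: int, k: int = K) -> bool:
    """Hypothetical helper: True if topk on a CPU bfloat16 tensor yields the
    same values as topk on the same (already bf16-rounded) data upcast to float32."""
    x = torch.randn(h, w).to(torch.bfloat16)
    values, _ = torch.topk(x, k, dim=-1)
    ref_values, _ = torch.topk(x.float(), k, dim=-1)
    # Indices are not compared: bf16 rounding can create ties, and the
    # order in which tied elements are returned is not guaranteed.
    return values.dtype == torch.bfloat16 and torch.equal(values.float(), ref_values)


def median_topk_us(h: int, w: int, dtype: torch.dtype, threads: int = 1) -> float:
    """Hypothetical helper: median time of one torch.topk call in microseconds,
    measured with torch.utils.benchmark (not operator_benchmark)."""
    x = torch.randn(h, w).to(dtype)
    timer = benchmark.Timer(
        stmt="torch.topk(x, k)",
        globals={"torch": torch, "x": x, "k": K},
        num_threads=threads,
    )
    # blocked_autorange() returns a Measurement whose median is in seconds.
    return timer.blocked_autorange(min_run_time=0.2).median * 1e6


if __name__ == "__main__":
    for h, w in SHAPES:
        assert topk_bf16_matches_fp32(h, w)
        fp32 = median_topk_us(h, w, torch.float32)
        bf16 = median_topk_us(h, w, torch.bfloat16)
        print(f"{h}x{w} k={K}: float32 {fp32:.1f} us, bfloat16 {bf16:.1f} us")
```

Passing `threads=28` to `median_topk_us` roughly approximates the single-socket run; absolute timings will depend on the machine and build.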