pytorch
3beccbc2 - Add BFloat16 support and optimization for mish, hardtanh backward, and silu on CPU (#82460)

Commit
2 years ago
Add BFloat16 support and optimization for mish, hardtanh backward, and silu on CPU (#82460)

### Description
* add BFloat16 support for mish and hardtanh backward on CPU.
* optimize the performance for silu

### Testing
- optimize the performance for silu: bfloat16

single socket (28 cores):
```
before: 1x128x1024   forward 0.090 s  backward 0.218 s
        10x128x1024  forward 0.146 s  backward 0.314 s

after:  1x128x1024   forward 0.064 s  backward 0.100 s
        10x128x1024  forward 0.085 s  backward 0.133 s
```

single core:
```
before: 1x128x1024   forward 0.300 s  backward 0.606 s
        10x128x1024  forward 2.825 s  backward 5.834 s

after:  1x128x1024   forward 0.156 s  backward 0.239 s
        10x128x1024  forward 1.447 s  backward 2.165 s
```

- Add BFloat16 support for mish and backward of hardtanh on CPU.

single socket (20 cores):

op | shape | fp32 forward (s) | fp32 backward (s) | bf16 forward (s) | bf16 backward (s)
-- | -- | -- | -- | -- | --
silu | [10, 128, 10, 10] | 4.41E-05 | 7.67E-05 | 5.32E-05 | 9.38E-05
silu | [10, 128, 80, 80] | 0.0008 | 0.001788 | 0.00067 | 0.001031
mish | [10, 128, 10, 10] | 0.000356 | 0.000427 | 0.000367 | 0.000436
mish | [10, 128, 80, 80] | 0.004527 | 0.005807 | 0.004757 | 0.005393
hardtanh | [10, 128, 10, 10] | / | 3.97E-05 | / | 4.45E-05
hardtanh | [10, 128, 80, 80] | / | 0.001748 | / | 0.000645

single core:

op | shape | fp32 forward (s) | fp32 backward (s) | bf16 forward (s) | bf16 backward (s)
-- | -- | -- | -- | -- | --
silu | [10, 128, 10, 10] | 1.17E-04 | 1.91E-04 | 1.35E-04 | 2.23E-04
silu | [10, 128, 80, 80] | 0.007434 | 0.013141 | 0.008464 | 0.013044
mish | [10, 128, 10, 10] | 0.00103 | 0.00122 | 0.00106 | 0.001227
mish | [10, 128, 80, 80] | 0.065629 | 0.078418 | 0.067779 | 0.077214
hardtanh | [10, 128, 10, 10] | / | 1.18E-04 | / | 9.30E-05
hardtanh | [10, 128, 80, 80] | / | 0.010773 | / | 0.005834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82460
Approved by: https://github.com/mingfeima, https://github.com/malfet
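
The silu before/after timings are easier to compare as speedup ratios (before divided by after). The following is a minimal sketch that computes those ratios from the bfloat16 silu numbers reported in this commit message. The names `SILU_BF16_TIMINGS`, `speedup`, and `summarize` are hypothetical helpers introduced for illustration and are not part of PyTorch.

```python
# Timings in seconds, copied from the bfloat16 silu results in the commit message.
# Layout: {configuration: {shape: {pass_name: (before, after)}}}
SILU_BF16_TIMINGS = {
    "single socket": {
        "1x128x1024": {"forward": (0.090, 0.064), "backward": (0.218, 0.100)},
        "10x128x1024": {"forward": (0.146, 0.085), "backward": (0.314, 0.133)},
    },
    "single core": {
        "1x128x1024": {"forward": (0.300, 0.156), "backward": (0.606, 0.239)},
        "10x128x1024": {"forward": (2.825, 1.447), "backward": (5.834, 2.165)},
    },
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' timing is than 'before'."""
    return before / after


def summarize(timings):
    """Flatten the nested timing table into (config, shape, pass, speedup) rows."""
    rows = []
    for config, shapes in timings.items():
        for shape, passes in shapes.items():
            for pass_name, (before, after) in passes.items():
                rows.append((config, shape, pass_name, speedup(before, after)))
    return rows


if __name__ == "__main__":
    for config, shape, pass_name, ratio in summarize(SILU_BF16_TIMINGS):
        print(f"{config:13s} {shape:12s} {pass_name:8s} {ratio:.2f}x")
```

On these numbers the backward pass improves more than the forward pass in every configuration, with all backward ratios above 2x.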