Add bfloat16 support for lerp on CPU (#84327)
### Description
Add bfloat16 support for lerp on CPU
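
As a minimal usage sketch (not taken from the PR itself), the snippet below exercises `torch.lerp` on CPU with bfloat16 inputs, using both a scalar and a tensor weight, plus a backward pass. The tensor names and values are illustrative assumptions, chosen so that every result is exactly representable in bfloat16.

```python
import torch

# Illustrative inputs (assumed values, not from the PR).
start = torch.zeros(4, dtype=torch.bfloat16)
end = torch.full((4,), 8.0, dtype=torch.bfloat16)

# lerp with a scalar weight: start + 0.5 * (end - start)
out_scalar = torch.lerp(start, end, 0.5)

# lerp with a tensor weight of the same dtype
weight = torch.tensor([0.0, 0.25, 0.5, 1.0], dtype=torch.bfloat16)
out_tensor = torch.lerp(start, end, weight)

# Backward through the scalar-weight variant: d(out)/d(start) = 1 - weight
s = start.clone().requires_grad_(True)
torch.lerp(s, end, 0.5).sum().backward()

print(out_scalar.dtype, out_scalar.tolist())
print(out_tensor.tolist())
print(s.grad.tolist())
```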
### Testing
Single core:

| op | shape | fp32 forward/s | bf16 forward/s | fp32 backward/s | bf16 backward/s |
| -- | -- | -- | -- | -- | -- |
| lerp (tensor) | [10, 128, 10, 124] | 0.005489 | 0.000613 | 0.006658 | 0.003385 |
| lerp (tensor) | [10, 128, 20, 124] | 0.011057 | 0.001204 | 0.016032 | 0.007869 |
| lerp (tensor) | [10, 128, 30, 124] | 0.016691 | 0.001954 | 0.025549 | 0.012823 |
| lerp (scalar) | [10, 128, 10, 124] | 0.001096 | 0.000507 | 0.002024 | 0.001479 |
| lerp (scalar) | [10, 128, 20, 124] | 0.00247 | 0.000997 | 0.005468 | 0.002907 |
| lerp (scalar) | [10, 128, 30, 124] | 0.004178 | 0.001513 | 0.009775 | 0.004859 |

Single socket (28 cores):

| op | shape | fp32 forward/s | bf16 forward/s | fp32 backward/s | bf16 backward/s |
| -- | -- | -- | -- | -- | -- |
| lerp (tensor) | [10, 128, 10, 124] | 0.000236 | 3.93E-05 | 0.000494 | 0.000235 |
| lerp (tensor) | [10, 128, 20, 124] | 0.000525 | 7.39E-05 | 0.002485 | 0.000638 |
| lerp (tensor) | [10, 128, 30, 124] | 0.000801 | 0.000121 | 0.004235 | 0.001529 |
| lerp (scalar) | [10, 128, 10, 124] | 5.90E-05 | 3.32E-05 | 0.000129 | 0.000116 |
| lerp (scalar) | [10, 128, 20, 124] | 0.000155 | 5.87E-05 | 0.000368 | 0.000206 |
| lerp (scalar) | [10, 128, 30, 124] | 0.000324 | 9.04E-05 | 0.001322 | 0.000313 |

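
The PR does not include the benchmark script, so the following is only a hedged sketch of how forward timings like those above might be collected. The helper `time_lerp`, the iteration counts, and the use of `torch.set_num_threads(1)` to approximate the single-core case are all assumptions, not the author's method; absolute numbers will differ by machine.

```python
import time
import torch

def time_lerp(shape, dtype, tensor_weight, iters=20, warmup=3):
    """Hypothetical helper: mean seconds per forward torch.lerp call."""
    start = torch.randn(shape).to(dtype)
    end = torch.randn(shape).to(dtype)
    # Tensor weight of the same dtype, or a plain Python scalar.
    weight = torch.rand(shape).to(dtype) if tensor_weight else 0.5
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        torch.lerp(start, end, weight)
    t0 = time.perf_counter()
    for _ in range(iters):
        torch.lerp(start, end, weight)
    return (time.perf_counter() - t0) / iters

torch.set_num_threads(1)  # assumed setting for the single-core case
shape = [10, 128, 10, 124]
results = {}
for dtype in (torch.float32, torch.bfloat16):
    for tensor_weight in (True, False):
        kind = "tensor" if tensor_weight else "scalar"
        results[(kind, dtype)] = time_lerp(shape, dtype, tensor_weight)
        print(f"lerp ({kind}) {dtype}: {results[(kind, dtype)]:.6f} s")
```

Raising the thread count instead of pinning it to 1 would approximate the single-socket case, and timing `out.sum().backward()` on inputs with `requires_grad=True` would extend the sketch to the backward columns.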
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84327
Approved by: https://github.com/frank-wei