[cpu][inductor] improve cpu vec implementations of cos & sin (#94577)
TorchInductor's `cos` and `sin` implementations currently call `sleef` functions in `aten::Vec`, which perform worse than ATen's `cos` and `sin` implementations that invoke `MKL` functions. The reason is that these `sleef` algorithms trade performance for higher precision. This PR switches TorchInductor's `cos` and `sin` implementations from the `sleef` functions with a `1.0` ULP error bound to the ones with a `3.5` ULP error bound.
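
To make the error bounds concrete, here is a minimal Python sketch of how an error can be expressed in float32 ULPs. The helper names (`to_f32`, `ulp_f32`, `ulp_error_f32`) are hypothetical and not part of PyTorch or `sleef`, and the sketch assumes float32 data and uses double-precision `math.sin` as a stand-in for the exact value.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ulp_f32(x: float) -> float:
    """Spacing between adjacent float32 values at the magnitude of x."""
    if x == 0.0:
        return math.ldexp(1.0, -149)  # smallest positive float32 subnormal
    _, exp = math.frexp(abs(x))  # abs(x) == m * 2**exp with 0.5 <= m < 1
    # float32 has a 24-bit significand; clamp at the subnormal spacing.
    return math.ldexp(1.0, max(exp - 24, -149))


def ulp_error_f32(approx: float, exact: float) -> float:
    """Error of approx, measured in float32 ULPs at the exact value."""
    return abs(approx - exact) / ulp_f32(exact)


# Assumption: math.sin in double precision is accurate enough to act as
# the reference, since its error is far below one float32 ULP.
exact = math.sin(1.0)
rounded = to_f32(exact)
print(ulp_error_f32(rounded, exact))  # a correctly rounded result is within 0.5 ULP
```

Under this measure, a function with a `1.0` ULP bound may return a result up to one float32 spacing away from the exact value, and a function with a `3.5` ULP bound up to three and a half spacings.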
**Performance data for eager vs. inductor:**

Suite: huggingface

op | improved_ratio | speedup_old | RSD(3) old | speedup_new | RSD(3) new
-- | -- | -- | -- | -- | --
cos | 62.12% | 0.653826147 | 4.48% | 1.059999006 | 3.38%
sin | 38.12% | 0.745482927 | 0.72% | 1.029642026 | 5.33%

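
The `improved_ratio` column is consistent with `speedup_new / speedup_old - 1`. The source does not state this formula, so treat it as an assumption; the short check below reproduces the table's percentages from the two speedup columns.

```python
# Speedups copied from the performance table (huggingface suite).
speedups = {
    "cos": (0.653826147, 1.059999006),  # (speedup_old, speedup_new)
    "sin": (0.745482927, 1.029642026),
}


def improved_ratio(speedup_old: float, speedup_new: float) -> float:
    """Relative improvement of the new speedup over the old one (assumed formula)."""
    return speedup_new / speedup_old - 1.0


ratios = {op: improved_ratio(old, new) for op, (old, new) in speedups.items()}
for op, ratio in ratios.items():
    print(f"{op}: {ratio:.2%}")
# prints:
# cos: 62.12%
# sin: 38.12%
```
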
**Accuracy data for eager vs. inductor:**
Each tolerance was tested 1000 times.

error_bound | tol=1e-7 | tol=1e-8
-- | -- | --
1.0 ULP | PASS | FAIL
3.5 ULP | PASS | FAIL

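
One possible reading of the `tol=1e-8` row, assuming float32 inputs and an absolute tolerance (the source states neither): the gap between adjacent float32 values near typical `cos` and `sin` outputs is already larger than `1e-8`, so two implementations that differ by a single ULP would fail that check. The sketch below measures this gap with a hypothetical helper `next_f32`.

```python
import struct


def next_f32(x: float) -> float:
    """Next float32 above x, for positive finite x (bit-increment trick)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


x = 0.75  # a representative output magnitude in [0.5, 1)
gap = next_f32(x) - x  # one float32 ULP at x, i.e. 2**-24
print(gap)
print(1e-8 < gap < 1e-7)  # the gap lies between the two tested tolerances
```
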
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94577
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/Chillee, https://github.com/desertfire, https://github.com/jansel