pytorch
fb55f12c - [cpu][inductor] improve cpu vec implementations of cos & sin (#94577)

Commit
1 year ago
Torchinductor's current `cos` and `sin` implementations call the `sleef` functions in `aten::Vec`, which perform worse than ATen's `cos` and `sin` implementations that invoke `MKL` functions. The reason is that these `sleef` algorithms sacrifice performance for higher precision. This PR switches Torchinductor's `cos` and `sin` implementations from the `sleef` functions with a `1.0` ULP error bound to the ones with a `3.5` ULP error bound.

**Performance data for eager vs. inductor:**

suite=huggingface

op | improved_ratio | speedup_old | RSD(3) | speedup_new | RSD(3)
-- | -- | -- | -- | -- | --
cos | 62.12% | 0.653826147 | 4.48% | 1.059999006 | 3.38%
sin | 38.12% | 0.745482927 | 0.72% | 1.029642026 | 5.33%

**Accuracy data for eager vs. inductor:**

Each tolerance was tested 1000 times.

error_bound | tol=1e-7 | tol=1e-8
-- | -- | --
1.0 ULP | PASS | FAIL
3.5 ULP | PASS | FAIL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94577
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/Chillee, https://github.com/desertfire, https://github.com/jansel
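
The `improved_ratio` column in the performance table is not defined in the commit message. The reported figures are consistent with the relative gain `speedup_new / speedup_old - 1`, so the following minimal sketch assumes that formula. The helper name `improved_ratio` and the `rows` dictionary are illustrative and not part of the PR.

```python
def improved_ratio(speedup_old: float, speedup_new: float) -> float:
    """Relative gain of the new speedup over the old one (assumed formula)."""
    return speedup_new / speedup_old - 1.0


# (speedup_old, speedup_new) pairs copied from the huggingface table above.
rows = {
    "cos": (0.653826147, 1.059999006),
    "sin": (0.745482927, 1.029642026),
}

for op, (old, new) in rows.items():
    print(f"{op}: {improved_ratio(old, new):.2%}")
# prints:
# cos: 62.12%
# sin: 38.12%
```

If the speedup columns are read as inductor performance relative to eager (an assumption, since the message does not say so), both ops move from below `1.0` to slightly above it.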