Update Sleef to include fix for FMA4 detection (#20450)
Summary:
FMA4 support is reported in bit 16 of register ECX, not EDX, of the
"extended processor info" CPUID leaf (0x80000001).
Once we verify that this change fixes https://github.com/pytorch/pytorch/issues/12112, I'll make a PR for upstream Sleef.
The mapping of registers to reg is:
```
reg[0] = eax
reg[1] = ebx
reg[2] = ecx <---
reg[3] = edx
```
Bit 16 of EDX is PAT (Page Attribute Table) on AMD CPUs, which is widely
supported. Intel CPUs do not set this bit. Checking EDX therefore reports
FMA4 on AMD CPUs that do not support it, which causes "Illegal instruction"
errors on those CPUs.
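
For illustration, a minimal sketch of the difference between the two checks,
using the register mapping above. The helper names (has_fma4,
has_fma4_wrong_reg) and the enum are hypothetical and are not taken from the
Sleef sources. The sketch assumes reg[] already holds the CPUID output for
leaf 0x80000001 and does not execute CPUID itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index of each register in reg[], matching the mapping above. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

/* Bit position tested in the output of CPUID leaf 0x80000001. */
#define FMA4_BIT 16

/* Corrected check: FMA4 is bit 16 of ECX. */
static bool has_fma4(const uint32_t reg[4]) {
    return ((reg[REG_ECX] >> FMA4_BIT) & 1u) != 0;
}

/* Check against the wrong register, shown for comparison only:
 * bit 16 of EDX is PAT on AMD CPUs, not FMA4. */
static bool has_fma4_wrong_reg(const uint32_t reg[4]) {
    return ((reg[REG_EDX] >> FMA4_BIT) & 1u) != 0;
}
```

With register values that have PAT set in EDX but bit 16 of ECX clear (an
AMD CPU without FMA4), has_fma4 returns false while has_fma4_wrong_reg
returns true, which is the false positive described above.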
See https://github.com/pytorch/pytorch/issues/12112
See https://github.com/shibatch/sleef/issues/261
See http://developer.amd.com/wordpress/media/2012/10/254811.pdf (page 20)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20450
Differential Revision: D15324405
Pulled By: colesbury
fbshipit-source-id: 96fb344c646998ff5da19e4cdbf493f5a4e9892a