IBM z14/z15 SIMD support (#66407)
Summary:
https://github.com/pytorch/pytorch/issues/66406
Implemented vector SIMD additions for the IBM Z architecture (z14/z15).
So far, all types except bfloat have a SIMD implementation.
It has 99% coverage and currently passes the local tests.
It is concise: the main SIMD implementation is a single header file.
It mostly uses template metaprogramming, but a few macros remain in order to avoid modifying PyTorch much.
Sleef supports z15.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66407
Reviewed By: mrshenli
Differential Revision: D33370163
Pulled By: malfet
fbshipit-source-id: 0e5a57f31b22a718cd2a9ac59753fb468cdda140