optimize elementwise sum (#17456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17456
Using an instruction sequence similar to function in fbgemm/src/QuantUtilAvx2.cc
elementwise_sum_benchmark added
Reviewed By: protonu
Differential Revision: D14205695
fbshipit-source-id: 84939c9d3551f123deec3baf7086c8d31fbc873e