onnxruntime
qlinaradd for arm/sse2/avx2 using intrinsic, enable binary broadcasting parallel
#4216
Merged

zhanghuanrong merged 38 commits into master from zhalei/qladd_parallel_arm
zhanghuanrong
zhanghuanrong requested a review 5 years ago
yufenglee
yufenglee commented on 2020-06-16
zhangleihuanrong Support quantization linear binary element wise math ops, implement Q…
eee48667
zhanghuanrong Modify according to PR feedbacks. Mainly:
faabbe85
zhanghuanrong Utilize MlasSubtractInt32x4 in MlasDequantizeLinearVector().
ba681203
zhanghuanrong Some format fix.
6b0a3ff8
zhanghuanrong More natural parallel parameter type.
0b338cc6
zhanghuanrong Fix build break for x86.
90f57521
zhanghuanrong Comment goes to 80 before wrap.
1678c03a
zhanghuanrong Many macro-related changes to the assembly.
e1f277a8
zhanghuanrong Using CLang Format to format the file.
6922d0e0
zhanghuanrong Fix arm32 build error.
7e709e90
zhanghuanrong Remove some duplicate in different #if defined
1058e12a
zhanghuanrong working add.u8.vector to vector
a74f8505
zhanghuanrong Fix runtime bus error on real arm32 linux.
39af7d12
zhanghuanrong fix typo in store last one lane.
ed353171
zhanghuanrong arm32 qlinearadd handle scalar.
adaec558
zhanghuanrong Move qladd to separate C++ file
49cbf02f
zhanghuanrong Add neon64 qladd.
08e1e5d1
zhanghuanrong refactor some, enhance two instructions on arm64 only instructions
4cf229d2
zhanghuanrong Fix typo for arm64
8288a432
zhanghuanrong use strict op in pure c++ (min/max on float value)
0fbc3f4d
zhanghuanrong sse2 new version.
3ccd899e
zhanghuanrong merge arm/sse2/avx2
ba664cfc
zhanghuanrong pass arm/sse/avx2 linux test
9a949b5b
zhanghuanrong remove non-used assembly file.
2487f07c
zhanghuanrong Remove unused data definition and trailing spaces.
8251f96a
zhanghuanrong Fix broadcasting parallel issue.
5c5aadb2
zhanghuanrong Enhance broadcasting scenarios. Allow testing result diff due to round
71a09b85
zhanghuanrong Add Mlas or MLAS_ prefix for namespace safety.
72a6fc3d
zhanghuanrong Handle alignment issue for arm32 for GCC/MSVC. remove some unused
860964f3
zhanghuanrong Specify /arch:AVX2 for qladd_avx2.cpp
0cec23eb
zhanghuanrong Fix type during copy/paste when unrolling. Better one GreatEqual
4a9d1ec6
zhanghuanrong Arm neon alignment parameter is bits rather than bytes, change it.
c0df4a87
zhanghuanrong force pushed from ba266790 to c0df4a87 5 years ago
zhanghuanrong Move qladd_avx2.cpp to intrinsics/avx2/ folder
983a8d69
zhanghuanrong Formatting using mlas style.
cf0adbdd
zhanghuanrong Double check mlas style for these files.
2c12538b
zhanghuanrong change indent 2 to 4 for qladd_avx2.cpp
d0406502
zhanghuanrong Fix windows x86 build error due to sse2 no _mm_cvtsi128_si64
14070e81
zhanghuanrong To re-trigger all as old failed pipeline updated.
2fac863a
tracysh requested a review from tracysh 5 years ago
tracysh
tracysh approved these changes on 2020-07-01
zhanghuanrong merged 94c98aa0 into master 5 years ago
zhanghuanrong deleted the zhalei/qladd_parallel_arm branch 5 years ago
