onnxruntime
Improve performance of CUDA implementations for GatherElements and Greater, Equal and Less
#4989
Merged

yuslepukhin merged 8 commits into master from yuslepukhin/hummingbird
yuslepukhin Start with a baseline by refactoring templates only.
df4370f7
yuslepukhin Settle on thread_work_size = 8 so we do not lose too much parallelism.
3a532c85
yuslepukhin Refactor Gather
dcb4855a
yuslepukhin xx
2c666c9b
yuslepukhin Add check for remain > 0 to skip a lot of divisions.
6b69deca
yuslepukhin Merge branch 'master' into yuslepukhin/hummingbird
1dfe5856
yuslepukhin Fix initialization
120d7559
yuslepukhin Optimize Binary CompareFunction and remove Impl_Cast invocation.
1bb6ec4d
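The commit messages above mention two GatherElements tuning choices: each thread processes `thread_work_size = 8` elements, and a `remain > 0` check skips many divisions. The following is a minimal CPU-only C++ sketch of how those two ideas could fit together. It is not the onnxruntime CUDA kernel: the names `kThreadWorkSize`, `Pitches`, `GatherElementsThread`, and `GatherElements` are hypothetical, and reading `remain` as the leftover linear index during coordinate decomposition is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using std::int64_t;

// Assumption: mirrors the "thread_work_size = 8" commit message; each
// simulated thread handles this many consecutive output elements.
constexpr int64_t kThreadWorkSize = 8;

// Row-major pitches: number of elements skipped per step along each dimension.
std::vector<int64_t> Pitches(const std::vector<int64_t>& shape) {
  std::vector<int64_t> p(shape.size(), 1);
  for (int64_t d = static_cast<int64_t>(shape.size()) - 2; d >= 0; --d)
    p[d] = p[d + 1] * shape[d + 1];
  return p;
}

// Hypothetical stand-in for one CUDA thread of a GatherElements kernel.
template <typename T, typename TIndex>
void GatherElementsThread(int64_t thread_id, int64_t axis,
                          const std::vector<int64_t>& in_shape,
                          const std::vector<int64_t>& in_pitches,
                          const std::vector<int64_t>& out_pitches,
                          const std::vector<T>& input,
                          const std::vector<TIndex>& indices,
                          std::vector<T>& output) {
  const int64_t total = static_cast<int64_t>(indices.size());
  const int64_t rank = static_cast<int64_t>(in_shape.size());
  for (int64_t k = 0; k < kThreadWorkSize; ++k) {
    const int64_t out_idx = thread_id * kThreadWorkSize + k;
    if (out_idx >= total) return;  // the last thread may get a partial chunk

    int64_t index = static_cast<int64_t>(indices[out_idx]);
    if (index < 0) index += in_shape[axis];  // ONNX allows negative indices
    if (index < 0 || index >= in_shape[axis])
      throw std::out_of_range("GatherElements index out of range");

    // The coordinate along `axis` comes from `indices`; the others are
    // recovered from out_idx, outermost dimension first. Once `remain`
    // reaches 0 every inner coordinate is 0, so their divisions are skipped.
    int64_t offset = index * in_pitches[axis];
    int64_t remain = out_idx;
    for (int64_t d = 0; d < rank && remain > 0; ++d) {
      const int64_t coord = remain / out_pitches[d];
      remain -= coord * out_pitches[d];
      if (d != axis) offset += coord * in_pitches[d];
    }
    output[out_idx] = input[offset];
  }
}

// Driver: "launches" ceil(total / kThreadWorkSize) threads sequentially.
// `axis` is assumed to be already normalized to [0, rank).
template <typename T, typename TIndex>
std::vector<T> GatherElements(const std::vector<T>& input,
                              const std::vector<int64_t>& in_shape,
                              const std::vector<TIndex>& indices,
                              const std::vector<int64_t>& idx_shape,
                              int64_t axis) {
  assert(in_shape.size() == idx_shape.size());
  assert(axis >= 0 && axis < static_cast<int64_t>(in_shape.size()));
  std::vector<T> output(indices.size());
  const std::vector<int64_t> in_pitches = Pitches(in_shape);
  const std::vector<int64_t> out_pitches = Pitches(idx_shape);
  const int64_t total = static_cast<int64_t>(indices.size());
  const int64_t threads = (total + kThreadWorkSize - 1) / kThreadWorkSize;
  for (int64_t t = 0; t < threads; ++t)
    GatherElementsThread(t, axis, in_shape, in_pitches, out_pitches, input,
                         indices, output);
  return output;
}
```

For example, with a 2x2 input `{1, 2, 3, 4}`, indices `{0, 0, 1, 0}`, and `axis = 1`, the sketch produces `{1, 1, 4, 3}`, matching the GatherElements example in the ONNX operator documentation. The early exit is only valid because coordinates are peeled off from the outermost dimension inward: a zero remainder means every remaining inner coordinate is zero. A larger per-thread chunk amortizes index arithmetic but launches fewer threads, which is the parallelism trade-off the commit message refers to.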
yuslepukhin requested a review from skottmckay 5 years ago
yuslepukhin requested a review from pranavsharma 5 years ago
yuslepukhin requested a review from hariharans29 5 years ago
yuslepukhin requested a review from ke1337 5 years ago
yuslepukhin requested a review 5 years ago
hariharans29 commented on 2020-09-01
snnn approved these changes on 2020-09-02
yuslepukhin merged e1901a7e into master 5 years ago
yuslepukhin deleted the yuslepukhin/hummingbird branch 5 years ago
