onnxruntime
Improve performance of CUDA implementations for GatherElements and Greater, Equal and Less
#4989

Merged

yuslepukhin merged 8 commits into master from yuslepukhin/hummingbird

Start with a baseline by refactoring templates only.

df4370f7

Settle on thread_work_size = 8 so we do not lose too much parallelism.

3a532c85

Refactor Gather

dcb4855a

2c666c9b

Add check for remain > 0 to skip a lot of divisions.

6b69deca

Merge branch 'master' into yuslepukhin/hummingbird

1dfe5856

Fix initialization

120d7559

Optimize Binary CompareFunction and remove Impl_Cast invocation.

1bb6ec4d

yuslepukhin requested a review from skottmckay 5 years ago

yuslepukhin requested a review from pranavsharma 5 years ago

yuslepukhin requested a review from hariharans29 5 years ago

yuslepukhin requested a review from ke1337 5 years ago

yuslepukhin requested a review 5 years ago

hariharans29 commented on 2020-09-01

snnn approved these changes on 2020-09-02

yuslepukhin merged e1901a7e into master 5 years ago

yuslepukhin deleted the yuslepukhin/hummingbird branch 5 years ago

Reviewers

snnn

hariharans29

skottmckay

pranavsharma

ke1337
