Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels #4418
Initial implementation.
e4980782
Fixes, clean up, test.
dd4fe4f3
Clean up kernel for loops, use binary impl for 2 input and no broadca…
5a0731ca
Address comments.
0f57ae74
Fix warning.
5bab69ad
Optimize Sum kernel. Use local variable to store intermediate output …
04f27de6
HectorSVC
dismissed these changes
on 2020-07-09
wschin
commented
on 2020-07-09
wschin
dismissed these changes
on 2020-07-09
Address PR comment.
24aa8dea
edgchen1
dismissed their stale review
via 24aa8dea
5 years ago
edgchen1
dismissed their stale review
via 24aa8dea
5 years ago
Address PR comments.
cd70f833
edgchen1
merged
6c7da5e9
into master 5 years ago
edgchen1
deleted the edgchen1/sum_optimization branch 5 years ago
Assignees
No one assigned
Labels
training
core runtime