onnxruntime
ROCm EP for AMD GPU
#5480
Merged
weixingzhang merged 126 commits into master from wezhan/rocm
weixingzhang requested a review from BowenBao 5 years ago
weixingzhang requested a review from liqunfu 5 years ago
weixingzhang requested a review from spandantiwari 5 years ago
weixingzhang requested a review from thiagocrepaldi 5 years ago
weixingzhang requested a review 5 years ago
weixingzhang force pushed 5 years ago
weixingzhang added core runtime
weixingzhang added training
weixingzhang assigned suffiank 5 years ago
weixingzhang assigned SherlockNoMad 5 years ago
weixingzhang changed the title from "[Draft] ROCm EP for AMD GPU" to "ROCm EP for AMD GPU" 5 years ago
weixingzhang assigned jessebenson 5 years ago
weixingzhang requested a review from mrry 5 years ago
weixingzhang assigned mrry 5 years ago
weixingzhang force pushed to 2da698c2 5 years ago
weixingzhang force pushed from 2da698c2 5 years ago
weixingzhang force pushed to d5aa7870 5 years ago
weixingzhang force pushed to 75ee0c31 5 years ago
snnn commented on 2020-10-15
suffiank commented on 2020-10-15
suffiank commented on 2020-10-15
suffiank commented on 2020-10-15
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
weixingzhang force pushed from c3155420 5 years ago
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
weixingzhang force pushed to dcf72194 5 years ago
weixingzhang force pushed from bd41e785 to 801dfe2c 5 years ago
suffiank commented on 2020-10-23
suffiank commented on 2020-10-23
init commit
f0973689
keep using CUDA/CUDA_PINNED.
12bb2d0a
add training code
7052f4ae
inference
c9f393d0
training
c9929cda
add contrib ops
c83e9043
change EP type for op 10 & 11
ad51d342
delete files.
6225147a
add layernorm, fix slice
b1c88fff
delete hipify files
3dd58a63
add dropout, but fp16 doesn't work
659c56cc
add softmax and softmax grad
da3bbc3d
add SparseSoftmaxCrossEntropy
dac15ba3
Add lamb, but fp16 is disabled.
74fab112
add miopen and two more softmax tests below pass.
5f2080bb
use correct __shfl_* functions
38448599
fix softmax
677ea963
fix layer_norm
03f93a62
update SparseSoftmax to latest cuda version
8dc6bef1
fix layernorm version
2f1bca51
fix build issue on rocm3.1
e53215d7
more changes for fp16
c6730066
special handling in ReduceSum for case input/output size=1
a004e808
use hip for bert training
698e71bd
only build gfx906 for now to save building time
25c29fb6
fp16
8bd3ae9f
update TArray
c26f447f
using global memory for transpose kernel parameters
5647da1c
using global memory for kernel parameters
406c14c2
sync with ort_training
42e75fc6
Add dropout and lamb
fa0e19a9
fp16 atomicAdd
a99f9363
add uint24 math
08d2973d
add hipify script
b9084a17
sync with ort_training
2253573f
disable uint24
78f17fbc
disable pre-opset 10 ops and post-opset 10 ops
9a497311
enable rccl
43f6b615
revert to use USE_NCCL since it is used in some backward graph build …
463ddc48
fix build error
715279d4
enable mpi_context
b33faee0
fix build error
c55c776a
set correct device id
cd31cc59
disable mpi build
f1c03a70
re-enable mpi with building options.
96e7f72f
fix a rebase issue.
df7f128e
sync master
a2f8e229
can build without mpi
066e49b5
uncomment -g build option
43c5cc36
minor clean
7d66d374
add e2e throughput
c9fa021d
add Dockerfile
cae55163
polish run_perf.sh
f6612dcf
update Dockerfile
498fdf2e
update Dockerfile
bf3a81f0
update Dockerfile and run_perf.sh
aa5694d7
Fix hipLaunchKernelGGL compilation fails. (#4121)
23b0f3f7
Fix build failure on ROCm 3.5 by removing reference of compiler-rt. (…
e2842cbc
update to use ROCm3.5 docker image
051188e6
polish Dockerfile
34f7a712
more change to Dockerfile
9e8768cb
add hostfile
6b22aeed
minor change
06533c2c
add dockerfile for rocm3.3
5632e6df
dockerfile for rocm3.5
2026cca4
update Dockerfile.rocm3.5
aae458c6
update dockerfile for rocm3.5 with IB toolkit
362b2f1a
remove --update
fd8065c1
use openmpi 4.0.4
cc03119a
remove openmpi build
135b0d17
use MPI_THREAD_FUNNELED
a0261307
disable warnings
de8860b5
update to master
f0e7e01d
1. Support PT frontend
657993d4
Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLo…
46cd8fde
sync to latest master.
cea7c05a
add gfx908
db2a8b8d
Revert "Revert "Replace loss function in BERT_LOSS with SoftmaxCrossE…
a8485036
fix SoftmaxCrossEntropyLoss
40eda7d7
update unit tests by leveraging CUDA unit tests
890dfc59
fix the issue to increase bigger batch size
32156fb5
fix the compilation issue with change in hip (#4710)
bf110892
fix training loss issue exposed by using PT frontend
b0d33dad
don't compile RCCL kernels when RCCL is disabled.
cc535fd5
fix build issue with Clang
245380bf
change back to use MPI_THREAD_MULTIPLE
21f18f78
enable FP16 for Adam
8d0f99a4
remove __HIP_ARCH__ which was converted from __CUDA_ARCH__ and not su…
f96e9de6
using GPU_WARP_SIZE
258e456c
build lamb test
c661d2b4
improve docker file
5e58ef41
disable multi-tensor apply for lamb
c912a0f4
add ORT training examples in dockerfile.
28358bab
Add gpt2 fine tuning and Sync upto master 370d194db
d94bd1f9
enable split and where op for GPT2
dcae2c9e
add docker file for ROCm3.8
8a740f2e
leverage XDLOPS on MI100 for FP32
97c135ec
enable XDLOPS on MI100 for FP16
42b4587b
using rocBLAS instead of hipBLAS
d8803147
add model folder for docker build
3707457c
add rocprof.py
f29a118e
Merge CI pipeline into AMD GPU master (#5425)
c901ed85
rename hip to rocm
90dffe23
rename folder name from hip to rocm
bc0ac4aa
change header file path from hip to rocm
ea1b5ae8
rename file names from hip_xxx to rocm_xxx
ace065d5
minor change in CMakeLists.txt for rccl
61df8307
sync to master up to cd0386b6497
9e652e59
sync to master up to 0cb09374c68
e533f4a1
sync master up to 417929b0
e7cd00c9
sync master up to 80d36eab
6ba24a92
sync master up to 915d475353
9a37d261
remove code which was added for debugging
d82c9eaf
address comments of code review and fix build failures from python fi…
91a6eb69
update rocm contrib_ops by using amd_hipify.py
3f1c5d5a
update rocm core_ops by using amd_hipify.py
19f77362
update rocm training_ops by using amd_hipify.py
0839d135
enable amd_hipify.py in build.py
454854bf
clean more files and move dockerfiles for AMD GPU to the folder amdgpu
bdf7891a
rename HIPAllocator/HipAsyncBuffer/HIPFence to ROCMAllocator/RocmAsyn…
a38fea6e
remove hipblas code since rocblas is used.
936b09b1
weixingzhang force pushed 5 years ago
rename HIPMPinnedAllocator to ROCMPinnedAllocator and clean code
6fc75887
move rocm kernels to a separate PR.
3a997e7a
weixingzhang force pushed to 3a997e7a 5 years ago
mrry commented on 2020-10-27
address code review comments
6bd8b64d
Merge branch 'master' into wezhan/rocm
85a0e1e6
mrry approved these changes on 2020-10-29
weixingzhang merged aec4cb48 into master 5 years ago
weixingzhang deleted the wezhan/rocm branch 5 years ago
Reviewers: mrry, suffiank, snnn, BowenBao, liqunfu, spandantiwari, thiagocrepaldi
Assignees: mrry, suffiank, jessebenson, SherlockNoMad
Labels: training, core runtime
Milestone: No milestone