onnxruntime
ROCm EP for AMD GPU
#5480
Merged

weixingzhang merged 126 commits into master from wezhan/rocm
weixingzhang requested a review from BowenBao 5 years ago
weixingzhang requested a review from liqunfu 5 years ago
weixingzhang requested a review from spandantiwari 5 years ago
weixingzhang requested a review from thiagocrepaldi 5 years ago
weixingzhang requested a review 5 years ago
weixingzhang force pushed 5 years ago
weixingzhang added core runtime
weixingzhang added training
weixingzhang assigned suffiank 5 years ago
weixingzhang assigned SherlockNoMad 5 years ago
weixingzhang changed the title from "[Draft] ROCm EP for AMD GPU" to "ROCm EP for AMD GPU" 5 years ago
weixingzhang assigned jessebenson 5 years ago
weixingzhang requested a review from mrry 5 years ago
weixingzhang assigned mrry 5 years ago
weixingzhang force pushed to 2da698c2 5 years ago
weixingzhang force pushed from 2da698c2 5 years ago
weixingzhang force pushed to d5aa7870 5 years ago
weixingzhang force pushed to 75ee0c31 5 years ago
snnn commented on 2020-10-15
suffiank commented on 2020-10-15
suffiank commented on 2020-10-15
suffiank commented on 2020-10-15
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
weixingzhang force pushed from c3155420 5 years ago
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
suffiank commented on 2020-10-16
weixingzhang force pushed to dcf72194 5 years ago
weixingzhang force pushed from bd41e785 to 801dfe2c 5 years ago
suffiank commented on 2020-10-23
suffiank commented on 2020-10-23
weixingzhang init commit
f0973689
weixingzhang keep using CUDA/CUDA_PINNED.
12bb2d0a
weixingzhang add training code
7052f4ae
weixingzhang inference
c9f393d0
weixingzhang training
c9929cda
weixingzhang add contrib ops
c83e9043
weixingzhang change EP type for op 10 & 11
ad51d342
weixingzhang delete files.
6225147a
weixingzhang add layernorm, fix slice
b1c88fff
weixingzhang delete hipify files
3dd58a63
weixingzhang add dropout, but fp16 doesn't work
659c56cc
weixingzhang add softmax and softmax grad
da3bbc3d
weixingzhang add SparseSoftmaxCrossEntropy
dac15ba3
weixingzhang Add lamb, but fp16 is disabled.
74fab112
weixingzhang add miopen and two more softmax tests below pass.
5f2080bb
weixingzhang use correct __shfl_* functions
38448599
weixingzhang fix softmax
677ea963
weixingzhang fix layer_norm
03f93a62
weixingzhang update SparseSoftmax to latest cuda version
8dc6bef1
weixingzhang fix layernorm version
2f1bca51
fix build issue on rocm3.1
e53215d7
more changes for fp16
c6730066
special handling in ReduceSum for case input/output size=1
a004e808
use hip for bert training
698e71bd
only build gfx906 for now to save building time
25c29fb6
fp16
8bd3ae9f
update TArray
c26f447f
using global memory for transpose kernel parameters
5647da1c
using global memory for kernel parameters
406c14c2
sync with ort_training
42e75fc6
Add dropout and lamb
fa0e19a9
fp16 atomicAdd
a99f9363
add uint24 math
08d2973d
add hipify script
b9084a17
sync with ort_training
2253573f
disable uint24
78f17fbc
disable pre-opset 10 ops and post-opset 10 ops
9a497311
enable rccl
43f6b615
revert to use USE_NCCL since it is used in some backward graph build …
463ddc48
fix build error
715279d4
enable mpi_context
b33faee0
fix build error
c55c776a
set correct device id
cd31cc59
disable mpi build
f1c03a70
re-enable mpi with building options.
96e7f72f
fix a rebase issue.
df7f128e
sync master
a2f8e229
can build without mpi
066e49b5
uncomment -g build option
43c5cc36
minor clean
7d66d374
pengwa add e2e throughput
c9fa021d
add Dockerfile
cae55163
polish run_perf.sh
f6612dcf
update Dockerfile
498fdf2e
update Dockerfile
bf3a81f0
update Dockerfile and run_perf.sh
aa5694d7
sabreshao Fix hipLaunchKernelGGL compilation fails. (#4121)
23b0f3f7
sabreshao Fix build failure on ROCm 3.5 by removing reference of compiler-rt. (…
e2842cbc
update to use ROCm3.5 docker image
051188e6
polish Dockerfile
34f7a712
more change to Dockerfile
9e8768cb
add hostfile
6b22aeed
minor change
06533c2c
add dockerfile for rocm3.3
5632e6df
dockerfile for rocm3.5
2026cca4
update Dockerfile.rocm3.5
aae458c6
update dockerfile for rocm3.5 with IB toolkit
362b2f1a
remove --update
fd8065c1
use openmpi 4.0.4
cc03119a
remove openmpi build
135b0d17
use MPI_THREAD_FUNNELED
a0261307
disable warnings
de8860b5
update to master
f0e7e01d
1. Support PT frontend
657993d4
Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLo…
46cd8fde
sync to latest master.
cea7c05a
add gfx908
db2a8b8d
Revert "Revert "Replace loss function in BERT_LOSS with SoftmaxCrossE…
a8485036
fix SoftmaxCrossEntropyLoss
40eda7d7
update unit tests by leveraging CUDA unit tests
890dfc59
fix the issue to increase bigger batch size
32156fb5
anghostcici fix the compilation issue with change in hip (#4710)
bf110892
fix training loss issue exposed by using PT frontend
b0d33dad
don't compile RCCL kernels when RCCL is disabled.
cc535fd5
fix build issue with Clang
245380bf
change back to use MPI_THREAD_MULTIPLE
21f18f78
enable FP16 for Adam
8d0f99a4
remove __HIP_ARCH__ which was converted from __CUDA_ARCH__ and not su…
f96e9de6
using GPU_WARP_SIZE
258e456c
build lamb test
c661d2b4
improve docker file
5e58ef41
disable multi-tensor apply for lamb
c912a0f4
add ORT training examples in dockerfile.
28358bab
Add gpt2 fine tuning and Sync upto master 370d194db
d94bd1f9
enable split and where op for GPT2
dcae2c9e
add docker file for ROCm3.8
8a740f2e
leverage XDLOPS on MI100 for FP32
97c135ec
enable XDLOPS on MI100 for FP16
42b4587b
using rocBLAS instead of hipBLAS
d8803147
add model folder for docker build
3707457c
add rocprof.py
f29a118e
Merge CI pipeline into AMD GPU master (#5425)
c901ed85
rename hip to rocm
90dffe23
rename folder name from hip to rocm
bc0ac4aa
change header file path from hip to rocm
ea1b5ae8
rename file names from hip_xxx to rocm_xxx
ace065d5
minor change in CMakeLists.txt for rccl
61df8307
sync to master up to cd0386b6497
9e652e59
sync to master up to 0cb09374c68
e533f4a1
sync master up to 417929b0
e7cd00c9
sync master up to 80d36eab
6ba24a92
sync master up to 915d475353
9a37d261
remove code which was added for debugging
d82c9eaf
address comments of code review and fix build failures from python fi…
91a6eb69
update rocm contrib_ops by using amd_hipify.py
3f1c5d5a
update rocm core_ops by using amd_hipify.py
19f77362
update rocm training_ops by using amd_hipify.py
0839d135
enable amd_hipify.py in build.py
454854bf
clean more files and move dockerfiles for AMD GPU to the folder amdgpu
bdf7891a
rename HIPAllocator/HipAsyncBuffer/HIPFence to ROCMAllocator/RocmAsyn…
a38fea6e
remove hipblas code since rocblas is used.
936b09b1
weixingzhang force pushed 5 years ago
rename HIPMPinnedAllocator to ROCMPinnedAllocator and clean code
6fc75887
move rocm kernels to a separate PR.
3a997e7a
weixingzhang force pushed to 3a997e7a 5 years ago
mrry commented on 2020-10-27
address code review comments
6bd8b64d
weixingzhang Merge branch 'master' into wezhan/rocm
85a0e1e6
mrry approved these changes on 2020-10-29
weixingzhang merged aec4cb48 into master 5 years ago
weixingzhang deleted the wezhan/rocm branch 5 years ago
