PR #5480 ROCm EP for AMD GPU

weixingzhang requested a review from BowenBao 5 years ago

weixingzhang requested a review from liqunfu 5 years ago

weixingzhang requested a review from spandantiwari 5 years ago

weixingzhang requested a review from thiagocrepaldi 5 years ago

weixingzhang requested a review 5 years ago

weixingzhang force pushed 5 years ago

weixingzhang added core runtime

weixingzhang added training

weixingzhang assigned suffiank 5 years ago

weixingzhang assigned SherlockNoMad 5 years ago

weixingzhang changed the title ~~[Draft] ROCm EP for AMD GPU~~ ROCm EP for AMD GPU 5 years ago

weixingzhang assigned jessebenson 5 years ago

weixingzhang requested a review from mrry 5 years ago

weixingzhang assigned mrry 5 years ago

weixingzhang force pushed to 2da698c2 5 years ago

weixingzhang force pushed from 2da698c2 5 years ago

weixingzhang force pushed to d5aa7870 5 years ago

weixingzhang force pushed to 75ee0c31 5 years ago

snnn commented on 2020-10-15

suffiank commented on 2020-10-15

suffiank commented on 2020-10-16

weixingzhang force pushed from c3155420 5 years ago

suffiank commented on 2020-10-16

weixingzhang force pushed to dcf72194 5 years ago

weixingzhang force pushed from bd41e785 to 801dfe2c 5 years ago

suffiank commented on 2020-10-23

init commit

f0973689

keep using CUDA/CUDA_PINNED.

12bb2d0a

add training code

7052f4ae

inference

c9f393d0

training

c9929cda

add contrib ops

c83e9043

change EP type for op 10 & 11

ad51d342

delete files.

6225147a

add layernorm, fix slice

b1c88fff

delete hipify files

3dd58a63

add dropout, but fp16 doesn't work

659c56cc

add softmax and softmax grad

da3bbc3d

add SparseSoftmaxCrossEntropy

dac15ba3

Add lamb, but fp16 is disabled.

74fab112

add miopen and two more softmax tests below pass.

5f2080bb

use correct __shfl_* functions

38448599

fix softmax

677ea963

fix layer_norm

03f93a62

update SparseSoftmax to latest cuda version

8dc6bef1

fix layernorm version

2f1bca51

fix build issue on rocm3.1

e53215d7

more changes for fp16

c6730066

special handling in ReduceSum for case input/output size=1

a004e808

use hip for bert training

698e71bd

only build gfx906 for now to save building time

25c29fb6

fp16

8bd3ae9f

update TArray

c26f447f

using global memory for transpose kernel parameters

5647da1c

using global memory for kernel parameters

406c14c2

sync with ort_training

42e75fc6

Add dropout and lamb

fa0e19a9

fp16 atomicAdd

a99f9363

add uint24 math

08d2973d

add hipify script

b9084a17

sync with ort_training

2253573f

disable uint24

78f17fbc

disable pre-opset 10 ops and post-opset 10 ops

9a497311

enable rccl

43f6b615

revert to use USE_NCCL since it is used in some backward graph build …

463ddc48

fix build error

715279d4

enable mpi_context

b33faee0

fix build error

c55c776a

set correct device id

cd31cc59

disable mpi build

f1c03a70

re-enable mpi with building options.

96e7f72f

fix a rebase issue.

df7f128e

sync master

a2f8e229

can build without mpi

066e49b5

uncomment -g build option

43c5cc36

minor clean

7d66d374

add e2e throughput

c9fa021d

add Dockerfile

cae55163

polish run_perf.sh

f6612dcf

update Dockerfile

498fdf2e

update Dockerfile

bf3a81f0

update Dockerfile and run_perf.sh

aa5694d7

Fix hipLaunchKernelGGL compilation fails. (#4121)

23b0f3f7

Fix build failure on ROCm 3.5 by removing reference of compiler-rt. (…

e2842cbc

update to use ROCm3.5 docker image

051188e6

polish Dockerfile

34f7a712

more change to Dockerfile

9e8768cb

add hostfile

6b22aeed

minor change

06533c2c

add dockerfile for rocm3.3

5632e6df

dockerfile for rocm3.5

2026cca4

update Dockerfile.rocm3.5

aae458c6

update dockerfile for rocm3.5 with IB toolkit

362b2f1a

remove --update

fd8065c1

use openmpi 4.0.4

cc03119a

remove openmpi build

135b0d17

use MPI_THREAD_FUNNELED

a0261307

disable warnings

de8860b5

update to master

f0e7e01d

1. Support PT frontend

657993d4

Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLo…

46cd8fde

sync to latest master.

cea7c05a

add gfx908

db2a8b8d

Revert "Revert "Replace loss function in BERT_LOSS with SoftmaxCrossE…

a8485036

fix SoftmaxCrossEntropyLoss

40eda7d7

update unit tests by leveraging CUDA unit tests

890dfc59

fix the issue to increase bigger batch size

32156fb5

fix the compilation issue with change in hip (#4710)

bf110892

fix training loss issue exposed by using PT frontend

b0d33dad

don't compile RCCL kernels when RCCL is disabled.

cc535fd5

fix build issue with Clang

245380bf

change back to use MPI_THREAD_MULTIPLE

21f18f78

enable FP16 for Adam

8d0f99a4

remove __HIP_ARCH__ which was converted from __CUDA_ARCH__ and not su…

f96e9de6

using GPU_WARP_SIZE

258e456c

build lamb test

c661d2b4

improve docker file

5e58ef41

disable multi-tensor apply for lamb

c912a0f4

add ORT training examples in dockerfile.

28358bab

Add gpt2 fine tuning and Sync upto master 370d194db

d94bd1f9

enable split and where op for GPT2

dcae2c9e

add docker file for ROCm3.8

8a740f2e

leverage XDLOPS on MI100 for FP32

97c135ec

enable XDLOPS on MI100 for FP16

42b4587b

using rocBLAS instead of hipBLAS

d8803147

add model folder for docker build

3707457c

add rocprof.py

f29a118e

Merge CI pipeline into AMD GPU master (#5425)

c901ed85

rename hip to rocm

90dffe23

rename folder name from hip to rocm

bc0ac4aa

change header file path from hip to rocm

ea1b5ae8

rename file names from hip_xxx to rocm_xxx

ace065d5

minor change in CMakeLists.txt for rccl

61df8307

sync to master up to cd0386b6497

9e652e59

sync to master up to 0cb09374c68

e533f4a1

sync master up to 417929b0

e7cd00c9

sync master up to 80d36eab

6ba24a92

sync master up to 915d475353

9a37d261

remove code which was added for debugging

d82c9eaf

address comments of code review and fix build failures from python fi…

91a6eb69

update rocm contrib_ops by using amd_hipify.py

3f1c5d5a

update rocm core_ops by using amd_hipify.py

19f77362

update rocm training_ops by using amd_hipify.py

0839d135

enable amd_hipify.py in build.py

454854bf

clean more files and move dockerfiles for AMD GPU to the folder amdgpu

bdf7891a

rename HIPAllocator/HipAsyncBuffer/HIPFence to ROCMAllocator/RocmAsyn…

a38fea6e

remove hipblas code since rocblas is used.

936b09b1

weixingzhang force pushed 5 years ago

rename HIPMPinnedAllocator to ROCMPinnedAllocator and clean code

6fc75887

move rocm kernels to a separate PR.

3a997e7a

weixingzhang force pushed to 3a997e7a 5 years ago

mrry commented on 2020-10-27

address code review comments

6bd8b64d

Merge branch 'master' into wezhan/rocm

85a0e1e6

mrry approved these changes on 2020-10-29

weixingzhang merged aec4cb48 into master 5 years ago

weixingzhang deleted the wezhan/rocm branch 5 years ago

onnxruntime
ROCm EP for AMD GPU
#5480

Merged
