onnxruntime
ROCm EP for AMD GPU
#5480
Merged
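This PR adds a ROCm execution provider (EP) to ONNX Runtime for AMD GPUs. Once a build exposes the ROCm EP, a session selects it by name with CPU fallback. Below is a minimal sketch of that provider-fallback logic; the helper `pick_providers` is hypothetical (not part of this PR), while the provider name strings follow ONNX Runtime's conventions:

```python
# Sketch: prefer the ROCm EP when a ROCm-enabled build exposes it,
# falling back to the CPU EP otherwise. `pick_providers` is a
# hypothetical helper, not an ONNX Runtime API.
def pick_providers(available):
    preferred = ["ROCMExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]

# On a ROCm build, the available-provider list would include the ROCm EP,
# and the result would be passed to an InferenceSession's `providers` argument.
print(pick_providers(["ROCMExecutionProvider", "CPUExecutionProvider"]))
```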
Commits (126)
- init commit (weixingzhang, 5 years ago)
- keep using CUDA/CUDA_PINNED (weixingzhang, 5 years ago)
- add training code (weixingzhang, 5 years ago)
- inference (weixingzhang, 5 years ago)
- training (weixingzhang, 5 years ago)
- add contrib ops (weixingzhang, 5 years ago)
- change EP type for op 10 & 11 (weixingzhang, 5 years ago)
- delete files (weixingzhang, 5 years ago)
- add layernorm, fix slice (weixingzhang, 5 years ago)
- delete hipify files (weixingzhang, 5 years ago)
- add dropout, but fp16 doesn't work (weixingzhang, 5 years ago)
- add softmax and softmax grad (weixingzhang, 5 years ago)
- add SparseSoftmaxCrossEntropy (weixingzhang, 5 years ago)
- Add lamb, but fp16 is disabled (weixingzhang, 5 years ago)
- add miopen; two more softmax tests pass (weixingzhang, 5 years ago)
- use correct __shfl_* functions (weixingzhang, 5 years ago)
- fix softmax (weixingzhang, 5 years ago)
- fix layer_norm (weixingzhang, 5 years ago)
- update SparseSoftmax to latest cuda version (weixingzhang, 5 years ago)
- fix layernorm version (weixingzhang, 5 years ago)
- fix build issue on rocm3.1 (Weixing Zhang, 5 years ago)
- more changes for fp16 (Weixing Zhang, 5 years ago)
- special handling in ReduceSum for the case input/output size=1 (Weixing Zhang, 5 years ago)
- use hip for bert training (Weixing Zhang, 5 years ago)
- only build gfx906 for now to save build time (Weixing Zhang, 5 years ago)
- fp16 (Weixing Zhang, 5 years ago)
- update TArray (Weixing Zhang, 5 years ago)
- using global memory for transpose kernel parameters (Weixing Zhang, 5 years ago)
- using global memory for kernel parameters (Weixing Zhang, 5 years ago)
- sync with ort_training (Weixing Zhang, 5 years ago)
- Add dropout and lamb (Weixing Zhang, 5 years ago)
- fp16 atomicAdd (Weixing Zhang, 5 years ago)
- add uint24 math (Weixing Zhang, 5 years ago)
- add hipify script (Weixing Zhang, 5 years ago)
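Several commits above revolve around hipify, AMD's source-to-source translation of CUDA code into portable HIP code. The idea can be sketched as identifier rewriting; the mapping table below is a tiny illustrative subset (the real hipify tooling covers the full CUDA runtime and library API surface):

```python
import re

# Illustrative subset of CUDA -> HIP renames; not the actual hipify script.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    # Replace whole identifiers only, so e.g. "cudaMallocHost" stays
    # untouched unless it has its own entry in the table.
    pattern = re.compile(r"\b(" + "|".join(CUDA_TO_HIP) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)

print(hipify("cudaMalloc(&ptr, n); cudaDeviceSynchronize();"))
# -> hipMalloc(&ptr, n); hipDeviceSynchronize();
```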
- sync with ort_training (Weixing Zhang, 5 years ago)
- disable uint24 (Weixing Zhang, 5 years ago)
- disable pre-opset 10 ops and post-opset 10 ops (Weixing Zhang, 5 years ago)
- enable rccl (Weixing Zhang, 5 years ago)
- revert to use USE_NCCL since it is used in some backward graph build code (Weixing Zhang, 5 years ago)
- fix build error (Weixing Zhang, 5 years ago)
- enable mpi_context (Weixing Zhang, 5 years ago)
- fix build error (Weixing Zhang, 5 years ago)
- set correct device id (Weixing Zhang, 5 years ago)
- disable mpi build (Weixing Zhang, 5 years ago)
- re-enable mpi with build options (Weixing Zhang, 5 years ago)
- fix a rebase issue (Weixing Zhang, 5 years ago)
- sync master (Weixing Zhang, 5 years ago)
- can build without mpi (Weixing Zhang, 5 years ago)
- uncomment -g build option (Weixing Zhang, 5 years ago)
- minor clean (Weixing Zhang, 5 years ago)
- add e2e throughput (pengwa, 5 years ago)
- add Dockerfile (Weixing Zhang, 5 years ago)
- polish run_perf.sh (Weixing Zhang, 5 years ago)
- update Dockerfile (Weixing Zhang, 5 years ago)
- update Dockerfile (Weixing Zhang, 5 years ago)
- update Dockerfile and run_perf.sh (Weixing Zhang, 5 years ago)
- Fix hipLaunchKernelGGL compilation failures (#4121) (sabreshao, 5 years ago)
- Fix build failure on ROCm 3.5 by removing reference to compiler-rt (#4120) (sabreshao, 5 years ago)
- update to use ROCm3.5 docker image (Weixing Zhang, 5 years ago)
- polish Dockerfile (Weixing Zhang, 5 years ago)
- more changes to Dockerfile (Weixing Zhang, 5 years ago)
- add hostfile (Weixing Zhang, 5 years ago)
- minor change (Weixing Zhang, 5 years ago)
- add dockerfile for rocm3.3 (Weixing Zhang, 5 years ago)
- dockerfile for rocm3.5 (Weixing Zhang, 5 years ago)
- update Dockerfile.rocm3.5 (Weixing Zhang, 5 years ago)
- update dockerfile for rocm3.5 with IB toolkit (Weixing Zhang, 5 years ago)
- remove --update (Weixing Zhang, 5 years ago)
- use openmpi 4.0.4 (Weixing Zhang, 5 years ago)
- remove openmpi build (Weixing Zhang, 5 years ago)
- use MPI_THREAD_FUNNELED (Weixing Zhang, 5 years ago)
- disable warnings (Weixing Zhang, 5 years ago)
- update to master (Weixing Zhang, 5 years ago)
- 1. Support PT frontend (Weixing Zhang, 5 years ago)
- Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLoss. (#4509)" (Weixing Zhang, 5 years ago)
- sync to latest master (Weixing Zhang, 5 years ago)
- add gfx908 (Weixing Zhang, 5 years ago)
- Revert "Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLoss. (#4509)"" (Weixing Zhang, 5 years ago)
- fix SoftmaxCrossEntropyLoss (Weixing Zhang, 5 years ago)
- update unit tests by leveraging CUDA unit tests (Weixing Zhang, 5 years ago)
- fix the issue blocking bigger batch sizes (Weixing Zhang, 5 years ago)
- fix the compilation issue with change in hip (#4710) (anghostcici, 5 years ago)
- fix training loss issue exposed by using PT frontend (Weixing Zhang, 5 years ago)
- don't compile RCCL kernels when RCCL is disabled (Weixing Zhang, 5 years ago)
- fix build issue with Clang (Weixing Zhang, 5 years ago)
- change back to use MPI_THREAD_MULTIPLE (Weixing Zhang, 5 years ago)
- add dockerfile for ROCm3.5 and ROCm3.7 (Weixing Zhang, 5 years ago)
- enable FP16 for Adam (Weixing Zhang, 5 years ago)
- remove __HIP_ARCH__, which was converted from __CUDA_ARCH__ and is not supported on AMD (Weixing Zhang, 5 years ago)
- using GPU_WARP_SIZE (Weixing Zhang, 5 years ago)
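The GPU_WARP_SIZE change above reflects a real portability hazard: NVIDIA warps are 32 lanes, while AMD wavefronts on the targeted gfx906/gfx908 parts are 64, so kernels that hard-code 32 miscompute shuffle widths and launch shapes. A small sketch of warp-size-parameterized launch math (the constant and helper names here are hypothetical, not from this PR):

```python
GPU_WARP_SIZE_NV = 32   # NVIDIA warp
GPU_WARP_SIZE_AMD = 64  # AMD wavefront on gfx906/gfx908

def warps_per_block(threads_per_block: int, warp_size: int) -> int:
    # Round up: a partially filled warp still occupies a full warp slot.
    return (threads_per_block + warp_size - 1) // warp_size

# A 256-thread block holds 8 NVIDIA warps but only 4 AMD wavefronts,
# which changes e.g. how much shared memory a warp-level reduction needs.
print(warps_per_block(256, GPU_WARP_SIZE_NV))   # -> 8
print(warps_per_block(256, GPU_WARP_SIZE_AMD))  # -> 4
```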
- build lamb test (Weixing Zhang, 5 years ago)
- improve docker file (Weixing Zhang, 5 years ago)
- disable multi-tensor apply for lamb (Weixing Zhang, 5 years ago)
- add ORT training examples in dockerfile (Weixing Zhang, 5 years ago)
- Add gpt2 fine tuning and sync up to master 370d194db (Weixing Zhang, 5 years ago)
- enable split and where op for GPT2 (Weixing Zhang, 5 years ago)
- add docker file for ROCm3.8 (Weixing Zhang, 5 years ago)
- leverage XDLOPS on MI100 for FP32 (Weixing Zhang, 5 years ago)
- enable XDLOPS on MI100 for FP16 (Weixing Zhang, 5 years ago)
- using rocBLAS instead of hipBLAS (Weixing Zhang, 5 years ago)
+ more commits ...