onnxruntime
ROCm EP for AMD GPU
#5480
Merged

Commits
  • init commit
    weixingzhang committed 5 years ago
  • keep using CUDA/CUDA_PINNED.
    weixingzhang committed 5 years ago
  • add training code
    weixingzhang committed 5 years ago
  • inference
    weixingzhang committed 5 years ago
  • training
    weixingzhang committed 5 years ago
  • add contrib ops
    weixingzhang committed 5 years ago
  • change EP type for op 10 & 11
    weixingzhang committed 5 years ago
  • delete files.
    weixingzhang committed 5 years ago
  • add layernorm, fix slice
    weixingzhang committed 5 years ago
  • delete hipify files
    weixingzhang committed 5 years ago
  • add dropout, but fp16 doesn't work
    weixingzhang committed 5 years ago
  • add softmax and softmax grad
    weixingzhang committed 5 years ago
  • add SparseSoftmaxCrossEntropy
    weixingzhang committed 5 years ago
  • Add lamb, but fp16 is disabled.
    weixingzhang committed 5 years ago
  • add MIOpen; two more softmax tests below now pass.
    weixingzhang committed 5 years ago
  • use correct __shfl_* functions
    weixingzhang committed 5 years ago
  • fix softmax
    weixingzhang committed 5 years ago
  • fix layer_norm
    weixingzhang committed 5 years ago
  • update SparseSoftmax to latest cuda version
    weixingzhang committed 5 years ago
  • fix layernorm version
    weixingzhang committed 5 years ago
  • fix build issue on rocm3.1
    Weixing Zhang committed 5 years ago
  • more changes for fp16
    Weixing Zhang committed 5 years ago
  • special handling in ReduceSum for the case input/output size=1
    Weixing Zhang committed 5 years ago
  • use HIP for BERT training
    Weixing Zhang committed 5 years ago
  • only build gfx906 for now to save building time
    Weixing Zhang committed 5 years ago
  • fp16
    Weixing Zhang committed 5 years ago
  • update TArray
    Weixing Zhang committed 5 years ago
  • using global memory for transpose kernel parameters
    Weixing Zhang committed 5 years ago
  • using global memory for kernel parameters
    Weixing Zhang committed 5 years ago
  • sync with ort_training
    Weixing Zhang committed 5 years ago
  • Add dropout and lamb
    Weixing Zhang committed 5 years ago
  • fp16 atomicAdd
    Weixing Zhang committed 5 years ago
  • add uint24 math
    Weixing Zhang committed 5 years ago
  • add hipify script
    Weixing Zhang committed 5 years ago
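The "add hipify script" commit refers to source-to-source translation of the existing CUDA provider code into HIP, so the kernels can build for AMD GPUs. A minimal sketch of what such a translation does; the mapping table below is an illustrative subset, not the actual script from this PR:

```python
import re

# Illustrative subset of CUDA -> HIP identifier mappings; the real
# hipify tooling covers far more of the runtime API and libraries.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
    "cudaError_t": "hipError_t",
}

def hipify(source: str) -> str:
    """Replace whole-word CUDA identifiers with their HIP equivalents."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)

print(hipify("cudaStream_t s; cudaMalloc(&buf, n);"))
# hipStream_t s; hipMalloc(&buf, n);
```

The `\b` word boundaries matter: they keep the rewrite from corrupting identifiers that merely contain a CUDA name as a substring, which is why later commits like "delete hipify files" could regenerate the HIP sources cleanly instead of hand-patching them.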
  • sync with ort_training
    Weixing Zhang committed 5 years ago
  • disable uint24
    Weixing Zhang committed 5 years ago
  • disable pre-opset 10 ops and post-opset 10 ops
    Weixing Zhang committed 5 years ago
  • enable rccl
    Weixing Zhang committed 5 years ago
  • revert to use USE_NCCL since it is used in some backward graph build code
    Weixing Zhang committed 5 years ago
  • fix build error
    Weixing Zhang committed 5 years ago
  • enable mpi_context
    Weixing Zhang committed 5 years ago
  • fix build error
    Weixing Zhang committed 5 years ago
  • set correct device id
    Weixing Zhang committed 5 years ago
  • disable mpi build
    Weixing Zhang committed 5 years ago
  • re-enable mpi with building options.
    Weixing Zhang committed 5 years ago
  • fix a rebase issue.
    Weixing Zhang committed 5 years ago
  • sync master
    Weixing Zhang committed 5 years ago
  • can build without mpi
    Weixing Zhang committed 5 years ago
  • uncomment -g build option
    Weixing Zhang committed 5 years ago
  • minor clean
    Weixing Zhang committed 5 years ago
  • add e2e throughput
    pengwa committed 5 years ago
  • add Dockerfile
    Weixing Zhang committed 5 years ago
  • polish run_perf.sh
    Weixing Zhang committed 5 years ago
  • update Dockerfile
    Weixing Zhang committed 5 years ago
  • update Dockerfile
    Weixing Zhang committed 5 years ago
  • update Dockerfile and run_perf.sh
    Weixing Zhang committed 5 years ago
  • Fix hipLaunchKernelGGL compilation fails. (#4121)
    sabreshao committed 5 years ago
  • Fix build failure on ROCm 3.5 by removing reference of compiler-rt. (#4120)
    sabreshao committed 5 years ago
  • update to use ROCm3.5 docker image
    Weixing Zhang committed 5 years ago
  • polish Dockerfile
    Weixing Zhang committed 5 years ago
  • more change to Dockerfile
    Weixing Zhang committed 5 years ago
  • add hostfile
    Weixing Zhang committed 5 years ago
  • minor change
    Weixing Zhang committed 5 years ago
  • add dockerfile for rocm3.3
    Weixing Zhang committed 5 years ago
  • dockerfile for rocm3.5
    Weixing Zhang committed 5 years ago
  • update Dockerfile.rocm3.5
    Weixing Zhang committed 5 years ago
  • update dockerfile for rocm3.5 with IB toolkit
    Weixing Zhang committed 5 years ago
  • remove --update
    Weixing Zhang committed 5 years ago
  • use openmpi 4.0.4
    Weixing Zhang committed 5 years ago
  • remove openmpi build
    Weixing Zhang committed 5 years ago
  • use MPI_THREAD_FUNNELED
    Weixing Zhang committed 5 years ago
  • disable warnings
    Weixing Zhang committed 5 years ago
  • update to master
    Weixing Zhang committed 5 years ago
  • 1. Support PT frontend
    Weixing Zhang committed 5 years ago
  • Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLoss. (#4509)"
    Weixing Zhang committed 5 years ago
  • sync to latest master.
    Weixing Zhang committed 5 years ago
  • add gfx908
    Weixing Zhang committed 5 years ago
  • Revert "Revert "Replace loss function in BERT_LOSS with SoftmaxCrossEntropyLoss. (#4509)""
    Weixing Zhang committed 5 years ago
  • fix SoftmaxCrossEntropyLoss
    Weixing Zhang committed 5 years ago
  • update unit tests by leveraging CUDA unit tests
    Weixing Zhang committed 5 years ago
  • fix the issue preventing larger batch sizes
    Weixing Zhang committed 5 years ago
  • fix the compilation issue with change in hip (#4710)
    anghostcici committed 5 years ago
  • fix training loss issue exposed by using PT frontend
    Weixing Zhang committed 5 years ago
  • don't compile RCCL kernels when RCCL is disabled.
    Weixing Zhang committed 5 years ago
  • fix build issue with Clang
    Weixing Zhang committed 5 years ago
  • change back to use MPI_THREAD_MULTIPLE
    Weixing Zhang committed 5 years ago
  • add dockerfile for ROCm3.5 and ROCm3.7
    Weixing Zhang committed 5 years ago
  • enable FP16 for Adam
    Weixing Zhang committed 5 years ago
  • remove __HIP_ARCH__ which was converted from __CUDA_ARCH__ and not supported on AMD
    Weixing Zhang committed 5 years ago
  • using GPU_WARP_SIZE
    Weixing Zhang committed 5 years ago
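Several of these commits ("use correct __shfl_* functions", "using GPU_WARP_SIZE") trace back to one hardware difference: an NVIDIA warp is 32 lanes while an AMD GCN/CDNA wavefront is 64, so hard-coded 32-lane shuffle reductions silently drop half the wavefront on AMD. A hedged sketch of the lane-offset schedule a butterfly shuffle reduction walks through (the warp sizes are the hard facts here; the helper name is invented for illustration):

```python
def shuffle_reduction_offsets(warp_size: int) -> list[int]:
    """Offsets for a butterfly __shfl_xor-style reduction across one warp/wavefront.

    Each step halves the exchange distance until every lane holds the full sum.
    """
    offsets = []
    offset = warp_size // 2
    while offset > 0:
        offsets.append(offset)
        offset //= 2
    return offsets

print(shuffle_reduction_offsets(32))  # NVIDIA warp: [16, 8, 4, 2, 1]
print(shuffle_reduction_offsets(64))  # AMD wavefront: [32, 16, 8, 4, 2, 1]
```

Parameterizing on a `GPU_WARP_SIZE` constant instead of a literal 32 gives the extra first step (offset 64//2 = 32) that a 64-lane wavefront needs, which is presumably what the commit above addresses.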
  • build lamb test
    Weixing Zhang committed 5 years ago
  • improve docker file
    Weixing Zhang committed 5 years ago
  • disable multi-tensor apply for lamb
    Weixing Zhang committed 5 years ago
  • add ORT training examples in dockerfile.
    Weixing Zhang committed 5 years ago
  • Add gpt2 fine tuning and Sync upto master 370d194db
    Weixing Zhang committed 5 years ago
  • enable split and where op for GPT2
    Weixing Zhang committed 5 years ago
  • add docker file for ROCm3.8
    Weixing Zhang committed 5 years ago
  • leverage XDLOPS on MI100 for FP32
    Weixing Zhang committed 5 years ago
  • enable XDLOPS on MI100 for FP16
    Weixing Zhang committed 5 years ago
  • using rocBLAS instead of hipBLAS
    Weixing Zhang committed 5 years ago
  • + more commits ...