onnxruntime
[CPU/CUDA ep] Improve DeformConv op performance
#27824
Merged

ShirasawaSama
ShirasawaSama marked this pull request as draft 47 days ago
ShirasawaSama force pushed from 860a063b to fea461b0 47 days ago
ShirasawaSama force pushed from b7364994 to 444c6977 46 days ago
ShirasawaSama force pushed from 9476ad45 to c672f51e 45 days ago
ShirasawaSama force pushed from b14d3b8e to 3623db2b 45 days ago
ShirasawaSama marked this pull request as ready for review 45 days ago
ShirasawaSama changed the title from "[CPU/CUDA ep] [WIP] Improve DeformConv op performance" to "[CPU/CUDA ep] Improve DeformConv op performance" 45 days ago
ShirasawaSama marked this pull request as draft 44 days ago
ShirasawaSama marked this pull request as ready for review 44 days ago
ShirasawaSama force pushed from 7f4e6381 to 37099b6f 44 days ago
tianleiwu requested a review from copilot-pull-request-reviewer 41 days ago
copilot-pull-request-reviewer commented on 2026-03-29
ShirasawaSama marked this pull request as draft 40 days ago
ShirasawaSama force pushed from c867de16 to f938f4c4 40 days ago
ShirasawaSama force pushed from f938f4c4 to f892ad88 40 days ago
ShirasawaSama marked this pull request as ready for review 40 days ago
ShirasawaSama force pushed from f892ad88 to cf3d79c6 40 days ago
ShirasawaSama Remove DeformConvCopyGemmOutputRowMajorToNCHW
0ec9bb08
ShirasawaSama Adjust parallel cost for DeformableIm2col
3b5087c1
ShirasawaSama Refactor deform conv bilinear with plan
dffbe68d
ShirasawaSama Simplify DeformConv im2col plan paths and fix mask indexing bug
d07e8434
ShirasawaSama Refactor deform conv im2col to use a unified tiled path with context …
7d5125d4
ShirasawaSama Refactor DeformConv sampling plan to AoSoA layout and use Eigen for b…
435bada0
ShirasawaSama Refine DeformConv naming clarity and avoid redundant workspace size r…
c963bd9a
ShirasawaSama Optimize DeformConv by removing streaming plan logic and making bilin…
5bcb4024
ShirasawaSama Refactor Deformconv cpu op
0c2a602c
ShirasawaSama Harden DeformConv integer bounds checks and streamline hot-path casts…
3f2fee92
ShirasawaSama Refactor DeformConv bounds validation
7ae47a47
ShirasawaSama Add compute-time bounds checks with size_t-safe indexing
7c0d4143
ShirasawaSama Optimize CPU DeformConv plan generation with kernel meta precompute
02f9e0c2
ShirasawaSama Refactor DeformConv kernel meta setup into a params-based cached
083b33c2
ShirasawaSama Refactor CPU DeformConv bias add to avoid div/mod and extract DeformC…
afe2dd10
ShirasawaSama Annotate DeformConv CPU bias/col paths with ORT_CPU_RESTRICT and forc…
d61d36c5
ShirasawaSama CPU DeformConv bilinear sampling uses fast floor and inverted bounds …
baf51acf
ShirasawaSama Flatten CPU DeformConv bilinear sampling plan build tasks across spat…
43730c20
ShirasawaSama Optimize CPU DeformConv sampling and bias parallelism with flattened …
47bb183f
ShirasawaSama Add detailed comments for DeformConv CPU implementation
3520625c
ShirasawaSama Reformat codes
052507ef
ShirasawaSama Optimize DeformConv CPU kernel by removing mutex and heap allocations
b92e8c82
ShirasawaSama Optimize CUDA DeformConv kernel with static mask branching and tuned …
a9e5cc72
ShirasawaSama CUDA DeformConv reduce 64 bit index pressure in im2col hot path
63592251
ShirasawaSama Increase InlinedVector capacity in DeformConv for 7x7 kernels
47ab139c
ShirasawaSama Optimize DeformConv bias indexing with int32/int64 dispatch and clean…
f5902813
ShirasawaSama Optimize CUDA DeformConv bias add with 2D launch fast path and int32/…
b29da63e
ShirasawaSama Optimize CUDA DeformConv by using 32-bit index arithmetic when safe a…
2c5a52c7
ShirasawaSama Refactor path indexing
45790773
ShirasawaSama optimize deformconv bilinear sampling with interior fast path
cf212004
ShirasawaSama Reduce deformconv address math in dynamic im2col path
97f2598c
ShirasawaSama Tune deform conv im2col addressing and bilinear sampling
226d3ad7
ShirasawaSama Cuda deform conv replace 5x5 im2col launch specialization with 7x7
bdb90bc2
ShirasawaSama Pick chunk size by min rounds then balanced ceil
af3639ee
ShirasawaSama Fix CUDA DeformConv im2col mask stride unused-variable warning
f223c696
ShirasawaSama Document and tidy CUDA DeformConv
9632a204
ShirasawaSama Make deform conv bilinear sampling branchless with masked safe loads
16e990c5
ShirasawaSama Improve comments and code styles
e0558b43
ShirasawaSama Improve deform conv im2col load balance for offset_group=1
d6ebfb96
ShirasawaSama Harden DeformConv index-width guard and align mask test comment
3831979c
ShirasawaSama Optimize BilinearInterpolate with one-sided bounds and float mask selp
bc4f9c86
ShirasawaSama Make deform_conv_attributes.h self-contained for numeric_limits
2068b1a5
ShirasawaSama Clarify bilinear index int32 safety comments
8c6ed740
ShirasawaSama force pushed from cf3d79c6 to 8c6ed740 38 days ago
tianleiwu requested changes on 2026-04-04
ShirasawaSama Fix CeilDiv signed overflow in CUDA DeformConv chunk sizing
db3d449d
ShirasawaSama Rename offset_byte_offset to offset_elem_offset in CUDA DeformConv im…
f9c1d8ce
ShirasawaSama Document heuristic threshold for DeformConv CUDA bias-add 2D launch path
10241725
ShirasawaSama Document CPU DeformConv sampling-plan tail invariants
6fb5f4f3
ShirasawaSama Add test cases
394d6767
ShirasawaSama Add pointer restrict annotations to DeformConv CPU and CUDA
8fea660a
tianleiwu commented on 2026-04-04
ShirasawaSama Fix DeformConv CUDA tail chunk col stride and add regression test
76dfdba1
ShirasawaSama Document DeformConv aliasing assumptions for input and output buffers
2574ee29
tianleiwu requested changes on 2026-04-06
ShirasawaSama Fix DeformConv CUDA grouped tail chunk col-buffer strides and add tai…
cdb979cf
ShirasawaSama Clarify DeformConv CUDA tail-chunk stride comment for grouped GEMM
ad977c6a
ShirasawaSama marked this pull request as draft 33 days ago
ShirasawaSama marked this pull request as ready for review 33 days ago
ShirasawaSama marked this pull request as draft 33 days ago
ShirasawaSama marked this pull request as ready for review 33 days ago
tianleiwu commented on 2026-04-07
tianleiwu commented on 2026-04-07
ShirasawaSama Reuse validated common dims for GetNParallelImgs to keep overflow che…
a7675257
tianleiwu approved these changes on 2026-04-07
tianleiwu merged 9d7e6d53 into main 31 days ago
