[CPU/CUDA ep] Improve DeformConv op performance #27824
ShirasawaSama
marked this pull request as ready for review 45 days ago
ShirasawaSama
changed the title from "[CPU/CUDA ep] [WIP] Improve DeformConv op performance" to "[CPU/CUDA ep] Improve DeformConv op performance" 45 days ago
ShirasawaSama
marked this pull request as ready for review 44 days ago
ShirasawaSama
marked this pull request as ready for review 40 days ago
Remove DeformConvCopyGemmOutputRowMajorToNCHW
0ec9bb08
Adjust parallel cost for DeformableIm2col
3b5087c1
Refactor deform conv bilinear with plan
dffbe68d
Simplify DeformConv im2col plan paths and fix mask indexing bug
d07e8434
Refactor deform conv im2col to use a unified tiled path with context …
7d5125d4
Refactor DeformConv sampling plan to AoSoA layout and use Eigen for b…
435bada0
Refine DeformConv naming clarity and avoid redundant workspace size r…
c963bd9a
Optimize DeformConv by removing streaming plan logic and making bilin…
5bcb4024
Refactor Deformconv cpu op
0c2a602c
Harden DeformConv integer bounds checks and streamline hot-path casts…
3f2fee92
Refactor DeformConv bounds validation
7ae47a47
Add compute-time bounds checks with size_t-safe indexing
7c0d4143
Optimize CPU DeformConv plan generation with kernel meta precompute
02f9e0c2
Refactor DeformConv kernel meta setup into a params-based cached
083b33c2
Refactor CPU DeformConv bias add to avoid div/mod and extract DeformC…
afe2dd10
Annotate DeformConv CPU bias/col paths with ORT_CPU_RESTRICT and forc…
d61d36c5
CPU DeformConv bilinear sampling uses fast floor and inverted bounds …
baf51acf
Flatten CPU DeformConv bilinear sampling plan build tasks across spat…
43730c20
Optimize CPU DeformConv sampling and bias parallelism with flattened …
47bb183f
Add detailed comments for DeformConv CPU implementation
3520625c
Reformat codes
052507ef
Optimize DeformConv CPU kernel by removing mutex and heap allocations
b92e8c82
Optimize CUDA DeformConv kernel with static mask branching and tuned …
a9e5cc72
CUDA DeformConv reduce 64 bit index pressure in im2col hot path
63592251
Increase InlinedVector capacity in DeformConv for 7x7 kernels
47ab139c
Optimize DeformConv bias indexing with int32/int64 dispatch and clean…
f5902813
Optimize CUDA DeformConv bias add with 2D launch fast path and int32/…
b29da63e
Optimize CUDA DeformConv by using 32-bit index arithmetic when safe a…
2c5a52c7
Refactor path indexing
45790773
optimize deformconv bilinear sampling with interior fast path
cf212004
Reduce deformconv address math in dynamic im2col path
97f2598c
Tune deform conv im2col addressing and bilinear sampling
226d3ad7
Cuda deform conv replace 5x5 im2col launch specialization with 7x7
bdb90bc2
Pick chunk size by min rounds then balanced ceil
af3639ee
Fix CUDA DeformConv im2col mask stride unused-variable warning
f223c696
Document and tidy CUDA DeformConv
9632a204
Make deform conv bilinear sampling branchless with masked safe loads
16e990c5
Improve comments and code styles
e0558b43
Improve deform conv im2col load balance for offset_group=1
d6ebfb96
Harden DeformConv index-width guard and align mask test comment
3831979c
Optimize BilinearInterpolate with one-sided bounds and float mask selp
bc4f9c86
Make deform_conv_attributes.h self-contained for numeric_limits
2068b1a5
Clarify bilinear index int32 safety comments
8c6ed740
Fix CeilDiv signed overflow in CUDA DeformConv chunk sizing
db3d449d
Rename offset_byte_offset to offset_elem_offset in CUDA DeformConv im…
f9c1d8ce
Document heuristic threshold for DeformConv CUDA bias-add 2D launch path
10241725
Document CPU DeformConv sampling-plan tail invariants
6fb5f4f3
Add test cases
394d6767
Add pointer restrict annotations to DeformConv CPU and CUDA
8fea660a
Fix DeformConv CUDA tail chunk col stride and add regression test
76dfdba1
Document DeformConv aliasing assumptions for input and output buffers
2574ee29
Fix DeformConv CUDA grouped tail chunk col-buffer strides and add tai…
cdb979cf
Clarify DeformConv CUDA tail-chunk stride comment for grouped GEMM
ad977c6a
ShirasawaSama
marked this pull request as ready for review 33 days ago
Reuse validated common dims for GetNParallelImgs to keep overflow che…
a7675257
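Two of the commit titles above ("Pick chunk size by min rounds then balanced ceil" and "Fix CeilDiv signed overflow in CUDA DeformConv chunk sizing") name techniques that can be sketched briefly. The block below is only an illustration of one plausible reading of those titles, not the code from this PR: the function names `CeilDivNoOverflow` and `PickBalancedChunk`, their signatures, and the use of `int64_t` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): ceil(a / b) for a >= 0 and b > 0.
// The common form (a + b - 1) / b can overflow a signed type when a is
// close to its maximum; quotient plus remainder test avoids the addition.
inline int64_t CeilDivNoOverflow(int64_t a, int64_t b) {
  return a / b + (a % b != 0 ? 1 : 0);
}

// Hypothetical helper (not from the PR): choose a chunk size in two steps.
// 1. Find the minimum number of rounds needed when each round handles
//    at most max_chunk items.
// 2. Spread the items evenly over that many rounds (balanced ceil), so the
//    last round is not much smaller than the others.
inline int64_t PickBalancedChunk(int64_t total, int64_t max_chunk) {
  if (total <= 0 || max_chunk <= 0) {
    return 0;  // Nothing to process, or no valid cap was given.
  }
  const int64_t rounds = CeilDivNoOverflow(total, max_chunk);
  const int64_t chunk = CeilDivNoOverflow(total, rounds);
  assert(chunk <= max_chunk);  // Follows from total <= rounds * max_chunk.
  return chunk;
}
```

Under these assumptions, 9 items with a cap of 4 still take three rounds, but the balanced chunk size is 3 (rounds of 3, 3, 3) rather than 4 (rounds of 4, 4, 1), which keeps the tail round from being nearly empty.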
tianleiwu
approved these changes
on 2026-04-07
tianleiwu
merged
9d7e6d53
into main 31 days ago
Assignees
No one assigned