[CPU/CUDA EP] Add DeformConv op support (#27393)
### Description
This change adds support for the ONNX `DeformConv` operator (2D
deformable convolution) to ONNX Runtime. It implements the operator
schema and registration, provides kernel implementations for CPU and
CUDA (where available), implements shape inference, and adds unit and
integration tests that validate correctness and numerical parity with
reference implementations. It also includes performance-oriented
optimizations and the required changes to build and test scripts.
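
For reviewers who want to sanity-check the shape inference, the sketch below computes the expected output, `offset`, and `mask` shapes from the input and weight shapes using the standard convolution output-size formula. It is an illustration only, not the code in this PR: the helper name `deform_conv_output_shape` is hypothetical, and it assumes 2D inputs with ONNX-style `pads` ordered as `[top, left, bottom, right]`.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def deform_conv_output_shape(
    x_shape: Sequence[int],
    w_shape: Sequence[int],
    strides: Sequence[int] = (1, 1),
    pads: Sequence[int] = (0, 0, 0, 0),
    dilations: Sequence[int] = (1, 1),
    offset_group: int = 1,
) -> Tuple[Shape, Shape, Shape]:
    """Hypothetical helper: expected (Y, offset, mask) shapes for 2D DeformConv.

    x_shape is (N, C, H, W) and w_shape is (oC, C / group, kH, kW).
    pads is assumed to follow the ONNX order [top, left, bottom, right].
    """
    n, _, h, w = x_shape
    oc, _, kh, kw = w_shape

    # Standard convolution output size, applied per spatial axis.
    oh = (h + pads[0] + pads[2] - dilations[0] * (kh - 1) - 1) // strides[0] + 1
    ow = (w + pads[1] + pads[3] - dilations[1] * (kw - 1) - 1) // strides[1] + 1

    y = (n, oc, oh, ow)
    # One (dy, dx) pair per kernel tap and offset group.
    offset = (n, offset_group * kh * kw * 2, oh, ow)
    # One modulation scalar per kernel tap and offset group (mask is optional).
    mask = (n, offset_group * kh * kw, oh, ow)
    return y, offset, mask
```

For example, a `(1, 8, 32, 32)` input with a `(16, 8, 3, 3)` weight and `pads=(1, 1, 1, 1)` keeps the 32x32 spatial size and expects an `offset` tensor with 18 channels.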
### Motivation and Context
Deformable convolutions are widely used in vision models that require
spatial sampling flexibility (e.g., Deformable ConvNets, some
detection/segmentation models). Native support in ONNX Runtime enables
these models to run efficiently without custom operators or external
runtimes, broadening the set of compatible models and improving
performance and portability.
### See also
- https://onnx.ai/onnx/operators/onnx__DeformConv.html
- https://docs.pytorch.org/vision/main/generated/torchvision.ops.deform_conv2d.html
- https://arxiv.org/abs/1811.11168
- https://arxiv.org/abs/1703.06211
- https://github.com/pytorch/vision/blob/0f6d91d9fe514e6de2f5519114cbeb389d498b2d/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
- https://github.com/open-mmlab/mmdetection/blob/master/mmdet/ops/dcn/src/deform_conv_cuda.cpp
- https://github.com/pytorch/vision/blob/0f6d91d9fe514e6de2f5519114cbeb389d498b2d/torchvision/csrc/ops/cpu/deform_conv2d_kernel.cpp
- #22060
- #15572
- #20810
- #16903
- https://github.com/onnx/onnx/issues/5451
- https://github.com/ZhengPeng7/BiRefNet/pull/167
- https://github.com/pytorch/pytorch/issues/68910
- https://github.com/pytorch/vision/issues/2066