afb93f9a - [GPU] Fix oneDNN FP16 convolution format selection for channel expansion operations (#33131)

### Details:
- When an FP16 dynamic convolution has small input channels (≤ 4) and large output channels (e.g., 1024), the current format selection logic chooses `bfyx → fsv16`, which triggers the oneDNN reference kernel instead of the optimized JIT kernel, resulting in significant performance degradation.
- Override the output format to planar (`bfyx`) when input channels are small (≤ 16) and output channels are large (≥ 32).

**Current behavior:**
- Input: 3 channels → converted to `bfyx`
- Output: 1024 channels → remains `fsv16` (only changed when output ≤ 4)
- Result: the `bfyx → fsv16` combination uses the **reference kernel** (slow)

#### Root Cause
The fsv16 blocked format is optimized for reading many channels but introduces overhead when used for writing outputs in channel-expansion scenarios (small input → large output). oneDNN's reference kernel is selected because:
1. **Inefficient write pattern**: fsv16 output requires interleaved writes every 16 elements (non-contiguous)
2. **No optimized implementation**: oneDNN does not provide a JIT-optimized kernel for fsv16 output generation from small input channels
3. **Scatter write overhead**: writing 1024 channels in fsv16 format requires complex block-strided access

### Tickets:
- [CVS-177671](https://jira.devtools.intel.com/browse/CVS-177671)

Signed-off-by: Andrew Park <andrew.park@intel.com>