onnxruntime
9189ebb4 - Optimize slicing when possible by copying bigger blocks at once (#13261)

Commit
3 years ago
### Description

Currently, SliceIterator copies at most one inner dimension's worth of elements at a time. However, many slices allow several inner dimensions to be copied at once. Furthermore, even a sliced dimension may use step 1, in which case it still yields a contiguous block of inner dimensions that can be copied in a single operation.

### Motivation and Context

For example, take an input of shape `[N, C, H, W]` with slice `[:, :, i:, :]`, which produces an output of shape `[N, C, H-i, W]`. That is, we slice along a single axis with step = 1. For each batch element, the current implementation performs `C * (H-i)` `memcpy` calls of `W` elements each. With this change, it performs `C` `memcpy` calls of `(H-i) * W` elements each.

The optimization produces ~11% savings on certain internal models.
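The block-coalescing idea can be sketched in a few lines of Python. This is an illustration only, not the SliceIterator code from the commit: the helpers `contiguous_block` and `slice_copy` are hypothetical names, the tensor is a flat row-major list, and `starts`/`ends` are assumed to be already normalized (non-negative and clamped to the shape). A list slice stands in for each `memcpy`.

```python
from itertools import product
from math import prod


def contiguous_block(shape, starts, ends, steps):
    """Hypothetical helper: find the trailing axes that form one contiguous run.

    Returns (axis, block): axes [axis, rank) can be copied as a single
    block of `block` elements.
    """
    block = 1
    axis = len(shape)
    # Walk from the innermost axis outward, folding axes into the block.
    while axis > 0:
        a = axis - 1
        if steps[a] != 1:
            break  # a strided axis is never contiguous
        extent = len(range(starts[a], ends[a], steps[a]))
        block *= extent
        axis = a
        if extent != shape[a]:
            # Partially sliced with step 1: still contiguous itself,
            # but no axis outside it can be folded in.
            break
    return axis, block


def slice_copy(src, shape, starts, ends, steps):
    """Copy the slice out of flat row-major `src`; returns (output, copy count)."""
    strides = [prod(shape[k + 1:]) for k in range(len(shape))]
    axis, block = contiguous_block(shape, starts, ends, steps)
    # Offset contributed by the start indices of the folded inner axes.
    inner_offset = sum(starts[k] * strides[k] for k in range(axis, len(shape)))
    outer = [range(starts[k], ends[k], steps[k]) for k in range(axis)]
    out, copies = [], 0
    for idx in product(*outer):
        base = sum(i * s for i, s in zip(idx, strides)) + inner_offset
        out.extend(src[base:base + block])  # stands in for one memcpy
        copies += 1
    return out, copies
```

For shape `[2, 3, 4, 5]` and slice `[:, :, 1:, :]`, the sketch folds the last two axes into blocks of `(4-1) * 5 = 15` elements and issues `2 * 3 = 6` copies over the whole tensor, instead of `2 * 3 * 3 = 18` copies of 5 elements each.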