llvm
c6362a01 - [ESIMD] Fix perf regression caused by assumed align in block_load(usm) (#11850)

[ESIMD] Fix perf regression caused by assumed align in block_load(usm) (#11850)

The element-size address alignment is valid from a correctness point of view, but implicitly using 1-byte and 2-byte alignment causes a performance regression for block_load(const int8_t *, ...) and block_load(const int16_t *, ...), because the GPU backend has to generate the slower GATHER instruction instead of the more efficient BLOCK-LOAD.

Without this fix, block_load() causes up to a 44% performance slowdown in some apps that relied on the alignment assumptions in effect before block_load(usm, ..., compile_time_props) was implemented.

The reasoning for raising the expected/assumed alignment from the element size to 4 bytes for byte- and word-vectors is as follows: the point of a block_load() call (as opposed to a gather() call) is to get an efficient block load, so the assumed alignment is the one that allows a block load to be generated. This is a bit trickier for the user, but that is how the block_load/store API has always worked: block loads had restrictions that needed to be honored. To be on the safe side, the user can always pass the guaranteed alignment explicitly.

---------

Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>