[ESIMD] Fix implementations of block_load(usm, ...) and block_load(acc) (#11797)
1) Fix the E2E test for block_load(). The test did not actually check
the masked variant, and it used wrong alignments.
2) Fix the comments for USM and ACC block_load implementations.
3) Minor optimization for the ACC block_load functions that do not
accept the byte_offset operand: 16-byte alignment can be assumed for them.
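
The alignment choice in item 3 can be sketched roughly as below. This is only
an illustration: assumed_alignment, kNoOffsetAlign, and fallback_align are
hypothetical names and are not part of the ESIMD API. The sketch assumes that
overloads which do take a byte_offset keep whatever alignment they would
otherwise use.

```cpp
#include <cstddef>

// Hypothetical constant: alignment (in bytes) assumed when the block load
// has no byte_offset operand, as stated in item 3.
constexpr std::size_t kNoOffsetAlign = 16;

// Hypothetical helper, not part of the ESIMD API.
// Returns the alignment (in bytes) a block load may assume.
// fallback_align is whatever alignment applies when a byte_offset is given.
constexpr std::size_t assumed_alignment(bool has_byte_offset,
                                        std::size_t fallback_align) {
  return has_byte_offset ? fallback_align : kNoOffsetAlign;
}

// Without byte_offset, 16-byte alignment is assumed.
static_assert(assumed_alignment(false, 4) == 16, "no offset: align16");
// With byte_offset, the fallback alignment is left unchanged.
static_assert(assumed_alignment(true, 4) == 4, "offset: fallback kept");
```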
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>