openvino
21c2b7be - [Snippets][CPU] Optimize runtime offset handling in ARM64 kernel emitters and utils (#31668)

Commit
276 days ago
[Snippets][CPU] Optimize runtime offset handling in ARM64 kernel emitters and utils (#31668)

### Details:
Implements comprehensive ARM64 instruction fusion optimizations across the snippets CPU plugin to reduce instruction count and improve performance on ARM64 platforms.

Key optimizations:
- Replace mul+add sequence with fused madd instruction in kernel emitter
- Implement load-pair (LDP) for consecutive pointer loading in kernel initialization
- Add store-pair (STP) fast path for zero-offset pointer storage in utils
- Optimize stack memory operations with paired load/store instructions
- Consolidate memory access patterns to leverage ARM64 addressing modes

### Tickets:
- N/A
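The instruction-level ideas named in the details can be modelled in plain C++. This is a minimal sketch of the semantics only, not the emitter code from this commit: the function names `offset_mul_add`, `offset_madd`, `load_pair` and `store_pair` are hypothetical and do not appear in the OpenVINO sources. It shows that a `mul` followed by an `add` computes the same value as a single `madd` (destination = addend + a * b), and that `ldp`/`stp` move two consecutive 64-bit values in one instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Unfused form: models `mul tmp, idx, stride` followed by `add dst, base, tmp`.
// Two instructions and one temporary register.
constexpr std::uint64_t offset_mul_add(std::uint64_t base, std::uint64_t idx, std::uint64_t stride) {
    const std::uint64_t tmp = idx * stride;  // mul
    return base + tmp;                       // add
}

// Fused form: models `madd dst, idx, stride, base`, i.e. dst = base + idx * stride.
// One instruction, no temporary.
constexpr std::uint64_t offset_madd(std::uint64_t base, std::uint64_t idx, std::uint64_t stride) {
    return base + idx * stride;
}

// Models `ldp x0, x1, [ptr]`: load two consecutive 64-bit values at once.
inline std::pair<std::uint64_t, std::uint64_t> load_pair(const std::uint64_t* ptr) {
    return {ptr[0], ptr[1]};
}

// Models `stp x0, x1, [ptr]` with zero offset: store two consecutive 64-bit values at once.
inline void store_pair(std::uint64_t* ptr, std::uint64_t first, std::uint64_t second) {
    ptr[0] = first;
    ptr[1] = second;
}

// Both forms agree, including under unsigned 64-bit wraparound.
static_assert(offset_mul_add(0x1000, 3, 64) == offset_madd(0x1000, 3, 64),
              "fused and unfused offset computations must match");
```

In this model the fused and paired forms change only the instruction count, not the computed values, which is why such rewrites are safe when the two memory slots are adjacent and the intermediate product is not needed elsewhere.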