hexagon: add f32 ssm_conv op (#20122)
* hexagon: add ssm_conv op
* hexagon: hvx kernel is functional
* hexagon: improvements to ssm-conv hvx kernel
* hexagon: add dma to ssm-conv hvx kernel
* hexagon: ssm-conv: dynamically compute the gather scratchpad
* hex-ssm-conv: add local context and fix various issues (spad indexing, etc.)
---------
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>