hexagon: add SOLVE_TRI op (#21974)
* hexagon: add SOLVE_TRI op
* ggml: fix TODO description for solve_tri
* hexagon: rm unused variable/function warnings
* hexagon: chunk vs batch processing for better thread utilization
* hexagon: vectorize partial f32 loads
* hexagon: move HVX f32 add/sub/mul wrappers to hvx-base.h
---------
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>