KFI-203 Improve thread safety of packing in convolve_kleidiai.cpp (#26575)
### Description
Make the caches of packed data thread_local rather than static.
### Motivation and Context
Both LHS and RHS packing cache their packed data in a static unordered
map. Because a static map is shared by every thread, parallel inference
sessions can interfere with each other. Both maps are now thread_local,
so each thread has its own cache.
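
A minimal sketch of the pattern, not the code in convolve_kleidiai.cpp: `PackedBuffer`, `PackData`, `LocalCache` and `GetPacked` are hypothetical names, and the packing step is a plain copy standing in for the real routine. It only illustrates how a function-local `thread_local` map gives each thread a private cache.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a packed LHS/RHS buffer.
using PackedBuffer = std::vector<float>;

// Hypothetical packing routine: copies the input unchanged.
PackedBuffer PackData(const float* src, std::size_t n) {
    return PackedBuffer(src, src + n);
}

// Returns the calling thread's cache. With `static` instead of
// `thread_local`, every thread would share this one map.
std::unordered_map<std::size_t, PackedBuffer>& LocalCache() {
    thread_local std::unordered_map<std::size_t, PackedBuffer> cache;
    return cache;
}

// Returns the packed copy for `key`, packing it on first use
// within the calling thread.
const PackedBuffer& GetPacked(std::size_t key, const float* src, std::size_t n) {
    auto& cache = LocalCache();
    auto it = cache.find(key);
    if (it == cache.end()) {
        it = cache.emplace(key, PackData(src, n)).first;
    }
    return it->second;
}
```

With this layout no lock is needed around the map, because no two threads ever touch the same instance. The trade-off is that each thread packs and stores its own copy, and the entries live until that thread exits.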
Signed-off-by: Colm Donelan <colm.donelan@arm.com>