inductor(cpu): support mkldnn packed linear to improve bfloat16 performance (#96954)
Enable mkldnn packed linear in the Inductor CPU backend to improve bfloat16 performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96954
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/desertfire