Improve CPU performance of log_softmax when dim != -1 for both float32 and bfloat16 (#72163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72163
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64726
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D33862416
Pulled By: frank-wei
fbshipit-source-id: 41359864348dc2425b22f7ae02883e95922192e2
(cherry picked from commit 377909dbe76f30806e4a933f638127e81470b4de)