[CPU] Update the NCHWc transformer to handle more patterns (#27691)
### Description
Two main changes:
1) Handle activations that are common in modern CNNs (`QuickGelu`,
`Gelu`) in the NCHWc transformer, so that reorder nodes are no longer
inserted before and after them to do the NCHWc <-> NCHW data layout
transforms. The reorders can be avoided because these are elementwise
ops that are otherwise data layout agnostic.
2) Rewrite a channel-scaling `Mul` (whose scale input has shape 1,C,1,1
or C,1,1) into a depthwise conv NCHWc operation. This avoids reorder
nodes and enables fusion of any subsequent `Add` operation into the new
`Conv` node.
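
For reviewers, the equivalence behind change 2 can be checked with a small pure-Python sketch. It is illustrative only and is not the transformer code from this PR: the function names are hypothetical, the NCHWc channel blocking is ignored, and the depthwise weight is assumed to use the standard ONNX `Conv` layout of `C,1,1,1` with `group == C`.

```python
def channel_scale_mul(x, scale):
    """Mul of x[N][C][H][W] by a per-channel scale (a flattened 1,C,1,1 or C,1,1 tensor)."""
    return [[[[v * scale[c] for v in row] for row in plane]
             for c, plane in enumerate(sample)]
            for sample in x]


def channel_add(x, bias):
    """Add of a per-channel bias to x[N][C][H][W]."""
    return [[[[v + bias[c] for v in row] for row in plane]
             for c, plane in enumerate(sample)]
            for sample in x]


def depthwise_conv1x1(x, weight, bias=None):
    """1x1 depthwise conv: weight is [C][1][1][1], group == C (assumed layout).

    Each output channel reads only its own input channel, so the conv
    reduces to a per-channel multiply plus an optional per-channel bias.
    """
    out = []
    for sample in x:
        out_sample = []
        for c, plane in enumerate(sample):
            w = weight[c][0][0][0]
            b = bias[c] if bias is not None else 0.0
            out_sample.append([[v * w + b for v in row] for row in plane])
        out.append(out_sample)
    return out


# Hypothetical example data: N=1, C=2, H=1, W=2.
x = [[[[1.0, -2.0]], [[3.0, 4.0]]]]
scale = [0.5, -1.5]
bias = [1.0, 2.0]

# The scale values become the depthwise conv weights.
weight = [[[[s]]] for s in scale]

# Mul alone matches the depthwise conv without bias.
assert channel_scale_mul(x, scale) == depthwise_conv1x1(x, weight)

# Mul followed by Add matches the conv with the Add folded in as bias.
assert channel_add(channel_scale_mul(x, scale), bias) == depthwise_conv1x1(x, weight, bias)
```

Because the rewritten node is a `Conv`, a following per-channel `Add` has a natural home as the conv bias, which is the fusion opportunity described above.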
### Motivation and Context
Avoid unnecessary data layout operations and enable more
NCHWc-compatible compute and fusions.