[ROCm] add CK GroupNorm to GroupNormTunable (#15510)
- Add CK GroupNorm to GroupNormTunable.
- Reduce the number of GroupNormNHWCOp configurations because the CK
implementation performs better.
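
For readers unfamiliar with tunable ops, the sketch below shows the general idea of timing several candidate implementations and keeping the fastest. It is a minimal illustration in plain Python, assuming selection is done by measured run time. `pick_fastest`, `sum_loop`, `sum_builtin`, and the candidate names are hypothetical and are not part of the ONNX Runtime or CK code.

```python
import time


def pick_fastest(candidates, args, repeats=5):
    """Return the name of the candidate with the lowest mean run time.

    Hypothetical helper for illustration only; not an ONNX Runtime API.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        fn(*args)  # warm-up call, excluded from the measurement
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def sum_loop(xs):
    # Stand-in for a hand-written kernel configuration.
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_builtin(xs):
    # Stand-in for a library-provided implementation.
    return sum(xs)


candidates = {"hand_written": sum_loop, "library": sum_builtin}
data = [float(i) for i in range(10_000)]

chosen = pick_fastest(candidates, (data,))
result = candidates[chosen](data)
print(chosen, result)
```

Adding a candidate to the dictionary widens the search space, and removing weaker ones shortens tuning time, which mirrors the two changes listed above.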
Performance gain on Stable Diffusion v1.5: average latency drops from about 2.478 to 2.107, roughly 15% lower.
Before:
```
'height': 512
'width': 512
'steps': 50
'batch_size': 1
'batch_count': 5
'num_prompts': 1
'average_latency': 2.4782688856124877
'median_latency': 2.4783748388290405
'provider': 'ROCMExecutionProvider'
'disable_safety_checker': True
```
After:
```
'height': 512,
'width': 512,
'steps': 50,
'batch_size': 1,
'batch_count': 5,
'num_prompts': 1,
'average_latency': 2.107170510292053,
'median_latency': 2.1067750453948975,
'first_run_memory_MB': -1,
'second_run_memory_MB': -1,
'provider': 'ROCMExecutionProvider',
'disable_safety_checker': True
```
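
The relative gain can be checked directly from the two `average_latency` values reported above. This is a small sketch; the variable names are illustrative, and the numbers are copied from the benchmark output.

```python
# average_latency values copied from the Before/After output above
before_avg = 2.4782688856124877
after_avg = 2.107170510292053

# Relative latency reduction and the equivalent speedup factor
reduction_pct = (before_avg - after_avg) / before_avg * 100
speedup = before_avg / after_avg

print(f"latency reduction: {reduction_pct:.1f}%")
print(f"speedup: {speedup:.2f}x")
```

This works out to a latency reduction of about 15%, or a speedup of about 1.18x.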