Enable QK Group Norm (#869)
* start qkgn
* attn defaults for qk_gn
* impl qk_gn
* Update attention.py
* Update attention.py
* Update test_flash_triton_torch.py
* Update attention.py
* Update test_flash_triton_torch.py
* Update attention.py
* lint
* Update attention.py
* lint
* add value error
* Update attention.py
* updt to include low precision groupnorm;
* perf improvement
* Revert "perf improvement"
This reverts commit 2b62d5ecd21e13cb1bcd0883b3f6ebd1229e9d1d.
* Revert "updt to include low precision groupnorm;"
This reverts commit bca1c3383f5d2ea3009d4ee297ccc26db146cf20.