CI Install fbgemm package needed by torchao (#2887)
Newer torchao versions require fbgemm-gpu-genai>=1.2.0. This PR adds the
package to the GPU Dockerfile used for nightly testing.
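
For anyone who wants to confirm that an image satisfies the requirement above, a minimal sketch of a version check follows. It is not part of this PR. The helper names (`parse_release`, `meets_minimum`, `check_installed`) are hypothetical, and the distribution name is assumed to match the package name given above. The comparison is deliberately simplified: it only looks at the numeric release segment and ignores pre-release and local suffixes.

```python
from importlib import metadata

# Minimum version taken from the requirement stated in this message.
MIN_VERSION = "1.2.0"
# Assumption: the installed distribution is registered under this name.
DIST_NAME = "fbgemm-gpu-genai"


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '1.3.1+cu126' -> (1, 3, 1)."""
    parts = []
    # Drop any local version label such as '+cu126' before splitting.
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev5'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare release segments, padding with zeros so '1.2' equals '1.2.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(dist_name: str = DIST_NAME) -> bool:
    """Return True only if the distribution is installed and new enough."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed)
```

Calling `check_installed()` inside the image returns `False` both when the package is missing and when it is older than the minimum, so it can serve as a quick sanity check before running the torchao tests.
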
I had initially hoped that this would also allow us to train
int4_weight_only quantized torchao models, as explored in #2848.
However, attempting this now fails with a new, different error. The test
has been updated to reflect the new error.