Fix embedding quantization issue when memory format is not `contiguous` (#82605)
Summary:
The current implementation of embedding quantization assumes that the input `Tensor`'s memory is `contiguous`.
To guarantee this, we cast the input `weight` to `contiguous` format by
```
const auto weight_contig =
weight.expect_contiguous(weight.suggest_memory_format());
```
or
```
Tensor weight_contig = weight.contiguous(weight.suggest_memory_format());
```
However, the `USE_FBGEMM = true` branch does not use `weight_contig`, which produces a wrong result when the input data is not `contiguous`.
Example: N2297477
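To illustrate the failure mode, here is a minimal plain-Python sketch (not the actual PyTorch code): a transposed view shares the same row-major storage as its base tensor but with swapped strides, so a kernel that reads the raw buffer in order, assuming contiguity, sees the elements in the wrong logical order.

```python
# Row-major storage for a 2x3 "tensor" [[0, 1, 2], [3, 4, 5]].
storage = [0, 1, 2, 3, 4, 5]

def read(storage, shape, strides):
    """Gather elements in logical (row, col) order using strides."""
    rows, cols = shape
    return [storage[r * strides[0] + c * strides[1]]
            for r in range(rows) for c in range(cols)]

# Transposed 3x2 view: same storage, strides swapped -- non-contiguous.
view_shape, view_strides = (3, 2), (1, 3)

# A kernel that wrongly assumes contiguity just reads storage in order:
assumed_contiguous = storage[:6]          # [0, 1, 2, 3, 4, 5]
# The correct logical order of the transposed view (what a contiguous
# copy would actually store):
logical = read(storage, view_shape, view_strides)

print(assumed_contiguous)  # [0, 1, 2, 3, 4, 5]  -- wrong for the view
print(logical)             # [0, 3, 1, 4, 2, 5]
```

This mirrors the bug: the FBGEMM path read the original (possibly non-contiguous) `weight` buffer instead of `weight_contig`, so any input whose strides differ from the default row-major layout was quantized from the wrong element order.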
Test Plan:
```
buck1 test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_embedding_bag_byte_unpack (quantization.core.test_quantized_op.TestQuantizedEmbeddingOps)'
buck1 test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_embedding_bag_2bit_unpack (quantization.core.test_quantized_op.TestQuantizedEmbeddingOps)'
buck1 test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_embedding_bag_4bit_unpack (quantization.core.test_quantized_op.TestQuantizedEmbeddingOps)'
```
Differential Revision: D38302116
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82605
Approved by: https://github.com/houseroad