nn.EmbeddingBag bound check (#96022)
Summary: Today, accessing an out-of-bound embedding row either silently goes through or throws an illegal memory access (IMA). Neither is ideal, so this adds bound checks. The checks will probably slow things down; this still needs to be benchmarked.
Test Plan:
TODO: add some tests
Tried a simple example with an out-of-bound index; it now fails with:
```
aten/src/ATen/native/cuda/EmbeddingBag.cu:143: EmbeddingBag_updateOutputKernel_sum_mean: block: [0,0,0], thread: [0,1,0] Assertion `input[emb] < numRows` failed.
```
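For reference, the intent of the check can be sketched in plain Python. This is only an illustration of the semantics for sum mode, not the CUDA kernel: `embedding_bag_sum` and its list-based arguments are hypothetical names, and rejecting negative indices is an extra assumption beyond the `input[emb] < numRows` assertion shown above.

```python
def embedding_bag_sum(weight, indices, offsets):
    """Sum-mode embedding bag over plain Python lists, with a bound check.

    weight:  list of rows, each a list of floats (num_rows x dim)
    indices: flat list of row indices for all bags
    offsets: start position of each bag within `indices`
    """
    num_rows = len(weight)
    dim = len(weight[0])
    out = []
    for bag, begin in enumerate(offsets):
        # A bag ends where the next one starts; the last bag runs to the end.
        end = offsets[bag + 1] if bag + 1 < len(offsets) else len(indices)
        acc = [0.0] * dim
        for emb in range(begin, end):
            row = indices[emb]
            # Analogue of the device-side assertion `input[emb] < numRows`.
            # The `row >= 0` part is an assumption added for this sketch,
            # since a negative index would otherwise wrap around in Python.
            if not (0 <= row < num_rows):
                raise IndexError(
                    f"index {row} at position {emb} is out of bound "
                    f"for {num_rows} rows"
                )
            for d in range(dim):
                acc[d] += weight[row][d]
        out.append(acc)
    return out


if __name__ == "__main__":
    weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Two bags: rows {0, 2} and row {1}.
    print(embedding_bag_sum(weight, [0, 2, 1], [0, 2]))
    try:
        embedding_bag_sum(weight, [0, 3], [0])  # row 3 does not exist
    except IndexError as err:
        print("rejected:", err)
```

Without the check, the out-of-bound read in the second call is the case that previously either went through or hit an IMA on the device.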
Differential Revision: D43810777
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96022
Approved by: https://github.com/cpuhrsch, https://github.com/ngimel