[bootcamp][pytorch][WIP] Support embedding_bag_byte_rowwise_offsets in cuda (#61075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61075
Completed the CUDA implementation of embedding_bag_byte_rowwise_offsets and wrote a randomized test that compares the GPU kernel outputs against the CPU kernel outputs.
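
For reviewers unfamiliar with the op, the following is a rough pure-Python sketch of the kind of randomized comparison the test performs. It is not the test from this diff. It assumes each quantized row stores uint8 values followed by a float32 scale and a float32 bias, and that bags are sum-pooled; both are assumptions about the format, not taken from this change. All function names (`quantize_row`, `bag_sum_reference`, `bag_sum_fused`, `randomized_check`) are hypothetical, and the two implementations stand in for the CPU and GPU kernels.

```python
import random
import struct


def quantize_row(values):
    """Quantize floats to uint8 and append float32 scale and bias (assumed layout)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant rows
    q = bytes(min(255, max(0, round((v - lo) / scale))) for v in values)
    return q + struct.pack("<ff", scale, lo)


def dequantize_row(row):
    """Recover floats from one fused row: value = q * scale + bias."""
    dim = len(row) - 8
    scale, bias = struct.unpack("<ff", row[dim:])
    return [b * scale + bias for b in row[:dim]]


def _bag_bounds(indices, offsets):
    # Bag i covers indices[offsets[i]:offsets[i + 1]]; the last bag runs to the end.
    return zip(offsets, list(offsets[1:]) + [len(indices)])


def bag_sum_reference(table, indices, offsets):
    """Stand-in for the CPU path: dequantize each row fully, then sum per bag."""
    dim = len(table[0]) - 8
    out = []
    for start, end in _bag_bounds(indices, offsets):
        acc = [0.0] * dim
        for idx in indices[start:end]:
            for j, v in enumerate(dequantize_row(table[idx])):
                acc[j] += v
        out.append(acc)
    return out


def bag_sum_fused(table, indices, offsets):
    """Stand-in for the GPU path: dequantize and accumulate in one pass."""
    dim = len(table[0]) - 8
    out = []
    for start, end in _bag_bounds(indices, offsets):
        acc = [0.0] * dim
        for idx in indices[start:end]:
            row = table[idx]
            scale, bias = struct.unpack("<ff", row[dim:])
            for j in range(dim):
                acc[j] += row[j] * scale + bias
        out.append(acc)
    return out


def randomized_check(trials=20, seed=0):
    """Return the largest absolute difference seen across random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        rows, dim = rng.randint(1, 16), rng.randint(1, 8)
        table = [quantize_row([rng.uniform(-1, 1) for _ in range(dim)])
                 for _ in range(rows)]
        indices = [rng.randrange(rows) for _ in range(rng.randint(1, 32))]
        # Offsets start at 0 and are non-decreasing; repeats give empty bags.
        offsets = [0] + sorted(rng.randint(0, len(indices))
                               for _ in range(rng.randint(0, 4)))
        a = bag_sum_reference(table, indices, offsets)
        b = bag_sum_fused(table, indices, offsets)
        for ra, rb in zip(a, b):
            for x, y in zip(ra, rb):
                worst = max(worst, abs(x - y))
    return worst


if __name__ == "__main__":
    print("max abs diff:", randomized_check())
```

In a real GPU-versus-CPU comparison the two results would generally be compared with a tolerance rather than for exact equality, since the accumulation order can differ between kernels.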
Test Plan:
```
buck build mode/opt --show-full-output //caffe2/torch/fb/sparsenn:gpu_test
/data/users/johnsonpaul/fbsource/fbcode/buck-out/gen/caffe2/torch/fb/sparsenn/gpu_test#binary.par -r test_embedding_bag_byte_rowwise_offsets
```
Reviewed By: hyuen
Differential Revision: D29218597
fbshipit-source-id: 786260466ab4e8e3d89540496bd8a38be14c5c1b