Use the EmbeddingLookup API that takes offsets instead of lengths (#24945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24945
As the title says.
ghstack-source-id: 88903516
Test Plan:
To check with CI; micro-benchmark below.
```
import torch, time
eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
niter = 10000
s = time.time()
for i in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```
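For context on the offsets-based interface: where a lengths representation stores the number of indices per bag, the offsets representation stores the starting position of each bag, so bag `i` covers `input[offsets[i]:offsets[i+1]]`. A minimal sketch of the correspondence (the tensor values here are illustrative, not from the benchmark above):

```python
import torch

# Per-bag lengths: bag 0 has 3 indices, bag 1 has 2, bag 2 has 4.
lengths = torch.tensor([3, 2, 4])

# Equivalent offsets: cumulative starts of each bag.
offsets = torch.cat([torch.zeros(1, dtype=torch.long),
                     lengths.cumsum(0)[:-1]])
# offsets == tensor([0, 3, 5]); bag i covers input[offsets[i]:offsets[i+1]]

eb = torch.nn.EmbeddingBag(10, 8, mode='sum')
input = torch.LongTensor(int(lengths.sum())).random_(0, 10)
out = eb(input, offsets)
# out has one pooled row per bag: shape (3, 8)
```

Passing offsets directly avoids the lengths-to-offsets conversion on every call, which is what lets the lookup kernel consume the batch layout as-is.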
Reviewed By: bddppq
Differential Revision: D16930519
fbshipit-source-id: 44d59ca2588deecde1adb096673fc100bcd9bc46