Faster batch sampler (#76951)
Fixes #76950
Improve the iteration performance of `BatchSampler`, especially when `batch_size` is large. The benchmarks below report a negative speedup for some `batch_size=4` configurations.
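
For context, here is a minimal pure-Python sketch of why batching strategy matters for iteration speed. It is an illustration only and is not claimed to be the code merged in this PR; `batches_append` and `batches_islice` are hypothetical names. The first function grows each batch one index at a time and checks its length on every step, while the second pulls `batch_size` indices per step with `itertools.islice`, so the per-index Python overhead shrinks as `batch_size` grows.

```python
from itertools import islice


def batches_append(sampler, batch_size, drop_last):
    """Baseline (hypothetical): append one index at a time, check length each step."""
    batch = []
    for idx in sampler:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the trailing partial batch only when drop_last is False.
    if batch and not drop_last:
        yield batch


def batches_islice(sampler, batch_size, drop_last):
    """Alternative (hypothetical): take batch_size indices per step via islice."""
    it = iter(sampler)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # sampler exhausted exactly on a batch boundary
        if len(batch) < batch_size:
            # Trailing partial batch: keep it only when drop_last is False.
            if not drop_last:
                yield batch
            return
        yield batch
```

Both generators yield the same batches for the same inputs; only the amount of per-index bookkeeping differs.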
Python 3.6.8:
```
  batch_size  drop_last    speedup
------------  -----------  -------
           4  True         -18.07%
           4  False        15.92%
           8  True         9.43%
           8  False        30.90%
          64  True         54.99%
          64  False        49.64%
         640  True         66.26%
         640  False        48.32%
        6400  True         69.06%
        6400  False        45.17%
```
Python 3.8.12:
```
  batch_size  drop_last    speedup
------------  -----------  --------
           4  True         -10.50%
           4  False        -0.78%
           8  True         24.40%
           8  False        10.20%
          64  True         90.96%
          64  False        26.09%
         640  True         112.88%
         640  False        20.09%
        6400  True         111.80%
        6400  False        18.37%
```
See the issue page (#76950) for more details on the benchmarks.
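
A rough sketch of how numbers like those in the tables above could be gathered is shown here. It is not the benchmark script from the issue. `time_iteration` and `speedup_percent` are hypothetical helpers, and the speedup formula is an assumption, since the tables do not state how the percentage is computed.

```python
import timeit


def time_iteration(make_batches, repeats=5):
    """Best wall-clock time in seconds to fully consume make_batches().

    make_batches is a zero-argument callable returning a fresh iterable,
    so every repetition iterates from the start.
    """
    return min(timeit.repeat(lambda: list(make_batches()), number=1, repeat=repeats))


def speedup_percent(old_seconds, new_seconds):
    """Relative speedup in percent.

    Assumed definition: (old / new - 1) * 100, so a run that takes half
    the time scores 100% and a slower run scores below 0%.
    """
    return (old_seconds / new_seconds - 1.0) * 100.0
```

To compare two implementations, time each one with `time_iteration` on the same sampler and settings, then pass the two timings to `speedup_percent`.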
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76951
Approved by: https://github.com/ejguan