[PyTorch] Parallelize gelu via TensorIterator (#58950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58950
Use TensorIterator's API to set the grain size so that the gelu op runs in parallel.
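For readers unfamiliar with grain size, here is a minimal standalone sketch of the idea. It is not the ATen code touched by this change: `parallel_for_sketch`, `gelu_scalar`, and `gelu_sketch` are hypothetical names, and it uses plain `std::thread` instead of PyTorch's thread pool. The grain size is the smallest amount of work worth handing to a separate thread, so inputs at or below it stay on the calling thread and larger inputs are split into chunks.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not an ATen API): call f(begin, end) over [0, n),
// split into chunks of at least `grain_size` elements (the last chunk may
// be shorter), with one thread per chunk.
template <typename F>
void parallel_for_sketch(int64_t n, int64_t grain_size, const F& f) {
  if (n <= 0) return;
  grain_size = std::max<int64_t>(grain_size, 1);
  // Inputs no larger than the grain size stay on the calling thread.
  if (n <= grain_size) {
    f(0, n);
    return;
  }
  const int64_t hw = std::max<int64_t>(std::thread::hardware_concurrency(), 1);
  const int64_t max_chunks = n / grain_size;  // >= 1 because n > grain_size
  const int64_t num_chunks = std::min(hw, max_chunks);
  const int64_t chunk = (n + num_chunks - 1) / num_chunks;  // ceil division
  std::vector<std::thread> workers;
  for (int64_t begin = 0; begin < n; begin += chunk) {
    const int64_t end = std::min(begin + chunk, n);
    workers.emplace_back([&f, begin, end] { f(begin, end); });
  }
  for (auto& w : workers) w.join();
}

// Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
inline float gelu_scalar(float x) {
  return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
}

// Elementwise GELU over a vector, parallelized according to `grain_size`.
std::vector<float> gelu_sketch(const std::vector<float>& in, int64_t grain_size) {
  std::vector<float> out(in.size());
  parallel_for_sketch(static_cast<int64_t>(in.size()), grain_size,
      [&](int64_t begin, int64_t end) {
        // Each chunk writes a disjoint slice of `out`, so no locking is needed.
        for (int64_t i = begin; i < end; ++i) out[i] = gelu_scalar(in[i]);
      });
  return out;
}
```

Calling `gelu_sketch(x, 512)` on a 10000-element vector splits the work across threads, while a grain size larger than the input keeps everything on one thread. Both produce the same values, because the grain size only affects how the range is partitioned.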
ghstack-source-id: 130947174
Test Plan: test_gelu
Reviewed By: ezyang
Differential Revision: D28689819
fbshipit-source-id: 0a02066d47a4d9648323c5ec27d7e0e91f4c303a