glu: port CPU forward implementation to ATen (#26410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26410
For now I ported only the CPU forward implementation, so that I can try a CPU-only benchmark.
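
For reference, here is a minimal pure-Python sketch of what a GLU forward pass computes: the input is split into two halves along one dimension, and the first half is multiplied elementwise by the sigmoid of the second half. This is not the ATen code from this PR. The names `sigmoid` and `glu_forward_1d` are hypothetical, and the sketch assumes a flat 1-D input.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function: avoids overflow in exp()
    # for large-magnitude negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu_forward_1d(values):
    # Hypothetical helper: GLU over a flat list, split in the middle.
    n = len(values)
    if n % 2 != 0:
        raise ValueError("GLU needs an even number of elements to split in half")
    half = n // 2
    first, second = values[:half], values[half:]
    # Gate the first half with the sigmoid of the second half.
    return [a * sigmoid(b) for a, b in zip(first, second)]
```

The output has half as many elements as the input, which is why an odd-sized input is rejected.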
Test Plan: Imported from OSS
Differential Revision: D17454519
Pulled By: gchanan
fbshipit-source-id: ff757cf972c5627074fea2f92a670129007a49f4