LayerNorm: Handling if batch size is zero (#28614)
Summary:
Handling of an empty example was producing a CUDA error.
Added a cudaGetLastError check so that CUDA errors are attributed to the
correct function (previously the error was attributed to the next
CUDA operator).
Added a special case for batch size zero; also added it to the CPU path to
keep things consistent.
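The batch-size-zero special case can be sketched as an early return before any real work is done. This is an illustrative pure-Python sketch, not the actual PyTorch kernel code; the function name and implementation are assumptions made for clarity:

```python
import math

def layer_norm(batch, eps=1e-5):
    """Normalize each row of `batch` to zero mean and unit variance.

    Illustrative only: the real fix lives in PyTorch's C++/CUDA
    LayerNorm kernels.
    """
    # Special case for batch size zero: return an empty result instead of
    # launching work on zero rows, which is what triggered the CUDA error.
    if len(batch) == 0:
        return []
    out = []
    for row in batch:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(x - mean) * inv_std for x in row])
    return out
```

With this guard, an empty input falls through cleanly rather than reaching the kernel launch with zero elements.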
Resubmit of D18085429 without stacked commits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28614
Test Plan: test included
Differential Revision: D18122212
Pulled By: ggoossen
fbshipit-source-id: 8c6741a157a9fbbc82685d81a6f8021452b650d4