move aten.native_batch_norm_backward decomposition to core (#81522)
Move aten.native_batch_norm_backward decomposition from https://github.com/pytorch/functorch/blob/main/functorch/_src/decompositions.py#L148.
Changed the decomposition to not recompute mean and invstd, and added a type cast.
In functorch, changed `@register_decomposition_for(aten.native_batch_norm_backward)` to `@register_decomposition_for_jvp(aten.native_batch_norm_backward)`.
Passes `pytest test/test_decomp.py -k norm`.
Note that when the output mask is False for grad_weight and grad_bias, the decomposition should return None to be consistent with the behavior of the non-decomposed operator. However, None does not work with vjp, so the functorch version of the decomposition returned zeros instead. See https://github.com/pytorch/pytorch/blob/b33c1f7dd4a4d30ebc912f555e56d105ae66aa84/functorch/functorch/_src/decompositions.py#L210.
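For illustration only, the output-mask behavior described above can be sketched in plain Python for a training-mode batch norm over an (N, C) input. This is not the decomposition added in this PR: the function name `batch_norm_backward_sketch`, the `masked_as_zeros` flag, and the list-based layout are hypothetical, and returning None for a masked-out grad_input is an assumption. The sketch reuses the saved mean and invstd rather than recomputing them.

```python
def batch_norm_backward_sketch(grad_out, x, weight, save_mean, save_invstd,
                               output_mask, masked_as_zeros=False):
    """Hypothetical sketch: batch norm backward on nested lists of shape (N, C)."""
    n, c = len(x), len(x[0])
    # Normalized input, built from the saved statistics (nothing is recomputed).
    x_hat = [[(x[i][j] - save_mean[j]) * save_invstd[j] for j in range(c)]
             for i in range(n)]
    # Per-channel reductions over the batch dimension.
    sum_dy = [sum(grad_out[i][j] for i in range(n)) for j in range(c)]
    sum_dy_xhat = [sum(grad_out[i][j] * x_hat[i][j] for i in range(n))
                   for j in range(c)]

    grad_input = None  # assumption: a masked-out grad_input is also None
    if output_mask[0]:
        grad_input = [
            [weight[j] * save_invstd[j] / n
             * (n * grad_out[i][j] - sum_dy[j] - x_hat[i][j] * sum_dy_xhat[j])
             for j in range(c)]
            for i in range(n)
        ]

    def masked():
        # None matches the non-decomposed operator; zeros mimic the
        # vjp-friendly behavior described for the functorch version.
        return [0.0] * c if masked_as_zeros else None

    grad_weight = sum_dy_xhat if output_mask[1] else masked()
    grad_bias = sum_dy if output_mask[2] else masked()
    return grad_input, grad_weight, grad_bias


# Small usage example with two samples and two channels (eps ignored).
x = [[1.0, 2.0], [3.0, 6.0]]
grad_out = [[1.0, 0.0], [0.0, 1.0]]
weight = [1.0, 1.0]
save_mean = [2.0, 4.0]
save_invstd = [1.0, 0.5]

full = batch_norm_backward_sketch(
    grad_out, x, weight, save_mean, save_invstd, (True, True, True))
core_style = batch_norm_backward_sketch(
    grad_out, x, weight, save_mean, save_invstd, (True, False, False))
vjp_style = batch_norm_backward_sketch(
    grad_out, x, weight, save_mean, save_invstd, (True, False, False),
    masked_as_zeros=True)
```

Here `core_style` carries None for the masked gradients, while `vjp_style` carries zero-filled lists of length C in the same positions.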
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81522
Approved by: https://github.com/Chillee