Fix SyncBatchNorm running var update issue (#22248)
Summary:
## Fix https://github.com/pytorch/pytorch/issues/22192
+ change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)`
+ change the CUDA implementation and its header
```diff
 std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean,
-    const Tensor& running_var, double momentum, double epsilon, int64_t count) {
+    const Tensor& running_var, double momentum, double epsilon, const Tensor& counts) {
```
+ change the Python interface
```python
class SyncBatchNorm(Function):
def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size):
...
```
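The point of passing a `counts` tensor instead of a single `count` is that each replica may contribute a different number of samples, so the gathered mean/var and the unbiased correction applied to `running_var` must weight each replica's statistics by its own count. A minimal pure-Python sketch of that count-weighted combination for a single channel (the function name, the `running` dict, and scalar arguments are illustrative stand-ins, not the real kernel API):

```python
import math

def gather_stats(means, invstds, counts, eps, momentum, running):
    """Combine per-replica (mean, invstd) pairs for one channel.

    `counts` holds each replica's sample count -- the role of the new
    Tensor argument that replaced the old scalar `count`.
    """
    total = sum(counts)
    # count-weighted global mean
    mean = sum(m * n for m, n in zip(means, counts)) / total
    # invstd = 1/sqrt(var + eps)  =>  per-replica biased var = 1/invstd**2 - eps;
    # combine via the parallel-variance formula, weighting by counts
    var_biased = sum((1.0 / s**2 - eps + (m - mean) ** 2) * n
                     for s, m, n in zip(invstds, means, counts)) / total
    invstd = 1.0 / math.sqrt(var_biased + eps)
    # running_var tracks the unbiased estimate: scale by total/(total - 1)
    var_unbiased = var_biased * total / (total - 1)
    running["mean"] += momentum * (mean - running["mean"])
    running["var"] += momentum * (var_unbiased - running["var"])
    return mean, invstd
```

With a scalar `count`, replicas with uneven batch sizes would all be weighted equally, skewing both the gathered statistics and the running variance, which is the failure mode the linked issue describes.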
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248
Differential Revision: D16002146
Pulled By: mrshenli
fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956