Fix the "invalid configuration argument" error when running layer norm backward (#80893)
Summary: Fix the corner case with N = 0, where the layer norm backward CUDA kernel launch fails with an "invalid configuration argument" error.
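
A minimal standalone sketch of the failing corner case (M = 1, N = 0), assuming the error originates in torch's native layer_norm CUDA backward; the fbgemm_gpu test below reaches it through its swish layer norm reference path, so the exact call sequence differs:
```
import torch

# Mirrors the falsifying example: M=1, N=0, float32, CUDA, epsilon=0.1.
M, N, eps = 1, 0, 0.1
X = torch.randn(M, N, device="cuda", requires_grad=True)
weight = torch.randn(N, device="cuda", requires_grad=True)
bias = torch.randn(N, device="cuda", requires_grad=True)

# Forward over an empty normalized dimension succeeds.
Y = torch.nn.functional.layer_norm(X, (N,), weight, bias, eps)
grad_output = torch.randn_like(Y)

# Before this Diff, the backward pass on this N = 0 case failed with
# "CUDA error: invalid configuration argument".
Y.backward(grad_output)
```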
Test Plan:
buck run mode/opt //deeplearning/fbgemm/fbgemm_gpu/fb:layer_norm_test 2>&1 | tee out.log
Before this Diff:
```
test_swish_layer_norm (fbgemm_gpu.test.layer_norm_test.SparseOpsTest) ... INFO:2022-07-05 09:00:32 738347:738347 CuptiActivityProfiler.cpp:166] CUDA versions. CUPTI: 14; Runtime: 11040; Driver: 11040
Falsifying example: test_swish_layer_norm(
self=<fbgemm_gpu.test.layer_norm_test.SparseOpsTest testMethod=test_swish_layer_norm>,
M=1,
N=0,
dtype=torch.float32,
device='cuda',
epsilon=0.1,
)
ERROR
======================================================================
ERROR: test_swish_layer_norm (fbgemm_gpu.test.layer_norm_test.SparseOpsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/users/jianyuhuang/fbsource/fbcode/buck-out/opt/gen/aab7ed39/deeplearning/fbgemm/fbgemm_gpu/fb/layer_norm_test#binary,link-tree/fbgemm_gpu/test/layer_norm_test.py", line 41, in test_swish_layer_norm
M=st.integers(0, 32),
File "/data/users/jianyuhuang/fbsource/fbcode/buck-out/opt/gen/aab7ed39/deeplearning/fbgemm/fbgemm_gpu/fb/layer_norm_test#binary,link-tree/hypothesis/core.py", line 1164, in wrapped_test
raise the_error_hypothesis_found
File "/data/users/jianyuhuang/fbsource/fbcode/buck-out/opt/gen/aab7ed39/deeplearning/fbgemm/fbgemm_gpu/fb/layer_norm_test#binary,link-tree/fbgemm_gpu/test/layer_norm_test.py", line 88, in test_swish_layer_norm
Y_ref.backward(grad_output, retain_graph=True)
File "/data/users/jianyuhuang/fbsource/fbcode/buck-out/opt/gen/aab7ed39/deeplearning/fbgemm/fbgemm_gpu/fb/layer_norm_test#binary,link-tree/torch/_tensor.py", line 401, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/data/users/jianyuhuang/fbsource/fbcode/buck-out/opt/gen/aab7ed39/deeplearning/fbgemm/fbgemm_gpu/fb/layer_norm_test#binary,link-tree/torch/autograd/__init__.py", line 191, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
----------------------------------------------------------------------
Ran 1 test in 3.578s
FAILED (errors=1)
```
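
After this Diff, the same N = 0 case is expected to run through backward cleanly and produce zero-element gradients. A hedged sketch of the expected behavior (a hypothetical check, not part of the actual test plan):
```
import torch

X = torch.randn(1, 0, device="cuda", requires_grad=True)
weight = torch.randn(0, device="cuda", requires_grad=True)
bias = torch.randn(0, device="cuda", requires_grad=True)

Y = torch.nn.functional.layer_norm(X, (0,), weight, bias, 0.1)
Y.backward(torch.randn_like(Y))

# Gradients should be empty tensors with the input shapes rather than
# triggering a failed kernel launch.
assert X.grad.shape == (1, 0)
assert weight.grad.shape == (0,)
assert bias.grad.shape == (0,)
```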
Differential Revision: D37618022
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80893
Approved by: https://github.com/ngimel