Add SELU Activation to calculate_gain (#50664)
Summary:
Fixes [#24991](https://github.com/pytorch/pytorch/issues/24991)
I used a gain of 0.75, as suggested on the forums by Thomas: https://discuss.pytorch.org/t/calculate-gain-tanh/20854/6
I verified that this value keeps both activations and gradients stable through a 100-layer network.
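For reference, a quick check of the lookup after this change (the SELU value is the one added here; the others are pre-existing defaults in `torch.nn.init`):
```python
import torch

print(torch.nn.init.calculate_gain("selu"))  # 0.75 (added by this PR)
print(torch.nn.init.calculate_gain("relu"))  # sqrt(2) ~ 1.4142
print(torch.nn.init.calculate_gain("tanh"))  # 5/3 ~ 1.6667
```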
Code to reproduce (from [jpeg729](https://discuss.pytorch.org/t/calculate-gain-tanh/20854/4)):
```python
import torch
import torch.nn.functional as F

a = torch.randn(1000, 1000, requires_grad=True)
b = a
print(f"in: {a.std().item():.4f}")
for i in range(100):
    # Initialize each layer with the SELU gain and push the activations through.
    l = torch.nn.Linear(1000, 1000, bias=False)
    torch.nn.init.xavier_normal_(l.weight, torch.nn.init.calculate_gain("selu"))
    b = F.selu(l(b))
    if i % 10 == 0:
        # Report the activation std and the mean absolute gradient at the input.
        print(f"out: {b.std().item():.4f}", end=" ")
        a.grad = None
        b.sum().backward(retain_graph=True)
        print(f"grad: {a.grad.abs().mean().item():.4f}")
```
Output:
```
in: 1.0008
out: 0.7968 grad: 0.6509
out: 0.3127 grad: 0.2760
out: 0.2404 grad: 0.2337
out: 0.2062 grad: 0.2039
out: 0.2056 grad: 0.1795
out: 0.2044 grad: 0.1977
out: 0.2005 grad: 0.2045
out: 0.2042 grad: 0.2273
out: 0.1944 grad: 0.2034
out: 0.2085 grad: 0.2464
```
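For context, `xavier_normal_` multiplies its base standard deviation sqrt(2 / (fan_in + fan_out)) by the gain, so with 1000x1000 layers the run above initializes each weight matrix with std ≈ 0.0237. A quick sanity check, assuming the documented Xavier formula:
```python
import math
import torch

gain = torch.nn.init.calculate_gain("selu")  # 0.75 after this change
std = gain * math.sqrt(2.0 / (1000 + 1000))  # Xavier normal std for a 1000x1000 layer
print(f"expected std: {std:.4f}")            # expected std: 0.0237

l = torch.nn.Linear(1000, 1000, bias=False)
torch.nn.init.xavier_normal_(l.weight, gain)
print(f"observed std: {l.weight.std().item():.4f}")  # ~0.0237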
I included the necessary documentation change, and the new value passes the existing `test_calculate_gain_nonlinear` unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50664
Reviewed By: mruberry
Differential Revision: D25942217
Pulled By: ngimel
fbshipit-source-id: 29ff1be25713484fa7c516df71b12fdaecfb9af8