forward-mode AD formula for F.dropout
Here's what native_dropout does:
- it zeroes each element of the input independently with probability p by
multiplying the input with a random mask
- it scales the output by `(p == 1 ? 0.0 : 1.0 / (1.0 - p))`
Further, native_dropout returns two values: the output and the mask
used.
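Derivation of formula:
- dropout(x, mask) = mask * x * (p == 1 ? 0.0 : 1.0 / (1.0 - p))
- the mask and the scale factor do not depend on `x`, so the output is linear in `x`
- therefore the forward-mode formula for the output, given the tangent `x_tangent` of `x`, is: x_tangent * mask * (p == 1 ? 0.0 : 1.0 / (1.0 - p))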
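
The derivation above can be sanity-checked with a small pure-Python sketch. This is not the code added by this PR. The helper names (`dropout_scale`, `sample_mask`, `dropout`, `dropout_jvp`) are hypothetical and only mirror the formulas in the description. With the mask held fixed, the sketch compares the derived tangent against a finite-difference approximation of the directional derivative.

```python
import random


def dropout_scale(p):
    # Scale factor from the description: 0 when p == 1, else 1 / (1 - p).
    return 0.0 if p == 1 else 1.0 / (1.0 - p)


def sample_mask(n, p, rng):
    # Hypothetical helper: keep each element with probability 1 - p (1.0 = kept).
    return [1.0 if rng.random() >= p else 0.0 for _ in range(n)]


def dropout(x, mask, p):
    # dropout(x, mask) = mask * x * scale
    s = dropout_scale(p)
    return [m * xi * s for m, xi in zip(mask, x)]


def dropout_jvp(x_tangent, mask, p):
    # Derived forward-mode formula: x_tangent * mask * scale
    s = dropout_scale(p)
    return [t * m * s for t, m in zip(x_tangent, mask)]


rng = random.Random(0)
p = 0.25
x = [0.5, -1.0, 2.0, 3.5]
x_tangent = [1.0, 0.5, -2.0, 0.25]
mask = sample_mask(len(x), p, rng)

# Finite-difference directional derivative with the same (fixed) mask.
eps = 1e-6
shifted = dropout([xi + eps * t for xi, t in zip(x, x_tangent)], mask, p)
base = dropout(x, mask, p)
finite_diff = [(a - b) / eps for a, b in zip(shifted, base)]

jvp = dropout_jvp(x_tangent, mask, p)
max_err = max(abs(a - b) for a, b in zip(finite_diff, jvp))
print(max_err)
```

Because the function is linear in `x` once the mask is fixed, the two results should agree up to floating-point rounding. For `p == 1` the scale is 0, so the tangent is zero everywhere.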
Test Plan:
- OpInfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75288
Approved by: https://github.com/soulitzer