Make the derivative of masked_fill more efficient (#83515)
There's no need to add all the zeros if we extract all the non-zero
elements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83515
Approved by: https://github.com/albanD, https://github.com/soulitzer