optionally zero infinite losses in CTCLoss (#16199)
Summary:
Here is a stab at implementing an option to zero out infinite losses (and the NaN gradients they produce).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and the native implementation anyway, and the NaN gradients aren't terribly useful.
This topic seems to come up regularly, e.g. in #14335.
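For illustration, a minimal sketch of the intended usage, assuming the option is exposed as a `zero_infinity` keyword argument on `torch.nn.CTCLoss`: a target longer than the input sequence makes the CTC alignment infeasible, so the loss is infinite and the gradient NaN; with the flag enabled, both are zeroed instead.

```python
import torch
import torch.nn as nn

# Batch of one sequence where the target (length 8) is longer than the
# input (length 5), so no valid CTC alignment exists and the loss is inf.
T, C, N = 5, 10, 1  # input length, number of classes (incl. blank), batch size
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 8), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 8, dtype=torch.long)

# Old behaviour: infinite loss, NaN gradients on backward.
loss_inf = nn.CTCLoss()(log_probs, targets, input_lengths, target_lengths)

# With zero_infinity=True (hypothetical flag name here): the infinite loss
# and its gradient are zeroed, so the sample simply drops out of the batch.
loss_zeroed = nn.CTCLoss(zero_infinity=True)(
    log_probs, targets, input_lengths, target_lengths)
```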
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199
Differential Revision: D14020462
Pulled By: ezyang
fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428