Fix CUDA native loss_ctc for varying input length (#15798)
Summary:
Thank you, freesouls, for the reproducing example!
This strictly fixes the gradient bug for varying-length inputs discussed in the middle-to-bottom part of the bug report. I'll send a separate feature patch for the inf losses -> NaN grads issue.
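For readers checking gradients with varying input lengths, one property that any correct CTC gradient has to satisfy is that frames past a sequence's input length receive zero gradient, because the loss never reads them. The sketch below illustrates that with a tiny pure-Python CTC forward pass and finite differences. It is not the PyTorch implementation and not the code changed by this patch; `ctc_nll` and `numeric_grad` are hypothetical helper names, and the uniform inputs are made up for illustration.

```python
import math

def ctc_nll(log_probs, target, input_length, blank=0):
    """CTC negative log-likelihood using only the first `input_length` frames.

    log_probs is a list of frames, each a list of per-class log-probabilities.
    Assumes a non-empty target and a tiny problem (no log-space stabilisation).
    """
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = math.exp(log_probs[0][ext[0]])
    alpha[1] = math.exp(log_probs[0][ext[1]])
    for t in range(1, input_length):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * math.exp(log_probs[t][ext[s]])
        alpha = new
    return -math.log(alpha[-1] + alpha[-2])

def numeric_grad(log_probs, target, input_length, eps=1e-6):
    """Central finite-difference gradient of ctc_nll w.r.t. every entry."""
    grad = [[0.0] * len(row) for row in log_probs]
    for t, row in enumerate(log_probs):
        for c in range(len(row)):
            orig = row[c]
            row[c] = orig + eps
            up = ctc_nll(log_probs, target, input_length)
            row[c] = orig - eps
            down = ctc_nll(log_probs, target, input_length)
            row[c] = orig
            grad[t][c] = (up - down) / (2 * eps)
    return grad

# Four frames are allocated (padded), but only the first three are valid.
num_frames, num_classes, input_length = 4, 3, 3
log_probs = [[math.log(1.0 / num_classes)] * num_classes for _ in range(num_frames)]
grad = numeric_grad(log_probs, target=[1, 2], input_length=input_length)

padded_is_zero = all(g == 0.0 for g in grad[input_length])
valid_is_nonzero = any(abs(g) > 1e-3 for row in grad[:input_length] for g in row)
print(padded_is_zero, valid_is_nonzero)
```

The same kind of check can be run against a real backend by comparing its gradient on a padded batch with a reference gradient, frame by frame.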
Fixes: #14401
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15798
Differential Revision: D13605739
Pulled By: soumith
fbshipit-source-id: 167ff42399c7e4cdfbd88d59bac5d25b57c0363f