[dtensor] add op support for nll_loss_forward (#118917)
This is part of the work to support cross entropy in DTensor.
This PR does not yet support nll_loss computation when the input is sharded on the channel (class) dimension; in that case, sharding propagation needs to redistribute the input to Replicate.
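For context, `nll_loss_forward` computes the negative log-likelihood: for each sample it picks the negative log-probability at the target class index, then optionally reduces by mean or sum. The sketch below is a plain-Python illustration of that math only; the function names are illustrative and are not the DTensor or ATen API.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one sample's logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def nll_loss_forward(log_probs, targets, reduction="mean"):
    # Negative log-probability at each sample's target class.
    losses = [-row[t] for row, t in zip(log_probs, targets)]
    if reduction == "none":
        return losses
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total

# Two samples over three classes.
log_probs = [log_softmax([1.0, 2.0, 3.0]), log_softmax([0.5, 0.5, 0.5])]
targets = [2, 0]
loss = nll_loss_forward(log_probs, targets)
```

Sharding the batch dimension distributes the per-sample terms across ranks, which is why a channel-dimension shard (which splits each sample's class probabilities) requires redistribution first.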
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118917
Approved by: https://github.com/wanchaol