Fix the kl_div docs (#67443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67443
Fixes https://github.com/pytorch/pytorch/issues/57459
After discussing the linked issue, we concluded that `F.kl_div` computes
the right thing, in the sense that its argument convention is consistent
with the rest of the losses in PyTorch.
To avoid any confusion, these docs add a note discussing how the PyTorch
implementation differs from the mathematical definition and the reasons
for doing so.
These docs also add an example that should further clarify the intended
use of this loss.
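For reference, here is a minimal sketch of the intended usage. It is not the example added to the docs; it assumes a PyTorch version that supports the `log_target` argument. `F.kl_div(input, target)` takes the prediction `input` as log-probabilities and the `target` as probabilities by default. So with input log Q and target P, it computes KL(P || Q). The argument order is reversed with respect to the usual KL(P || Q) notation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A batch of 3 distributions over 5 classes.
# Prediction (Q): kl_div expects the input as log-probabilities.
log_q = F.log_softmax(torch.randn(3, 5), dim=1)
# Target (P): given as probabilities (log_target=False is the default).
p = F.softmax(torch.randn(3, 5), dim=1)

# Pointwise, kl_div computes target * (log(target) - input).
# reduction="batchmean" sums over classes and divides by the batch size,
# which matches the mathematical KL(P || Q) per sample; the default
# reduction="mean" instead averages over every element.
loss = F.kl_div(log_q, p, reduction="batchmean")

# The same quantity written out by hand.
manual = (p * (p.log() - log_q)).sum(dim=1).mean()

# If the target is already in log-space, pass log_target=True.
loss_log_target = F.kl_div(log_q, p.log(), reduction="batchmean", log_target=True)
```

Passing the prediction first and the target second is what keeps `kl_div` consistent with the other PyTorch losses, even though it reverses the order used in the textbook formula.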
cc brianjo mruberry
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D32136888
Pulled By: jbschlosser
fbshipit-source-id: 1ad0a606948656b44ff7d2a701d995c75875e671