Normalize reward-to-go in C++ actor-critic (#33550)
Summary:
Compared with the [Python implementation](https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py), the tensor of normalized reward-to-go is computed in the C++ example but never used. Even though it is only an integration test, this PR switches to the normalized version for better convergence.
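
For context, a minimal sketch of the step in question (names and the epsilon value are illustrative, not the exact code in the test): the discounted reward-to-go is accumulated backwards over the episode and then standardized before it scales the policy loss.

```cpp
#include <torch/torch.h>
#include <vector>

// Sketch: build the reward-to-go tensor and normalize it to zero mean /
// unit variance, as the Python actor_critic.py example does.
torch::Tensor normalized_rewards_to_go(const std::vector<double>& rewards, double gamma) {
  std::vector<double> returns(rewards.size());
  double running = 0.0;
  // Accumulate discounted reward-to-go from the end of the episode.
  for (int64_t i = static_cast<int64_t>(rewards.size()) - 1; i >= 0; --i) {
    running = rewards[i] + gamma * running;
    returns[i] = running;
  }
  auto t = torch::tensor(returns, torch::kFloat64);
  // Small epsilon (illustrative value) guards against division by zero
  // when all returns in the episode are identical.
  return (t - t.mean()) / (t.std() + 1e-7);
}
```

The change is simply to feed this normalized tensor, rather than the raw returns, into the advantage computation used by the actor-critic loss.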
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33550
Differential Revision: D20024393
Pulled By: yf225
fbshipit-source-id: ebcf0fee14ff39f65f6744278fb0cbf1fc92b919