flax
Proximal Policy Optimization example
#470
Merged

wrzadkow
wrzadkow Initial PPO commit
d38a6710
wrzadkow Use jax.nn.one_hot instead of list comprehension for speed
c0ff3efd
wrzadkow Clarity: calculate only advantages in gae_advantages()
f576a76a
wrzadkow jit-compile training step
11bc5938
wrzadkow Clarity: get rid of most [:-1] indexing
5feeec71
wrzadkow jit & vmap Generalized Advantage Estimation
8be6677c
wrzadkow Add advantage normalization
670978fa
wrzadkow Small code cleanup
34141008
wrzadkow Add some asserts & debug info logging
f40b0491
wrzadkow Add unit tests
2bd52d82
wrzadkow Add more debugging info
b943afc7
wrzadkow Add forward pass tests
b0543a95
wrzadkow Explicitly mention values shape being (batch,1), not (batch, ) (no in…
6eedf84e
wrzadkow Add more asserts, test more frequently
04763aac
wrzadkow Use log_probs from the start
be01451d
wrzadkow Thread sync: wait for experience before starting the training
a99baac6
wrzadkow Reduce amount of information printed when testing
c06e8d77
wrzadkow Clarity: use namedtuple instead of tuple
21a3540b
wrzadkow Add README
c18dd9dd
google-cla added cla: yes
wrzadkow Enhance docstrings
d9ad5be8
andsteing assigned jheek 5 years ago
wrzadkow Allow more flexible game choice (don't hardcode game-specific features)
d0ff2ae7
wrzadkow Correctly specify the number of frames
1af5bbbd
wrzadkow force pushed to 1af5bbbd 5 years ago
jheek requested changes on 2020-09-18
andsteing commented on 2020-09-17
wrzadkow Add device_get() for speed as suggested by @jheek
f88e45b0
wrzadkow Add requirements.txt
690a9c89
wrzadkow Use absl.flags for better hyperparameter handling
58c4ca08
lespeholt commented on 2020-09-19
8bitmp3 commented on 2020-09-19
wrzadkow Style improvement (comments by @lespeholt and @8bitmp3 & beyond)
f53c1df0
wrzadkow Don't bin rewards during testing
2b10c332
wrzadkow force pushed 5 years ago
wrzadkow force pushed 5 years ago
wrzadkow Update testing requirements
da0ec777
wrzadkow force pushed to da0ec777 5 years ago
wrzadkow Implement the decay of the clip parameter and learning rate
9c72f00e
wrzadkow Models: jnp.maximum->nn.relu and use dtype everywhere
f3986601
wrzadkow Append and then reverse instead of pushing in front in GAE estimation
19dbbc27
wrzadkow Unit & policy test improvements
518a7f61
wrzadkow force pushed 5 years ago
wrzadkow force pushed to 518a7f61 5 years ago
wrzadkow Fix conflict in setup.py
8ef44937
wrzadkow force pushed to 8ef44937 5 years ago
wrzadkow Add required packages to test requirements
e846aeff
wrzadkow Merge branch 'master' into rl-example-ppo
399e9b24
codecov-commenter
wrzadkow Cleanup of main.py incl. variable rename
7b02ec07
lespeholt commented on 2020-09-23
lespeholt commented on 2020-09-24
wrzadkow Streamline training: use one thread, divide code into smaller chunks
50b2b792
wrzadkow Avoid using global variables
df3daa19
wrzadkow force pushed 5 years ago
wrzadkow force pushed 5 years ago
wrzadkow force pushed 5 years ago
wrzadkow force pushed 5 years ago
wrzadkow Adhere to file naming standard
7e036ae5
wrzadkow force pushed to 7e036ae5 5 years ago
wrzadkow Merge remote.py with agent.py due to similar function
9ff33b97
wrzadkow Use tensorboard for logging and add checkpointing
08bd3449
wrzadkow force pushed to 08bd3449 5 years ago
wrzadkow Simplify and format code
65faed8d
wrzadkow Save checkpoints less frequently
68b87133
wrzadkow marked this pull request as ready for review 5 years ago
wrzadkow Update the README
57dd0a37
wrzadkow Don't send values and log probs to remote process and back
d7a8fa45
wrzadkow Add tensorboard.dev trace
f9e37fea
wrzadkow Remove unneeded function get_state()
70d21f71
wrzadkow Small type hints & docstrings enhancement
342786bf
wrzadkow Use ml_collections for hyperparameter handling
a4dade8c
wrzadkow force pushed to a4dade8c 5 years ago
wrzadkow Refactor a long statement
315902bd
jheek commented on 2020-09-21
wrzadkow Test: use assertEqual and clip rewards when testing them
d2eae5c0
wrzadkow Compile vectorized code instead of vectorizing compiled code
d444075e
wrzadkow Specify static_argnums with proper int
f3a9d03e
copybara-service merged 45937af8 into master 5 years ago
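
Several commits above touch Generalized Advantage Estimation: appending and then reversing instead of pushing in front, normalizing advantages, and compiling vectorized code instead of vectorizing compiled code. The sketch below illustrates those three ideas together. It is not the code merged in this PR: the name `gae_advantages` comes from a commit message, but the signature, array shapes, the mask convention, and the helper name `batched_gae` are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp
import numpy as np


def gae_advantages(rewards, terminal_masks, values, discount, gae_param):
    """GAE for a single actor's trajectory (assumed layout).

    rewards, terminal_masks: shape (T,); values: shape (T + 1,).
    terminal_masks[t] is 0.0 if the episode ended at step t, else 1.0.
    """
    advantages = []
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error; the mask drops the bootstrap value at episode ends.
        delta = rewards[t] + discount * values[t + 1] * terminal_masks[t] - values[t]
        gae = delta + discount * gae_param * terminal_masks[t] * gae
        advantages.append(gae)  # cheap append while walking backwards ...
    return jnp.array(advantages[::-1])  # ... then reverse once at the end


# Vectorize over the leading (actor) axis first, then compile the result,
# i.e. jit(vmap(f)) rather than vmap(jit(f)).
batched_gae = jax.jit(jax.vmap(gae_advantages, in_axes=(0, 0, 0, None, None)))

rewards = jnp.ones((2, 3))   # 2 actors, 3 steps each
masks = jnp.ones((2, 3))     # no episode terminates
values = jnp.zeros((2, 4))   # one extra value for bootstrapping
adv = batched_gae(rewards, masks, values, 0.5, 1.0)

# Advantage normalization; the small epsilon guards against a zero std.
normalized_adv = (adv - adv.mean()) / (adv.std() + 1e-8)
```

With zero values, unit rewards, `discount=0.5` and `gae_param=1.0`, each row of `adv` is `[1.75, 1.5, 1.0]`. The Python loop is unrolled at trace time, so this form suits short, fixed-length trajectories; for long horizons `jax.lax.scan` may be a better fit.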
