Proximal Policy Optimization example #470
Initial PPO commit
d38a6710
Use jax.nn.one_hot instead of list comprehension for speed
c0ff3efd
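The commit above replaces a Python list comprehension with `jax.nn.one_hot`. Below is a minimal sketch of the idea. It assumes the one-hot vectors are used to select the log-probability of each taken action. The variable names are hypothetical and are not taken from the PR.

```python
import jax
import jax.numpy as jnp

# Hypothetical batch: log-probabilities over 3 actions for 4 states,
# and the action actually taken in each state.
log_probs = jnp.log(jnp.array([[0.2, 0.5, 0.3],
                               [0.1, 0.1, 0.8],
                               [0.6, 0.3, 0.1],
                               [0.3, 0.3, 0.4]]))
actions = jnp.array([1, 2, 0, 2])

# Slow: one Python-level indexing step per batch element.
taken_slow = jnp.array([log_probs[i, a] for i, a in enumerate(actions)])

# Fast: a single vectorized one-hot mask that traces cleanly under jax.jit.
mask = jax.nn.one_hot(actions, log_probs.shape[1])
taken_fast = jnp.sum(mask * log_probs, axis=1)
```

The mask-and-sum form has one caveat. If any log-probability is `-inf`, then `0 * -inf` produces NaN. `jnp.take_along_axis` sidesteps that problem.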
Clarity: calculate only advantages in gae_advantages()
f576a76a
jit-compile training step
11bc5938
Clarity: get rid of most [:-1] indexing
5feeec71
jit & vmap Generalized Advantage Estimation
8be6677c
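Here is a sketch of how Generalized Advantage Estimation can be jit-compiled and vmapped over parallel actors. It also reflects two later commits in this list: appending and then reversing the list instead of pushing to the front, and compiling the vectorized function with `jax.jit(jax.vmap(...))`. The argument layout is an assumption and is not copied from the PR. In particular, `values` is assumed to carry one extra bootstrap entry, and `terminal_masks[t]` is assumed to be 0.0 when an episode ended at step `t`.

```python
import jax
import jax.numpy as jnp

def gae_advantages(rewards, terminal_masks, values, discount, gae_param):
  """GAE for a single actor.

  rewards, terminal_masks: shape (T,). values: shape (T + 1,), where the last
  entry is the bootstrap value of the state following the final step.
  """
  num_steps = rewards.shape[0]  # static under jit, so the loop unrolls
  advantages = []
  gae = 0.0
  for t in reversed(range(num_steps)):
    # One-step TD error; the mask stops bootstrapping across episode ends.
    delta = rewards[t] + discount * values[t + 1] * terminal_masks[t] - values[t]
    gae = delta + discount * gae_param * terminal_masks[t] * gae
    advantages.append(gae)
  # Appending and reversing once avoids repeated inserts at the front.
  return jnp.stack(advantages[::-1])

# Vectorize over actors (axis 0); discount and gae_param are shared.
batched_gae = jax.jit(jax.vmap(gae_advantages, in_axes=(0, 0, 0, None, None)))

# Two actors, three steps, constant reward, zero value estimates.
adv = batched_gae(jnp.ones((2, 3)), jnp.ones((2, 3)), jnp.zeros((2, 4)), 0.99, 0.95)
```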
Add advantage normalization
670978fa
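Advantage normalization usually means standardizing the advantages over the training batch. The sketch below shows that common form. The small `eps` term guards against division by zero, and its value is an assumption.

```python
import jax.numpy as jnp

def normalize_advantages(advantages, eps=1e-8):
  """Standardize advantages to zero mean and unit standard deviation."""
  return (advantages - advantages.mean()) / (advantages.std() + eps)

norm = normalize_advantages(jnp.array([1.0, 2.0, 3.0, 4.0]))
```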
Small code cleanup
34141008
Add some asserts & debug info logging
f40b0491
Add unit tests
2bd52d82
Add more debugging info
b943afc7
Add forward pass tests
b0543a95
Explicitly mention values shape being (batch,1), not (batch, ) (no in…
6eedf84e
Add more asserts, test more frequently
04763aac
Use log_probs from the start
be01451d
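Working with log-probabilities throughout fits naturally with PPO's probability ratio, which can be computed as `exp(new_log_prob - old_log_prob)`. The sketch below shows the textbook clipped surrogate objective. It is not necessarily the exact loss code in this PR.

```python
import jax.numpy as jnp

def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_param):
  """Clipped PPO surrogate objective (to be maximized), averaged over the batch."""
  # Probability ratio pi_new / pi_old, computed in log space.
  ratios = jnp.exp(new_log_probs - old_log_probs)
  unclipped = ratios * advantages
  clipped = jnp.clip(ratios, 1.0 - clip_param, 1.0 + clip_param) * advantages
  # Pessimistic bound: elementwise minimum of the two surrogates.
  return jnp.mean(jnp.minimum(unclipped, clipped))
```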
Thread sync: wait for experience before starting the training
a99baac6
Reduce amount of information printed when testing
c06e8d77
Clarity: use namedtuple instead of tuple
21a3540b
Add README
c18dd9dd
Enhance docstrings
d9ad5be8
Allow more flexible game choice (don't hardcode game-specific features)
d0ff2ae7
Correctly specify the number of frames
1af5bbbd
wrzadkow force-pushed to 1af5bbbd
jheek requested changes on 2020-09-18
Add device_get() for speed as suggested by @jheek
f88e45b0
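`jax.device_get` accepts a whole pytree and copies its buffers to host memory as NumPy arrays. This can be faster than converting each device array one at a time. The arrays below are hypothetical stand-ins for per-step outputs.

```python
import jax
import jax.numpy as jnp
import numpy as np

# Hypothetical per-step outputs living on the accelerator.
values = jnp.arange(4.0)
log_probs = jnp.log(jnp.full((4,), 0.25))

# One call fetches the whole tuple to the host.
host_values, host_log_probs = jax.device_get((values, log_probs))
```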
Add requirements.txt
690a9c89
Use absl.flags for better hyperparameter handling
58c4ca08
Style improvement (comments by @lespeholt and @8bitmp3 & beyond)
f53c1df0
Don't bin rewards during testing
2b10c332
Update testing requirements
da0ec777
wrzadkow force-pushed to da0ec777
Implement the decay of the clip parameter and learning rate
9c72f00e
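A common way to decay both the clip parameter and the learning rate is to anneal them linearly to zero over training. The sketch below assumes a linear schedule. The helper name and the initial values (2.5e-4 and 0.1) are illustrative assumptions and are not taken from the PR's config.

```python
def linear_decay(initial_value, step, total_steps):
  """Linearly anneal from initial_value at step 0 to 0 at total_steps."""
  frac = 1.0 - step / total_steps
  return initial_value * max(frac, 0.0)

total_steps = 100
lr_mid = linear_decay(2.5e-4, 50, total_steps)   # learning rate halfway through
clip_mid = linear_decay(0.1, 50, total_steps)    # clip parameter halfway through
```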
Models: jnp.maximum->nn.relu and use dtype everywhere
f3986601
Append and then reverse instead of pushing in front in GAE estimation
19dbbc27
Unit & policy test improvements
518a7f61
wrzadkow force-pushed to 518a7f61
Fix conflict in setup.py
8ef44937
wrzadkow force-pushed to 8ef44937
Add required packages to test requirements
e846aeff
Merge branch 'master' into rl-example-ppo
399e9b24
Cleanup of main.py incl. variable rename
7b02ec07
Streamline training: use one thread, divide code into smaller chunks
50b2b792
Avoid using global variables
df3daa19
Adhere to file naming standard
7e036ae5
wrzadkow force-pushed to 7e036ae5
Merge remote.py with agent.py due to similar function
9ff33b97
Use tensorboard for logging and add checkpointing
08bd3449
wrzadkow force-pushed to 08bd3449
Simplify and format code
65faed8d
Save checkpoints less frequently
68b87133
wrzadkow marked this pull request as ready for review
Update the README
57dd0a37
Don't send values and log probs to remote process and back
d7a8fa45
Add tensorboard.dev trace
f9e37fea
Remove unneeded function get_state()
70d21f71
Small type hints & docstrings enhancement
342786bf
Use ml_collections for hyperparameter handling
a4dade8c
wrzadkow force-pushed to a4dade8c
Refactor a long statement
315902bd
jheek commented on 2020-09-21
Test: use assertEqual and clip rewards when testing them
d2eae5c0
Compile vectorized code instead of vectorizing compiled code
d444075e
Specify static_argnums with proper int
f3a9d03e
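`static_argnums` takes the position of a static argument as an int, or a collection of ints. A static argument must be hashable and known at trace time, for example a step count that drives a Python loop. This page does not record exactly what was wrong before the commit above. The function below is a hypothetical illustration.

```python
import functools
import jax
import jax.numpy as jnp

# Argument 1 (num_steps) controls Python-level iteration, so it is static.
@functools.partial(jax.jit, static_argnums=1)
def discounted_sum(rewards, num_steps, discount):
  total = 0.0
  for t in range(num_steps):
    total = total + discount ** t * rewards[t]
  return total

result = discounted_sum(jnp.array([1.0, 1.0, 1.0]), 3, 0.5)
```

Each new value of `num_steps` triggers a retrace and recompile. Static arguments therefore suit values that rarely change, such as a fixed rollout length.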