PR #470 Proximal Policy Optimization example

copybara-service merged 53 commits into google:master from wrzadkow:rl-example-ppo

Initial PPO commit

d38a6710

Use jax.nn.one_hot instead of list comprehension for speed

c0ff3efd

Clarity: calculate only advantages in gae_advantages()

f576a76a

jit-compile training step

11bc5938

Clarity: get rid of most [:-1] indexing

5feeec71

jit & vmap Generalized Advantage Estimation

8be6677c

Add advantage normalization

670978fa

Small code cleanup

34141008

Add some asserts & debug info logging

f40b0491

Add unit tests

2bd52d82

Add more debugging info

b943afc7

Add forward pass tests

b0543a95

Explicitly mention values shape being (batch,1), not (batch, ) (no in…

6eedf84e

Add more asserts, test more frequently

04763aac

Use log_probs from the start

be01451d

Thread sync: wait for experience before starting the training

a99baac6

Reduce amount of information printed when testing

c06e8d77

Clarity: use namedtuple instead of tuple

21a3540b

Add README

c18dd9dd

google-cla added cla: yes

Enhance docstrings

d9ad5be8

andsteing assigned jheek 5 years ago

Allow more flexible game choice (don't hardcode game-specific features)

d0ff2ae7

Correctly specify the number of frames

1af5bbbd

wrzadkow force pushed to 1af5bbbd 5 years ago

jheek requested changes on 2020-09-18

andsteing commented on 2020-09-17

Add device_get() for speed as suggested by @jheek

f88e45b0

Add requirements.txt

690a9c89

Use absl.flags for better hyperparameter handling

58c4ca08

lespeholt commented on 2020-09-19

8bitmp3 commented on 2020-09-19

Style improvement (comments by @lespeholt and @8bitmp3 & beyond)

f53c1df0

Don't bin rewards during testing

2b10c332

wrzadkow force pushed 5 years ago

Update testing requirements

da0ec777

wrzadkow force pushed to da0ec777 5 years ago

Implement the decay of the clip parameter and learning rate

9c72f00e

Models: jnp.maximum->nn.relu and use dtype everywhere

f3986601

Append and then reverse instead of pushing in front in GAE estimation

19dbbc27

Unit & policy test improvements

518a7f61

wrzadkow force pushed 5 years ago

wrzadkow force pushed to 518a7f61 5 years ago

Fix conflict in setup.py

8ef44937

wrzadkow force pushed to 8ef44937 5 years ago

Add required packages to test requirements

e846aeff

Merge branch 'master' into rl-example-ppo

399e9b24

Cleanup of main.py incl. variable rename

7b02ec07

lespeholt commented on 2020-09-23

lespeholt commented on 2020-09-24

Streamline training: use one thread, divide code into smaller chunks

50b2b792

Avoid using global variables

df3daa19

wrzadkow force pushed 5 years ago

Adhere to file naming standard

7e036ae5

wrzadkow force pushed to 7e036ae5 5 years ago

Merge remote.py with agent.py due to similar function

9ff33b97

Use tensorboard for logging and add checkpointing

08bd3449

wrzadkow force pushed to 08bd3449 5 years ago

Simplify and format code

65faed8d

Save checkpoints less frequently

68b87133

wrzadkow marked this pull request as ready for review 5 years ago

Update the README

57dd0a37

Don't send values and log probs to remote process and back

d7a8fa45

Add tensorboard.dev trace

f9e37fea

Remove unneeded function get_state()

70d21f71

Small type hints & docstrings enhancement

342786bf

Use ml_collections for hyperparameter handling

a4dade8c

wrzadkow force pushed to a4dade8c 5 years ago

Refactor a long statement

315902bd

jheek commented on 2020-09-21

Test: use assertEqual and clip rewards when testing them

d2eae5c0

Compile vectorized code instead of vectorizing compiled code

d444075e

Specify static_argnums with proper int

f3a9d03e

copybara-service merged 45937af8 into master 5 years ago
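
Several commits above touch Generalized Advantage Estimation ("Clarity: calculate only advantages in gae_advantages()", "Append and then reverse instead of pushing in front in GAE estimation"). The following is a minimal pure-Python sketch of that append-then-reverse pattern, not the code merged in this PR. Only the function name `gae_advantages` comes from the commit messages; the parameter names (`rewards`, `terminal_masks`, `values`, `discount`, `gae_param`) and the list-based implementation are assumptions made for illustration.

```python
def gae_advantages(rewards, terminal_masks, values, discount, gae_param):
    """Sketch of Generalized Advantage Estimation over one trajectory.

    Assumed inputs (hypothetical names, not taken from the PR):
      rewards:        T rewards, one per step
      terminal_masks: T masks, 0.0 where the episode ended at that step, else 1.0
      values:         T + 1 value estimates (the last one is the bootstrap value)
    Returns a list of T advantages.
    """
    assert len(rewards) + 1 == len(values)
    assert len(rewards) == len(terminal_masks)

    advantages = []
    gae = 0.0
    # Walk the trajectory backwards, since each advantage depends on the next one.
    for t in reversed(range(len(rewards))):
        # One-step TD error; the mask drops the bootstrap term at episode ends.
        delta = rewards[t] + discount * values[t + 1] * terminal_masks[t] - values[t]
        gae = delta + discount * gae_param * terminal_masks[t] * gae
        # Appending is O(1); inserting at the front would be O(T) per step.
        advantages.append(gae)
    # The list was built from the last step to the first, so reverse it once.
    return advantages[::-1]
```

With unit rewards, zero value estimates, no episode ends, and `discount = gae_param = 1.0`, each advantage reduces to the sum of remaining rewards, which gives a quick sanity check of the reversal. The commit log indicates the merged example additionally jit-compiles and vmaps this computation with JAX; that part is not reproduced here.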

Reviewers

jheek

lespeholt

8bitmp3

andsteing

Assignees

jheek

Labels

cla: yes

Milestone

No milestone

flax Proximal Policy Optimization example #470 Merged