A problem in PPOAgent #6

Hi, Rokas:

First of all, thanks for your great tutorial on reinforcement learning. I went through the whole series and learned a lot.

In `PPOAgent`, I think there may be a bug in [this line](https://github.com/pythonlessons/Reinforcement_Learning/blob/b5eedc73b946614c7a21634de9734dba961b6c91/11_Pong-v0_PPO/Pong-v0_PPO_TF2.py#L125). When `discounted_r` is `vstack`ed (giving shape `(n, 1)`) and the predicted values (shape `(n,)`) are then subtracted from it, NumPy broadcasting turns the advantages into an `(n, n)` array. So I think we should not `vstack` `discounted_r` there, but instead `vstack` the difference [in this line](https://github.com/pythonlessons/Reinforcement_Learning/blob/b5eedc73b946614c7a21634de9734dba961b6c91/11_Pong-v0_PPO/Pong-v0_PPO_TF2.py#L130): `advantages = np.vstack(discounted_r - values)`. The advantages then have shape `(n, 1)`, which is the expected result.
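To make the shape problem concrete, here is a minimal NumPy sketch. It assumes, as described above, that `discounted_r` and `values` are both 1-D arrays of length n before any stacking. The function names `advantages_buggy` / `advantages_fixed` and the toy numbers are hypothetical and not taken from the repository.

```python
import numpy as np


def advantages_buggy(discounted_r, values):
    """Mirrors the reported problem: stack the returns first, then subtract."""
    # np.vstack turns a 1-D array of shape (n,) into a column of shape (n, 1)
    stacked_r = np.vstack(discounted_r)
    # (n, 1) - (n,) broadcasts to (n, n): every return minus every value
    return stacked_r - values


def advantages_fixed(discounted_r, values):
    """Proposed fix: subtract element-wise first, then stack into a column."""
    # (n,) - (n,) stays (n,); np.vstack then yields shape (n, 1)
    return np.vstack(discounted_r - values)


if __name__ == "__main__":
    # Hypothetical toy data standing in for the returns and critic predictions
    discounted_r = np.array([1.0, 0.5, 0.25, 0.125])
    values = np.array([0.9, 0.4, 0.3, 0.1])

    print(advantages_buggy(discounted_r, values).shape)  # (4, 4)
    print(advantages_fixed(discounted_r, values).shape)  # (4, 1)
```

Only the diagonal of the `(n, n)` result holds the intended per-step advantages `discounted_r[i] - values[i]`; the off-diagonal entries pair returns and value estimates from different time steps, so anything computed from that array would mix them.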

Thanks.
