ding.policy.td3

TD3Policy
Bases: DDPGPolicy
Overview
Policy class of the TD3 algorithm. Since TD3 shares most of its structure with DDPG, this class is derived from DDPGPolicy by changing _actor_update_freq, _twin_critic, and the noise in the model wrapper.
Paper link: https://arxiv.org/pdf/1802.09477.pdf
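The fields in the table below are typically assembled into a nested config dict. The following is a minimal sketch, assuming DI-engine's usual nested-dict config layout (field names and default values are taken from the table; the surrounding training pipeline is not shown):

    # Minimal TD3 config sketch; the nesting mirrors the Symbol column
    # in the table below (model.*, learn.*, collect.*). Values are the
    # documented defaults.
    td3_config = dict(
        type='td3',                  # RL policy register name in POLICY_REGISTRY
        cuda=False,                  # whether to use cuda for the network
        random_collect_size=25000,   # random warm-up samples before training
        model=dict(
            twin_critic=True,        # Clipped Double Q-learning (two critics)
        ),
        learn=dict(
            learning_rate_actor=1e-3,
            learning_rate_critic=1e-3,
            actor_update_freq=2,     # delayed policy updates
            noise=True,              # target policy smoothing
            noise_range=dict(min=-0.5, max=0.5),  # aka. noise_clip
            ignore_done=False,
            target_theta=0.005,      # polyak factor for target-network updates
        ),
        collect=dict(
            noise_sigma=0.1,         # sigma of exploration noise
        ),
    )

In DI-engine, fields left unspecified typically fall back to the defaults listed in the table.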
Config:
== ========================== ======== ================== ================================== ==========================
ID Symbol                     Type     Default Value      Description                        Other (Shape)
== ========================== ======== ================== ================================== ==========================
1  | type                     str      td3                | RL policy register name, refer   | This arg is optional,
                                                          | to registry POLICY_REGISTRY.     | a placeholder.
2  | cuda                     bool     False              | Whether to use cuda for network. |
3  | random_                  int      25000              | Number of randomly collected     | Default 25000 for
   | collect_size                                         | training samples in the replay   | DDPG/TD3, 10000 for
                                                          | buffer when training starts.     | SAC.
4  | model.twin_              bool     True               | Whether to use two critic        | Default True for TD3;
   | critic                                               | networks or only one.            | Clipped Double
                                                          |                                  | Q-learning method in
                                                          |                                  | the TD3 paper.
5  | learn.learning           float    1e-3               | Learning rate for the actor      |
   | _rate_actor                                          | network (aka. policy).           |
6  | learn.learning           float    1e-3               | Learning rate for the critic     |
   | _rate_critic                                         | network (aka. Q-network).        |
7  | learn.actor_             int      2                  | Number of critic network         | Default 2 for TD3, 1
   | update_freq                                          | updates per actor network        | for DDPG; Delayed
                                                          | update.                          | Policy Updates method
                                                          |                                  | in the TD3 paper.
8  | learn.noise              bool     True               | Whether to add noise to the      | Default True for TD3,
                                                          | target network's action.         | False for DDPG; Target
                                                          |                                  | Policy Smoothing
                                                          |                                  | Regularization in the
                                                          |                                  | TD3 paper.
9  | learn.noise_             dict     | dict(min=-0.5,   | Range limit of the target        |
   | range                             | max=0.5)         | policy smoothing noise,          |
                                                          | aka. noise_clip.                 |
10 | learn.                   bool     False              | Whether to ignore the done       | Use ignore_done only
   | ignore_done                                          | flag.                            | in the halfcheetah env.
11 | learn.                   float    0.005              | Used for soft update of the      | aka. interpolation
   | target_theta                                         | target network.                  | factor in polyak
                                                          |                                  | averaging for target
                                                          |                                  | networks.
12 | collect.                 float    0.1                | Sigma of the noise added to      | Noise is sampled from
   | noise_sigma                                          | actions during collection.       | an Ornstein-Uhlenbeck
                                                          |                                  | process in the DDPG
                                                          |                                  | paper, and a Gaussian
                                                          |                                  | process here.
== ========================== ======== ================== ================================== ==========================
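To make the learn-time fields above concrete, here is a self-contained PyTorch sketch of one TD3 update step. It is an illustration, not DI-engine's actual implementation: the network, optimizer, and batch objects are hypothetical placeholders, and the smoothing-noise sigma of 0.2 is the TD3 paper's value (the table only fixes the clip range via learn.noise_range):

    import torch

    def td3_update(batch, actor, critic1, critic2,
                   target_actor, target_critic1, target_critic2,
                   actor_opt, critic_opt, step, gamma=0.99,
                   actor_update_freq=2, noise_range=(-0.5, 0.5),
                   target_theta=0.005, ignore_done=False):
        obs, action, reward, next_obs, done = batch
        with torch.no_grad():
            # Target Policy Smoothing (learn.noise / learn.noise_range):
            # perturb the target action with clipped Gaussian noise.
            noise = (torch.randn_like(action) * 0.2).clamp(*noise_range)
            next_action = (target_actor(next_obs) + noise).clamp(-1.0, 1.0)
            # Clipped Double Q-learning (model.twin_critic): take the minimum
            # of the two target critics to curb overestimation.
            target_q = torch.min(target_critic1(next_obs, next_action),
                                 target_critic2(next_obs, next_action))
            if ignore_done:  # learn.ignore_done, e.g. for halfcheetah
                done = torch.zeros_like(done)
            y = reward + gamma * (1.0 - done) * target_q

        # Both critics regress toward the same smoothed, clipped target.
        critic_loss = ((critic1(obs, action) - y) ** 2).mean() \
            + ((critic2(obs, action) - y) ** 2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Delayed Policy Updates (learn.actor_update_freq): the actor and
        # the target networks move once per actor_update_freq critic updates.
        if step % actor_update_freq == 0:
            actor_loss = -critic1(obs, actor(obs)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()
            # Soft target update via polyak averaging (learn.target_theta):
            # target <- (1 - theta) * target + theta * online.
            for net, target in ((actor, target_actor),
                                (critic1, target_critic1),
                                (critic2, target_critic2)):
                for p, tp in zip(net.parameters(), target.parameters()):
                    tp.data.mul_(1.0 - target_theta).add_(target_theta * p.data)

Taking the minimum over the two critics counters Q-value overestimation, while delaying the actor and target updates lets the critics stabilize before the policy moves.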
Full Source Code
../ding/policy/td3.py