ding.config.example.DDPG.gym_lunarlandercontinuous_v2
Full Source Code
ding/config/example/DDPG/gym_lunarlandercontinuous_v2.py
from easydict import EasyDict
from functools import partial
import ding.envs.gym_env

cfg = dict(
    exp_name='LunarLanderContinuous-V2-DDPG',
    seed=0,
    env=dict(
        env_id='LunarLanderContinuous-v2',
        collector_env_num=8,
        evaluator_env_num=8,
        n_evaluator_episode=8,
        # Training stops once the evaluation return reaches this value.
        stop_value=260,
        # Scale the policy's normalized actions to the env's action range.
        act_scale=True,
    ),
    policy=dict(
        cuda=True,
        # Number of transitions collected with a random policy before training; 0 disables warm-up.
        random_collect_size=0,
        model=dict(
            obs_shape=8,
            action_shape=2,
            # Use two critic networks (clipped double-Q, as in TD3).
            twin_critic=True,
            action_space='regression',
        ),
        learn=dict(
            update_per_collect=2,
            batch_size=128,
            learning_rate_actor=0.001,
            learning_rate_critic=0.001,
            ignore_done=False,  # TODO(pu)
            # (int) How many critic updates are performed for each actor update.
            # Delayed Policy Updates in the original TD3 paper (https://arxiv.org/pdf/1802.09477.pdf).
            # Default 1 for DDPG, 2 for TD3.
            actor_update_freq=1,
            # (bool) Whether to add noise to the target network's action.
            # Target Policy Smoothing Regularization in the original TD3 paper (https://arxiv.org/pdf/1802.09477.pdf).
            # Default True for TD3, False for DDPG.
            noise=False,
            noise_sigma=0.1,
            noise_range=dict(
                min=-0.5,
                max=0.5,
            ),
        ),
        collect=dict(
            # Number of transitions collected per training iteration.
            n_sample=48,
            # Standard deviation of the Gaussian exploration noise added during collection.
            noise_sigma=0.1,
            collector=dict(collect_print_freq=1000, ),
        ),
        eval=dict(evaluator=dict(eval_freq=100, ), ),
        other=dict(replay_buffer=dict(replay_buffer_size=20000, ), ),
    ),
    wandb_logger=dict(
        gradient_logger=True, video_logger=True, plot_logger=True, action_logger=True, return_logger=False
    ),
)

cfg = EasyDict(cfg)

# Env factory: DI-engine's generic gym wrapper, fixed to continuous actions.
env = partial(ding.envs.gym_env.env, continuous=True)
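For context, the sketch below shows one way a built-in example config like this can be consumed. It assumes DI-engine's high-level ding.bonus interface; the DDPGAgent class, its constructor arguments, and the train method's step argument are assumptions about that interface, not anything defined in this file.

from ding.bonus import DDPGAgent

# A minimal sketch, assuming the ding.bonus high-level API: the agent is
# expected to resolve the built-in DDPG example config for this env_id
# (such as the file above) and run the usual collect/train/evaluate loop.
agent = DDPGAgent(env_id='LunarLanderContinuous-v2', exp_name='LunarLanderContinuous-V2-DDPG')
agent.train(step=int(1e6))  # assumed total environment-step budget

With stop_value=260 in the config, training would end early once the average evaluation return reaches that threshold, even if the step budget is not exhausted.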