ding.reward_model.drex_reward_model

DrexRewardModel
Bases: TrexRewardModel
Overview
The D-REX reward model class (https://arxiv.org/pdf/1907.03976.pdf)
Interface:
estimate, train, load_expert_data, collect_data, clear_data, __init__, _train
Config:
==  ==================  =====  =============  ==========================================  ============
ID  Symbol              Type   Default Value  Description                                 Other(Shape)
==  ==================  =====  =============  ==========================================  ============
1   type                str    drex           Reward model register name, refer to
                                              registry REWARD_MODEL_REGISTRY
3   learning_rate       float  0.00001        Learning rate for optimizer
4   update_per_collect  int    100            Number of updates per collect
5   batch_size          int    64             How many samples in a training batch
6   hidden_size         int    128            Linear model hidden size
7   num_trajs           int    0              Number of downsampled full trajectories
8   num_snippets        int    6000           Number of short subtrajectories to sample
==  ==================  =====  =============  ==========================================  ============
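For orientation, the defaults in the table can be written out as a config dict. This is a minimal sketch assuming an EasyDict-based config as used across DI-engine; any extra keys the surrounding pipeline requires are omitted::

    from easydict import EasyDict

    # Sketch: values are the defaults from the config table above.
    drex_config = EasyDict(dict(
        type='drex',             # reward model register name in REWARD_MODEL_REGISTRY
        learning_rate=0.00001,   # learning rate for optimizer
        update_per_collect=100,  # number of updates per collect
        batch_size=64,           # samples per training batch
        hidden_size=128,         # linear model hidden size
        num_trajs=0,             # number of downsampled full trajectories
        num_snippets=6000,       # number of short subtrajectories to sample
    ))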
__init__(config, device, tb_logger)
Overview
Initialize the D-REX reward model with the given config, device and TensorBoard logger.
Arguments:
- config (:obj:EasyDict): Training config
- device (:obj:str): Device to use, i.e. "cpu" or "cuda"
- tb_logger (:obj:SummaryWriter): Logger, defaults to SummaryWriter for model summary
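A hedged construction sketch follows; it reuses the defaults above, the expert_data_path key is a hypothetical placeholder, and any further keys required by the parent TrexRewardModel are omitted::

    from easydict import EasyDict
    from torch.utils.tensorboard import SummaryWriter

    from ding.reward_model.drex_reward_model import DrexRewardModel

    cfg = EasyDict(dict(
        type='drex',
        learning_rate=0.00001,
        update_per_collect=100,
        batch_size=64,
        hidden_size=128,
        num_trajs=0,
        num_snippets=6000,
        expert_data_path='./expert_data.pkl',  # hypothetical path for illustration
    ))
    tb_logger = SummaryWriter('./log/drex_reward_model')
    reward_model = DrexRewardModel(cfg, device='cpu', tb_logger=tb_logger)

    # Typical lifecycle, per the interface list above (the data format
    # passed to estimate is an assumption, not confirmed by this page):
    # reward_model.train()
    # new_data = reward_model.estimate(data)
    # reward_model.clear_data()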
load_expert_data()
Overview
Load the expert data from the path given by the config.expert_data_path attribute stored on self
Effects:
This is a side-effect function: it updates the expert data attribute (i.e. self.expert_data) via concat_state_action_pairs; see the sketch below
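For intuition, a minimal sketch of the state-action concatenation this effect refers to; the helper below is illustrative only and is not DI-engine's exact concat_state_action_pairs implementation::

    import torch

    def concat_state_action_pairs_sketch(transitions):
        # Illustrative only: flatten each observation and append its action,
        # producing one 1-D tensor per transition.
        return [
            torch.cat([t['obs'].flatten(), t['action'].float().flatten()])
            for t in transitions
        ]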
Full Source Code
../ding/reward_model/drex_reward_model.py