ding.reward_model.base_reward_model
BaseRewardModel
Bases: ABC
Overview
The base class for reward models.
Interface:
default_config, estimate, train, clear_data, collect_data, load_expert_data
estimate(data)
abstractmethod
Overview
Estimate the reward.
Arguments:
- data (:obj:List): The list of data used for estimation
Returns / Effects:
- This can be a side-effect function that updates the reward values in place
- If the function returns a value, an example return object is reward (:obj:Any): the estimated reward
train(data)
abstractmethod
Overview
Train the reward model.
Arguments:
- data (:obj:Any): Data used for training
Effects:
- This is mostly a side effect function which updates the reward model
collect_data(data)
abstractmethod
Overview
Collect training data in a designated format or with designated transitions.
Arguments:
- data (:obj:Any): Raw training data (e.g. some form of states, actions, obs, etc)
Returns / Effects:
- This can be a side effect function which updates the data attribute in self
clear_data()
abstractmethod
Overview
Clearing training data. This can be a side effect function which clears the data attribute in self
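The four abstract methods above can be sketched together with a toy subclass. This is a minimal illustration only: `RewardModelInterface` is a stand-in ABC that mirrors the documented method names (it is not ding's `BaseRewardModel`), and `MeanRewardModel`, its `train_data` attribute, and the assumption that each sample is a dict with a `"reward"` key are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class RewardModelInterface(ABC):
    """Stand-in that mirrors the documented abstract methods (not ding's class)."""

    @abstractmethod
    def estimate(self, data: List[Dict[str, Any]]) -> Any:
        ...

    @abstractmethod
    def train(self, data: Any) -> None:
        ...

    @abstractmethod
    def collect_data(self, data: Any) -> None:
        ...

    @abstractmethod
    def clear_data(self) -> None:
        ...


class MeanRewardModel(RewardModelInterface):
    """Hypothetical toy model: overwrites each reward with the mean collected reward."""

    def __init__(self) -> None:
        self.train_data: List[float] = []  # assumed name for the data attribute
        self.mean_reward = 0.0

    def collect_data(self, data: List[Dict[str, Any]]) -> None:
        # Side effect: append the raw rewards to the data attribute in self.
        self.train_data.extend(item["reward"] for item in data)

    def train(self, data: Any = None) -> None:
        # Side effect: update the model (here, a single running statistic).
        if self.train_data:
            self.mean_reward = sum(self.train_data) / len(self.train_data)

    def estimate(self, data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Side effect: rewrite the reward of every sample in place, then return the list.
        for item in data:
            item["reward"] = self.mean_reward
        return data

    def clear_data(self) -> None:
        # Side effect: empty the data attribute in self.
        self.train_data.clear()
```

A typical call order under these assumptions would be `collect_data`, then `train`, then `estimate` on a fresh batch, followed by `clear_data` before the next collection round.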
load_expert_data(data)
Overview
Get the expert data; usually used in inverse RL reward models.
Arguments:
- data (:obj:Any): Expert data
Effects:
- This is mostly a side effect function which updates the expert data attribute (e.g. self.expert_data)
reward_deepcopy(train_data)
Overview
Deep-copy the reward part of train_data while keeping shallow copies of all other parts, so that the reward stored in the replay buffer is not incorrectly modified.
Arguments:
- train_data (:obj:List): The list of training data whose reward part will be deep-copied.
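The copy behavior described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the function name `reward_deepcopy_sketch` is hypothetical, and each sample is assumed to be a dict whose reward is stored under the key `"reward"`.

```python
import copy
from typing import Any, Dict, List


def reward_deepcopy_sketch(train_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Deep-copy only the 'reward' entry of each sample; share every other value."""
    return [
        {key: copy.deepcopy(value) if key == "reward" else value for key, value in sample.items()}
        for sample in train_data
    ]


# A sample as it might sit in a replay buffer (assumed layout).
buffer_sample = {"obs": [0.1, 0.2], "reward": [1.0]}
copied = reward_deepcopy_sketch([buffer_sample])

# Overwriting the copied reward leaves the buffer's reward untouched,
# while the observation object is still shared (shallow copy).
copied[0]["reward"][0] = -5.0
```

Copying only the reward keeps memory cost low for large observations while still protecting the buffer from in-place reward rewrites.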
create_reward_model(cfg, device, tb_logger)
Overview
Create a reward estimation model.
Arguments:
- cfg (:obj:Dict): Training config
- device (:obj:str): Device usage, i.e. "cpu" or "cuda"
- tb_logger (:obj:str): Logger, by default a 'SummaryWriter' used for model summaries
Returns:
- reward (:obj:Any): The reward model
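One common way to build a factory with this signature is a name-to-class registry. The sketch below is self-contained and does not reproduce the library's implementation: `REGISTRY`, `register`, `ConstantRewardModel`, `create_reward_model_sketch`, and the assumption that `cfg` carries a `"type"` key selecting the model are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry mapping a type name to a reward model class.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[type], type]:
    """Decorator that records a class in REGISTRY under the given name."""

    def decorator(cls: type) -> type:
        REGISTRY[name] = cls
        return cls

    return decorator


@register("constant")
class ConstantRewardModel:
    """Hypothetical placeholder model that only stores its constructor arguments."""

    def __init__(self, cfg: Dict[str, Any], device: str, tb_logger: Optional[Any]) -> None:
        self.cfg = cfg
        self.device = device
        self.tb_logger = tb_logger


def create_reward_model_sketch(cfg: Dict[str, Any], device: str, tb_logger: Optional[Any] = None) -> Any:
    """Look up cfg['type'] (an assumed key) and instantiate the registered class."""
    model_type = cfg["type"]
    if model_type not in REGISTRY:
        raise KeyError(f"unknown reward model type: {model_type!r}")
    return REGISTRY[model_type](cfg, device, tb_logger)


model = create_reward_model_sketch({"type": "constant"}, device="cpu")
```

With a registry, adding a new reward model only requires decorating its class; the factory itself stays unchanged.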
Full Source Code
../ding/reward_model/base_reward_model.py