ding.reward_model.her_reward_model

HerRewardModel
Overview
Hindsight Experience Replay model.
.. note::
- her_strategy (:obj:str): Type of strategy that HER uses, should be one of ['final', 'future', 'episode'].
- her_replay_k (:obj:int): Number of new episodes generated from an original episode. (Not used in episodic HER.)
- episode_size (:obj:int): How many episodes to sample in one iteration.
- sample_per_episode (:obj:int): How many new samples are generated from each episode.
.. note::
HER requires complete episode trajectories in order to relabel goals. However, episode lengths differ and may have high variance. As a result, we recommend using only some transitions from each complete episode by specifying episode_size and sample_per_episode in the config. Therefore, in one iteration, batch_size equals episode_size * sample_per_episode.
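As a quick check of the relation above, with hypothetical config values (the exact config schema is an assumption; only the field names mirror the documented ones):

```python
# Hypothetical config values illustrating the batch-size relation.
episode_size = 8          # episodes sampled per iteration
sample_per_episode = 4    # HER samples drawn from each episode

# One training iteration therefore yields this many transitions.
batch_size = episode_size * sample_per_episode
print(batch_size)  # 32
```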
estimate(episode, merge_func=None, split_func=None, goal_reward_func=None)
Overview
Get HER processed episodes from original episodes.
Arguments:
- episode (:obj:List[Dict[str, Any]]): Episode list, each element is a transition.
- merge_func (:obj:Callable): The merge function to use; defaults to None, in which case __her_default_merge_func is used.
- split_func (:obj:Callable): The split function to use; defaults to None, in which case __her_default_split_func is used.
- goal_reward_func (:obj:Callable): The goal_reward function to use; defaults to None, in which case __her_default_goal_reward_func is used.
Returns:
- new_episode (:obj:List[Dict[str, Any]]): The processed transitions.
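To illustrate what goal relabeling produces, here is a minimal, self-contained sketch of the three strategies on toy dict transitions. It is not DI-engine's actual implementation, and the transition keys ('achieved_goal', 'desired_goal', 'reward') and the sparse 0/-1 reward are assumptions:

```python
import random

def her_relabel(episode, strategy='final', sample_per_episode=2):
    """Sketch of HER goal relabeling (illustrative only).

    Each transition is a dict with 'achieved_goal', 'desired_goal'
    and 'reward' keys (an assumed layout, not DI-engine's).
    """
    new_episode = []
    indices = random.sample(range(len(episode)), min(sample_per_episode, len(episode)))
    for i in indices:
        step = dict(episode[i])
        if strategy == 'final':
            # relabel with the goal achieved at the end of the episode
            new_goal = episode[-1]['achieved_goal']
        elif strategy == 'future':
            # relabel with a goal achieved at this or a later timestep
            j = random.randint(i, len(episode) - 1)
            new_goal = episode[j]['achieved_goal']
        else:  # 'episode'
            # relabel with a goal achieved anywhere in the episode
            j = random.randint(0, len(episode) - 1)
            new_goal = episode[j]['achieved_goal']
        step['desired_goal'] = new_goal
        # sparse reward: 0 on success, -1 otherwise (assumed convention)
        step['reward'] = 0.0 if step['achieved_goal'] == new_goal else -1.0
        new_episode.append(step)
    return new_episode
```

With strategy='final', every relabeled transition shares the episode's last achieved goal as its new desired goal.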
__her_default_merge_func(x, y)
staticmethod
Overview
Merge two observations of a HER timestep.
Arguments:
- x (:obj:Any): one of the timestep obs to merge
- y (:obj:Any): another timestep obs to merge
Returns:
- ret (:obj:Any): The merged obs.
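A common default for merging an observation with a goal is concatenation along the last dimension; this sketch assumes that convention (the real __her_default_merge_func may differ):

```python
import torch

def merge_obs_and_goal(obs, goal):
    """Sketch of a merge function: concatenate obs and goal along the
    last dimension (an assumed convention, not DI-engine's actual code)."""
    return torch.cat([obs, goal], dim=-1)
```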
__her_default_split_func(x)
staticmethod
Overview
Split the input into obs, desired goal, and achieved goal.
Arguments:
- x (:obj:Any): The input to split
Returns:
- obs (:obj:torch.Tensor): Original obs.
- desired_goal (:obj:torch.Tensor): The goal that the agent is expected to achieve.
- achieved_goal (:obj:torch.Tensor): The goal that is actually achieved.
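A minimal sketch of such a split, assuming a flat layout [obs | desired_goal | achieved_goal] along the last dimension (the actual layout used by __her_default_split_func is an assumption; the obs_dim/goal_dim parameters are hypothetical):

```python
import torch

def split_obs(x, obs_dim, goal_dim):
    """Sketch: split a concatenated observation into
    (obs, desired_goal, achieved_goal), assuming the flat layout
    [obs | desired_goal | achieved_goal] along the last dimension."""
    obs = x[..., :obs_dim]
    desired_goal = x[..., obs_dim:obs_dim + goal_dim]
    achieved_goal = x[..., obs_dim + goal_dim:]
    return obs, desired_goal, achieved_goal
```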
__her_default_goal_reward_func(achieved_goal, desired_goal)
staticmethod
Overview
Get the goal reward according to whether the achieved_goal matches the desired_goal.
Arguments:
- achieved_goal (:obj:torch.Tensor): The achieved goal.
- desired_goal (:obj:torch.Tensor): The desired goal.
Returns:
- goal_reward (:obj:torch.Tensor): The goal reward, determined by whether the achieved_goal matches the desired_goal.
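A typical sparse goal reward returns 1 when the achieved goal exactly matches the desired goal and 0 otherwise; this sketch assumes that convention rather than reproducing the actual default:

```python
import torch

def goal_reward(achieved_goal, desired_goal):
    """Sketch of a sparse goal reward: 1.0 when achieved_goal equals
    desired_goal elementwise, else 0.0 (an assumed convention)."""
    # compare per sample along the last (goal) dimension
    match = (achieved_goal == desired_goal).all(dim=-1, keepdim=True)
    return match.float()
```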
Full Source Code
../ding/reward_model/her_reward_model.py