ding.rl_utils.gae
shape_fn_gae(args, kwargs)
Overview
Return the shape of the GAE output for hpc.
Returns:
- shape: [T, B], where T is trajectory length and B is batch size
gae(data, gamma=0.99, lambda_=0.97)
Overview
Implementation of Generalized Advantage Estimator (arXiv:1506.02438)
Arguments:
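- data (:obj:namedtuple): gae input data with fields ['value', 'next_value', 'reward'] (see Shapes), holding data from one or more episodes or trajectories. The gae_data tuple in the example below takes two further positional fields, both set to None there.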
- gamma (:obj:float): the future discount factor, should be in [0, 1], defaults to 0.99.
- lambda_ (:obj:float): the gae parameter lambda, should be in [0, 1], defaults to 0.97. As lambda -> 0 the estimate has lower variance but more bias; as lambda -> 1 the bias shrinks but the variance grows, because more reward terms are summed.
Returns:
- adv (:obj:torch.FloatTensor): the calculated advantage
Shapes:
- value (:obj:torch.FloatTensor): :math:(T, B), where T is trajectory length and B is batch size
- next_value (:obj:torch.FloatTensor): :math:(T, B)
- reward (:obj:torch.FloatTensor): :math:(T, B)
- adv (:obj:torch.FloatTensor): :math:(T, B)
Examples:
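>>> import torch
>>> from ding.rl_utils.gae import gae, gae_data
>>> value = torch.randn(2, 3)  # (T, B) = (2, 3)
>>> next_value = torch.randn(2, 3)
>>> reward = torch.randn(2, 3)
>>> data = gae_data(value, next_value, reward, None, None)
>>> adv = gae(data)  # adv has shape (T, B)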
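The recursion behind the estimator can be sketched in plain Python. This is a minimal illustration of the formula from arXiv:1506.02438, not the code in gae.py: the function name gae_reference is hypothetical, inputs are nested lists of shape (T, B) rather than tensors, and it assumes every trajectory runs the full T steps, so the two trailing gae_data fields are ignored. It computes delta_t = reward_t + gamma * next_value_t - value_t and then adv_t = delta_t + gamma * lambda_ * adv_{t+1}, working backwards from the last step.

```python
def gae_reference(value, next_value, reward, gamma=0.99, lambda_=0.97):
    """Hypothetical pure-Python GAE sketch; inputs are T x B nested lists."""
    T = len(reward)
    B = len(reward[0])
    adv = [[0.0] * B for _ in range(T)]
    # running[b] holds adv_{t+1} for batch entry b; it is zero past the last step.
    running = [0.0] * B
    for t in reversed(range(T)):
        for b in range(B):
            # One-step TD residual.
            delta = reward[t][b] + gamma * next_value[t][b] - value[t][b]
            # Exponentially weighted sum of this and all later residuals.
            running[b] = delta + gamma * lambda_ * running[b]
            adv[t][b] = running[b]
    return adv


# Toy input with T = 2, B = 1 (values chosen by hand for illustration).
value = [[0.5], [0.4]]
next_value = [[0.4], [0.0]]
reward = [[1.0], [1.0]]
adv = gae_reference(value, next_value, reward, gamma=0.5, lambda_=0.5)
print(adv)
```

For this toy input the advantages come out to roughly 0.85 at the first step and 0.6 at the last. With lambda_ = 0 the loop reduces to the one-step TD residual; with lambda_ = 1 it sums all discounted residuals to the end of the trajectory.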
Full Source Code
../ding/rl_utils/gae.py