ding.rl_utils.adder
Adder
Bases: object
Overview
Adder is a component that handles various transformations and calculations on transitions in the Collector module (data generation and processing), such as GAE, n-step return and transition sampling.
Interface: init, get_gae, get_gae_with_default_last_value, get_nstep_return_data, get_train_sample
classmethod get_gae(data, last_value, gamma, gae_lambda, cuda)
Overview
Get the GAE advantage for stacked transitions (T timesteps, 1 batch). Calls gae to perform the calculation.
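For reference, the standard GAE estimator can be written as below, with transitions indexed t = 0, ..., T-1, V(s_T) standing for last_value and lambda standing for gae_lambda. This form applies no done masking; treating it as exactly what gae computes here is an assumption.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma V(s_{t+1}) - V(s_t), \qquad t = 0, \dots, T-1 \\
\hat{A}_t &= \sum_{l=0}^{T-1-t} (\gamma \lambda)^l \, \delta_{t+l}
           = \delta_t + \gamma \lambda \, \hat{A}_{t+1}, \qquad \hat{A}_T = 0
\end{aligned}
```

The recursive form on the right is what makes a single backward pass over the trajectory sufficient.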
Arguments:
- data (:obj:list): Transitions list; each element is a transition dict containing at least the keys ['value', 'reward'].
- last_value (:obj:torch.Tensor): The value following the last transition (i.e. the value at timestep T+1), used to bootstrap the final step.
- gamma (:obj:float): The future discount factor; should be in [0, 1], defaults to 0.99.
- gae_lambda (:obj:float): GAE lambda parameter; should be in [0, 1], defaults to 0.97. As lambda -> 0 the estimate becomes more biased; as lambda -> 1 it has higher variance because it sums more terms.
- cuda (:obj:bool): Whether to use CUDA for the GAE computation.
Returns:
- data (:obj:list): The input transitions list, with an extra advantage key 'adv' added to each element.
Examples:
>>> import torch
>>> from ding.rl_utils.adder import Adder
>>> B, T = 2, 3  # batch_size, timestep
>>> data = [dict(value=torch.randn(B), reward=torch.randn(B)) for _ in range(T)]
>>> last_value = torch.randn(B)
>>> gamma = 0.99
>>> gae_lambda = 0.95
>>> cuda = False
>>> data = Adder.get_gae(data, last_value, gamma, gae_lambda, cuda)
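The backward recursion behind GAE can be sketched in plain Python for a single trajectory (batch size 1), using floats instead of tensors. gae_reference is a hypothetical helper, not part of DI-engine; it ignores done masking and CUDA and only illustrates how last_value bootstraps the final step and how the result is attached under 'adv'.

```python
from typing import List


def gae_reference(values: List[float], rewards: List[float], last_value: float,
                  gamma: float = 0.99, gae_lambda: float = 0.95) -> List[float]:
    """Plain-Python GAE for one trajectory (hypothetical helper, no done masking)."""
    assert len(values) == len(rewards)
    # Next-state values: shift values left by one and append the bootstrap value.
    next_values = values[1:] + [last_value]
    advantages = [0.0] * len(values)
    running = 0.0
    # Walk backwards: A_t = delta_t + gamma * lambda * A_{t+1}, with A_T = 0.
    for t in reversed(range(len(values))):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
    return advantages


# Attach the advantages under the 'adv' key, mirroring the documented return format.
data = [dict(value=0.5, reward=1.0), dict(value=0.2, reward=0.0), dict(value=0.1, reward=1.0)]
adv = gae_reference([d['value'] for d in data], [d['reward'] for d in data], last_value=0.0)
for d, a in zip(data, adv):
    d['adv'] = a

# With gamma = lambda = 1 and zero values, each advantage is the sum of future rewards.
print(gae_reference([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.0, gamma=1.0, gae_lambda=1.0))  # [3.0, 2.0, 1.0]
```

Setting gae_lambda to 0 reduces each advantage to the one-step TD error delta_t, which is the low-variance, higher-bias end of the trade-off described above.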
classmethod get_gae_with_default_last_value(data, done, gamma, gae_lambda, cuda)
Overview
Like get_gae above, this computes the GAE advantage for stacked transitions, but it is designed for cases
where last_value is not passed. If the episode is not done yet, it takes the value of the last element in data
as last_value, discards that last element (i.e. len(data) decreases by 1), and then calls
get_gae. Otherwise, it sets last_value to 0.
Arguments:
- data (:obj:deque): Transitions list; each element is a transition dict containing at least the keys ['value', 'reward'].
- done (:obj:bool): Whether the transition reaches the end of an episode (i.e. whether the env is done).
- gamma (:obj:float): The future discount factor; should be in [0, 1], defaults to 0.99.
- gae_lambda (:obj:float): GAE lambda parameter; should be in [0, 1], defaults to 0.97. As lambda -> 0 the estimate becomes more biased; as lambda -> 1 it has higher variance because it sums more terms.
- cuda (:obj:bool): Whether to use CUDA for the GAE computation.
Returns:
- data (:obj:List[Dict[str, Any]]): The input transitions list, with an extra advantage key 'adv' added to each element.
Examples:
>>> import torch
>>> from ding.rl_utils.adder import Adder
>>> B, T = 2, 3  # batch_size, timestep
>>> data = [dict(value=torch.randn(B), reward=torch.randn(B)) for _ in range(T)]
>>> done = False
>>> gamma = 0.99
>>> gae_lambda = 0.95
>>> cuda = False
>>> data = Adder.get_gae_with_default_last_value(data, done, gamma, gae_lambda, cuda)
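The choice of bootstrap value described in the overview can be sketched in plain Python. default_last_value_split is a hypothetical helper, not a DI-engine function, and it uses float values, so the done case returns 0.0 rather than a zero tensor.

```python
from typing import Any, Dict, List, Tuple


def default_last_value_split(data: List[Dict[str, Any]], done: bool) -> Tuple[List[Dict[str, Any]], float]:
    """Pick last_value as described for get_gae_with_default_last_value (sketch).

    Mutates `data` in place: the last element is popped when the episode is not done.
    """
    if done:
        # Episode ended: there is no future state to bootstrap from, so use 0.
        last_value = 0.0
    else:
        # Episode cut mid-way: reuse the final transition's value as the bootstrap
        # value and drop that transition, so len(data) shrinks by 1.
        last_value = data.pop()['value']
    return data, last_value


traj = [dict(value=0.5, reward=1.0), dict(value=0.2, reward=0.0), dict(value=0.7, reward=1.0)]
rest, last_value = default_last_value_split(traj, done=False)
print(len(rest), last_value)  # 2 0.7
```

The remaining transitions and last_value can then be passed to a GAE routine such as get_gae.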
classmethod get_nstep_return_data(data, nstep, cum_reward=False, correct_terminate_gamma=True, gamma=0.99)
Overview
Process raw trajectory data by updating the keys ['next_obs', 'reward', 'done'] in each dict element of data with their n-step values.
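For reference, the updated keys are typically consumed by a standard n-step TD target of the form below, where s_{t+n} corresponds to the updated next_obs and d is a done flag for the end of the n-step window. Whether a given policy computes exactly this target (for example, how it discounts near episode ends) is an assumption; this method itself only rewrites the transition keys.

```latex
G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k} \, r_{t+k} + \gamma^{n} \, (1 - d) \, V(s_{t+n})
```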
Arguments:
- data (:obj:deque): Transitions list, each element is a transition dict
- nstep (:obj:int): Number of steps. If it equals 1, data is returned directly; otherwise each element is updated with its n-step values.
Returns:
- data (:obj:deque): Transitions list like the input one, but with each element updated with its n-step values.
Examples:
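>>> import torch
>>> from ding.rl_utils.adder import Adder
>>> B, T = 2, 3  # observation dim, timestep
>>> data = [dict(
...     obs=torch.randn(B),
...     reward=torch.randn(1),
...     next_obs=torch.randn(B),
...     done=False) for _ in range(T)]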
>>> nstep = 2
>>> data = Adder.get_nstep_return_data(data, nstep)
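A simplified plain-Python sketch of an n-step update of 'next_obs', 'reward' and 'done' is shown below. nstep_update is a hypothetical helper, and its conventions are assumptions: rewards are kept as a list of n per-step rewards (zero-padded near the end of the trajectory), next_obs is the observation n steps ahead, and done comes from the last transition in the window. The library's exact behaviour, including cum_reward, correct_terminate_gamma and gamma, may differ.

```python
from typing import Any, Dict, List


def nstep_update(data: List[Dict[str, Any]], nstep: int) -> List[Dict[str, Any]]:
    """Simplified n-step update of 'next_obs', 'reward' and 'done' (hypothetical sketch)."""
    if nstep == 1:
        return data  # nothing to do for 1-step returns
    T = len(data)
    # Snapshot the original fields so in-place updates never read modified values.
    rewards = [d['reward'] for d in data]
    dones = [d['done'] for d in data]
    obs = [d['obs'] for d in data]
    last_next_obs = data[-1]['next_obs']
    for i in range(T):
        end = min(i + nstep, T)  # exclusive end of the n-step window
        # Up to n rewards, zero-padded when the window is cut short by the trajectory end.
        data[i]['reward'] = rewards[i:end] + [0.0] * (nstep - (end - i))
        # Jump n steps ahead, or to the trajectory's final next_obs near the end.
        data[i]['next_obs'] = obs[i + nstep] if i + nstep < T else last_next_obs
        # Done flag of the last transition inside the window.
        data[i]['done'] = dones[end - 1]
    return data


traj = [dict(obs=t, next_obs=t + 1, reward=float(t), done=(t == 3)) for t in range(4)]
out = nstep_update(traj, nstep=2)
print(out[0]['reward'], out[3]['reward'])  # [0.0, 1.0] [3.0, 0.0]
```

Keeping the per-step rewards as a list (rather than summing them) leaves the discounting to whatever computes the target, which is one reason a separate gamma option may exist.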
classmethod get_train_sample(data, unroll_len, last_fn_type='last', null_transition=None)
Overview
Process raw trajectory data into training samples.
If unroll_len equals 1, no processing is needed and data is returned directly.
Otherwise, data is split into chunks of length unroll_len, the residual part is processed according to
last_fn_type, and lists_to_dicts is called to form the sampled training data.
Arguments:
- data (:obj:List[Dict[str, Any]]): Transitions list, each element is a transition dict
- unroll_len (:obj:int): The training unroll length used by the learner
- last_fn_type (:obj:str): How to handle the residual data at the end of a trajectory after splitting; should be one of ['last', 'drop', 'null_padding']
- null_transition (:obj:Optional[dict]): A null transition dict, used for 'null_padding'
Returns:
- data (:obj:List[Dict[str, Any]]): Transitions list processed after unrolling
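The split-and-pad logic can be sketched in plain Python. train_sample_sketch and lists_to_dicts_flat are hypothetical helpers (the latter is a non-recursive stand-in for lists_to_dicts), and the residual handling is an assumption: 'drop' discards the tail, 'last' borrows the final transitions of the previous chunk to fill the tail and falls back to null padding when there is no previous chunk, and 'null_padding' appends copies of null_transition, which this sketch requires to be given.

```python
import copy
from typing import Any, Dict, List, Optional


def lists_to_dicts_flat(chunk: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Collate a list of transition dicts into one dict of lists (non-recursive)."""
    return {k: [d[k] for d in chunk] for k in chunk[0]}


def train_sample_sketch(data: List[Dict[str, Any]], unroll_len: int, last_fn_type: str = 'last',
                        null_transition: Optional[dict] = None) -> List[Any]:
    if unroll_len == 1:
        return data  # no unrolling needed
    n_full = len(data) // unroll_len
    chunks = [data[i * unroll_len:(i + 1) * unroll_len] for i in range(n_full)]
    residual = data[n_full * unroll_len:]
    if residual:
        miss = unroll_len - len(residual)
        if last_fn_type == 'drop':
            pass  # discard the incomplete tail
        elif last_fn_type == 'last' and chunks:
            # Borrow the last `miss` transitions of the previous chunk to complete the tail.
            chunks.append(copy.deepcopy(chunks[-1][-miss:]) + residual)
        elif last_fn_type in ('last', 'null_padding'):
            if null_transition is None:
                raise ValueError("this sketch needs null_transition for padding")
            # Pad the tail with independent copies of the null transition.
            chunks.append(residual + [copy.deepcopy(null_transition) for _ in range(miss)])
        else:
            raise ValueError(f"unknown last_fn_type: {last_fn_type}")
    return [lists_to_dicts_flat(c) for c in chunks]


traj = [dict(obs=t, reward=float(t)) for t in range(5)]
samples = train_sample_sketch(traj, unroll_len=2, last_fn_type='last')
print(len(samples), samples[-1]['obs'])  # 3 [3, 4]
```

Borrowing earlier transitions ('last') keeps every sample made of real data at the cost of some duplication, while 'null_padding' avoids duplication but requires the learner to mask out the padded steps.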
Full Source Code
../ding/rl_utils/adder.py