ding.rl_utils.ppg
ppg_joint_error(data, clip_ratio=0.2, use_value_clip=True)
Overview
Compute the PPG joint loss.
Arguments:
- data (:obj:namedtuple): PPG input data with the fields shown in ppg_data
- clip_ratio (:obj:float): clip value for the ratio
- use_value_clip (:obj:bool): whether to use value clipping
Returns:
- ppg_joint_loss (:obj:namedtuple): the PPG loss items, all of which are differentiable 0-dim tensors
Shapes:
- logit_new (:obj:torch.FloatTensor): :math:(B, N), where B is batch size and N is action dim
- logit_old (:obj:torch.FloatTensor): :math:(B, N)
- action (:obj:torch.LongTensor): :math:(B,)
- value_new (:obj:torch.FloatTensor): :math:(B, 1)
- value_old (:obj:torch.FloatTensor): :math:(B, 1)
- return_ (:obj:torch.FloatTensor): :math:(B, 1)
- weight (:obj:torch.FloatTensor): :math:(B,)
- auxiliary_loss (:obj:torch.FloatTensor): :math:(), 0-dim tensor
- behavioral_cloning_loss (:obj:torch.FloatTensor): :math:()
Examples:
>>> import torch
>>> from ding.rl_utils.ppg import ppg_data, ppg_joint_error
>>> action_dim = 4
>>> data = ppg_data(
...     logit_new=torch.randn(3, action_dim),
...     logit_old=torch.randn(3, action_dim),
...     action=torch.randint(0, action_dim, (3,)),
...     value_new=torch.randn(3, 1),
...     value_old=torch.randn(3, 1),
...     return_=torch.randn(3, 1),
...     weight=torch.ones(3),
... )
>>> loss = ppg_joint_error(data, clip_ratio=0.2, use_value_clip=True)
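For intuition about the two returned items, the following is a minimal sketch of how a PPG-style joint loss is commonly computed: a (optionally clipped) squared-error value term plus a KL term that keeps the new policy close to the old one. It is not the library's implementation. The function name sketch_joint_loss, the omission of the action and weight fields, and the use of the full-distribution KL divergence are all assumptions made for illustration; see the linked source file for the actual behavior.

```python
import torch
import torch.nn.functional as F


def sketch_joint_loss(logit_new, logit_old, value_new, value_old, return_,
                      clip_ratio=0.2, use_value_clip=True):
    """Hypothetical illustration of a PPG-style joint loss (not ding's code)."""
    if use_value_clip:
        # Keep the new value estimate within clip_ratio of the old estimate,
        # then take the larger (more pessimistic) of the two squared errors.
        value_clipped = value_old + (value_new - value_old).clamp(-clip_ratio, clip_ratio)
        err_unclipped = (return_ - value_new).pow(2)
        err_clipped = (return_ - value_clipped).pow(2)
        auxiliary_loss = 0.5 * torch.max(err_unclipped, err_clipped).mean()
    else:
        auxiliary_loss = 0.5 * (return_ - value_new).pow(2).mean()

    # Behavioral cloning term: KL(pi_old || pi_new), averaged over the batch.
    logp_new = F.log_softmax(logit_new, dim=-1)
    logp_old = F.log_softmax(logit_old, dim=-1)
    behavioral_cloning_loss = (logp_old.exp() * (logp_old - logp_new)).sum(dim=-1).mean()

    return auxiliary_loss, behavioral_cloning_loss


# Usage with the same shapes as the documented example (B=3, N=4).
torch.manual_seed(0)
logit_new = torch.randn(3, 4, requires_grad=True)
value_new = torch.randn(3, 1, requires_grad=True)
aux, bc = sketch_joint_loss(
    logit_new, torch.randn(3, 4), value_new, torch.randn(3, 1), torch.randn(3, 1)
)
# The weight 1.0 on the cloning term is an arbitrary choice for this sketch.
(aux + 1.0 * bc).backward()
```

Both outputs are 0-dim tensors, so they can be summed (with a weighting coefficient on the cloning term) and backpropagated in a single call, as in the last line.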
Full Source Code
../ding/rl_utils/ppg.py