
ding.model.template.qac_dist


QACDIST

Bases: Module

Overview

The QAC model with distributional Q-value.

Interfaces: ``__init__``, ``forward``, ``compute_actor``, ``compute_critic``

__init__(obs_shape, action_shape, action_space='regression', critic_head_type='categorical', actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)

Overview

Init the QAC Distributional Model according to arguments.

Arguments:
    - obs_shape (:obj:`Union[int, SequenceType]`): Observation's space.
    - action_shape (:obj:`Union[int, SequenceType]`): Action's space.
    - action_space (:obj:`str`): Whether to use ``regression`` or ``reparameterization``.
    - critic_head_type (:obj:`str`): Only ``categorical`` is supported.
    - actor_head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` to pass to the actor's ``Head``.
    - actor_head_layer_num (:obj:`int`): The number of layers used in the actor network to compute the action output.
    - critic_head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` to pass to the critic's ``Head``.
    - critic_head_layer_num (:obj:`int`): The number of layers used in the critic network to compute the Q value output.
    - activation (:obj:`Optional[nn.Module]`): The type of activation function to use in the ``MLP`` after ``layer_fn``; if ``None``, defaults to ``nn.ReLU()``.
    - norm_type (:obj:`Optional[str]`): The type of normalization to use; see ``ding.torch_utils.fc_block`` for more details.
    - v_min (:obj:`int`): Value of the smallest atom in the support.
    - v_max (:obj:`int`): Value of the largest atom in the support.
    - n_atom (:obj:`int`): Number of atoms in the support.
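The ``v_min``, ``v_max`` and ``n_atom`` arguments define the fixed support of the categorical Q-distribution, in the style of C51. A minimal sketch of how such a support is laid out (an illustration of the parameterization, not code taken from ``DistributionHead``):

```python
import torch

# C51-style categorical support: n_atom evenly spaced return values in
# [v_min, v_max]; the critic outputs a probability mass for each atom.
v_min, v_max, n_atom = -10.0, 10.0, 51          # the QACDIST defaults
support = torch.linspace(v_min, v_max, n_atom)  # shape: (51,)
delta_z = (v_max - v_min) / (n_atom - 1)        # spacing between adjacent atoms
```

Widening ``[v_min, v_max]`` covers larger returns at the cost of coarser resolution per atom, so these bounds should roughly bracket the returns seen in the target environment.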

forward(inputs, mode)

Overview

Use observation and action tensors to predict the output. The forward computation is dispatched to ``compute_actor`` or ``compute_critic`` according to ``mode``.

Arguments:
    Forward with ``'compute_actor'``:
        - inputs (:obj:`torch.Tensor`): The encoded embedding tensor, determined with the given ``hidden_size``, i.e. ``(B, N=hidden_size)``. Whether ``actor_head_hidden_size`` or ``critic_head_hidden_size`` is used depends on ``mode``.

Forward with ``'compute_critic'``, inputs (`Dict`) Necessary Keys:
    - ``obs``, ``action`` encoded tensors.

- mode (:obj:`str`): Name of the forward mode.

Returns:
    - outputs (:obj:`Dict`): Outputs of network forward.

    Forward with ``'compute_actor'``, Necessary Keys (either):
        - action (:obj:`torch.Tensor`): Action tensor with same size as input ``x``.
        - logit (:obj:`torch.Tensor`):
            Logit tensor encoding ``mu`` and ``sigma``, both with same size as input ``x``.

    Forward with ``'compute_critic'``, Necessary Keys:
        - q_value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
        - distribution (:obj:`torch.Tensor`): Q value distribution tensor.

Actor Shapes
  • inputs (:obj:`torch.Tensor`): :math:`(B, N0)`, where B is batch size and N0 corresponds to ``hidden_size``
  • action (:obj:`torch.Tensor`): :math:`(B, N0)`
  • q_value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size

Critic Shapes
  • obs (:obj:`torch.Tensor`): :math:`(B, N1)`, where B is batch size and N1 is ``obs_shape``
  • action (:obj:`torch.Tensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
  • q_value (:obj:`torch.FloatTensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
  • distribution (:obj:`torch.FloatTensor`): :math:`(B, 1, N3)`, where B is batch size and N3 is ``n_atom``
Actor Examples
Regression mode

    >>> model = QACDIST(64, 64, 'regression')
    >>> inputs = torch.randn(4, 64)
    >>> actor_outputs = model(inputs, 'compute_actor')
    >>> assert actor_outputs['action'].shape == torch.Size([4, 64])

Reparameterization mode

    >>> model = QACDIST(64, 64, 'reparameterization')
    >>> inputs = torch.randn(4, 64)
    >>> actor_outputs = model(inputs, 'compute_actor')
    >>> actor_outputs['logit'][0].shape  # mu
    torch.Size([4, 64])
    >>> actor_outputs['logit'][1].shape  # sigma
    torch.Size([4, 64])

Critic Examples
Categorical mode

    >>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
    >>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression',
    ...                 critic_head_type='categorical', n_atom=51)
    >>> q_value = model(inputs, mode='compute_critic')
    >>> assert q_value['q_value'].shape == torch.Size([4, 1])
    >>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])

compute_actor(inputs)

Overview

Use the encoded embedding tensor to predict the actor output; this is the forward path executed in ``'compute_actor'`` mode.

Arguments:
    - inputs (:obj:`torch.Tensor`): The encoded embedding tensor, determined with the given ``hidden_size``, i.e. ``(B, N=hidden_size)``, where ``hidden_size = actor_head_hidden_size``.
Returns:
    - outputs (:obj:`Dict`): Outputs of forward pass encoder and head.

ReturnsKeys (either):
    - action (:obj:`torch.Tensor`): Continuous action tensor with same size as ``action_shape``.
    - logit (:obj:`torch.Tensor`): Logit tensor encoding ``mu`` and ``sigma``, both with same size as input ``x``.
Shapes:
    - inputs (:obj:`torch.Tensor`): :math:`(B, N0)`, where B is batch size and N0 corresponds to ``hidden_size``
    - action (:obj:`torch.Tensor`): :math:`(B, N0)`
    - logit (:obj:`list`): 2 elements, ``mu`` and ``sigma``, each with shape :math:`(B, N0)`.
    - q_value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size.
Examples:
    >>> # Regression mode
    >>> model = QACDIST(64, 64, 'regression')
    >>> inputs = torch.randn(4, 64)
    >>> actor_outputs = model(inputs, 'compute_actor')
    >>> assert actor_outputs['action'].shape == torch.Size([4, 64])
    >>> # Reparameterization mode
    >>> model = QACDIST(64, 64, 'reparameterization')
    >>> inputs = torch.randn(4, 64)
    >>> actor_outputs = model(inputs, 'compute_actor')
    >>> actor_outputs['logit'][0].shape  # mu
    torch.Size([4, 64])
    >>> actor_outputs['logit'][1].shape  # sigma
    torch.Size([4, 64])
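In ``reparameterization`` mode, ``compute_actor`` returns only ``logit = [mu, sigma]``; drawing an action from it is left to the policy. A sketch of one common way to sample from such a logit (the tanh squashing here is an assumption borrowed from SAC-style policies, not something ``compute_actor`` itself does):

```python
import torch

def sample_action(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    # Reparameterization trick: a = tanh(mu + sigma * eps), eps ~ N(0, I).
    # Gradients flow through mu and sigma; eps carries the randomness.
    eps = torch.randn_like(mu)
    return torch.tanh(mu + sigma * eps)

mu, sigma = torch.zeros(4, 64), torch.ones(4, 64)  # shapes matching (B, N0) = (4, 64)
action = sample_action(mu, sigma)  # (4, 64), values strictly in (-1, 1)
```

Keeping the noise outside the network parameters is what makes the sample differentiable with respect to ``mu`` and ``sigma``, which is the point of this action space.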

compute_critic(inputs)

Overview

Use the encoded observation and action tensors to predict the Q-value output; this is the forward path executed in ``'compute_critic'`` mode.

Arguments:
    - inputs (:obj:`Dict`): Dict with ``obs`` and ``action`` encoded tensors.
Returns:
    - outputs (:obj:`Dict`): Q-value output and distribution.

ReturnKeys
  • q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
  • distribution (:obj:torch.Tensor): Q value distribution tensor.

Shapes:
    - obs (:obj:`torch.Tensor`): :math:`(B, N1)`, where B is batch size and N1 is ``obs_shape``
    - action (:obj:`torch.Tensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
    - q_value (:obj:`torch.FloatTensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
    - distribution (:obj:`torch.FloatTensor`): :math:`(B, 1, N3)`, where B is batch size and N3 is ``n_atom``

Examples:

>>> # Categorical mode
>>> inputs = {'obs': torch.randn(4,N), 'action': torch.randn(4,1)}
>>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression',
...                 critic_head_type='categorical', n_atom=51)
>>> q_value = model(inputs, mode='compute_critic') # q value
>>> assert q_value['q_value'].shape == torch.Size([4, 1])
>>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
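The scalar ``q_value`` is consistent with taking the expectation of ``distribution`` over the atom support. A minimal sketch of that reduction using a mock distribution with the documented shape (the support endpoints below assume the default ``v_min=-10``, ``v_max=10``):

```python
import torch

n_atom, v_min, v_max = 51, -10.0, 10.0
support = torch.linspace(v_min, v_max, n_atom)           # (51,)
# Mock critic output with the documented shape (B, 1, n_atom):
distribution = torch.softmax(torch.randn(4, 1, n_atom), dim=-1)
q_value = (distribution * support).sum(-1)               # (4, 1): expected return
```

Because the distribution is a proper probability mass over the support, the resulting expectation is always bounded by ``[v_min, v_max]``.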

Full Source Code

../ding/model/template/qac_dist.py

from typing import Union, Dict, Optional
import torch
import torch.nn as nn

from ding.utils import SequenceType, squeeze, MODEL_REGISTRY
from ..common import RegressionHead, ReparameterizationHead, DistributionHead


@MODEL_REGISTRY.register('qac_dist')
class QACDIST(nn.Module):
    """
    Overview:
        The QAC model with distributional Q-value.
    Interfaces:
        ``__init__``, ``forward``, ``compute_actor``, ``compute_critic``
    """
    mode = ['compute_actor', 'compute_critic']

    def __init__(
            self,
            obs_shape: Union[int, SequenceType],
            action_shape: Union[int, SequenceType],
            action_space: str = "regression",
            critic_head_type: str = "categorical",
            actor_head_hidden_size: int = 64,
            actor_head_layer_num: int = 1,
            critic_head_hidden_size: int = 64,
            critic_head_layer_num: int = 1,
            activation: Optional[nn.Module] = nn.ReLU(),
            norm_type: Optional[str] = None,
            v_min: Optional[float] = -10,
            v_max: Optional[float] = 10,
            n_atom: Optional[int] = 51,
    ) -> None:
        """
        Overview:
            Init the QAC Distributional Model according to arguments.
        Arguments:
            - obs_shape (:obj:`Union[int, SequenceType]`): Observation's space.
            - action_shape (:obj:`Union[int, SequenceType]`): Action's space.
            - action_space (:obj:`str`): Whether to use ``regression`` or ``reparameterization``.
            - critic_head_type (:obj:`str`): Only ``categorical`` is supported.
            - actor_head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` to pass to actor-nn's ``Head``.
            - actor_head_layer_num (:obj:`int`):
                The num of layers used in the network to compute Q value output for actor's nn.
            - critic_head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` to pass to critic-nn's ``Head``.
            - critic_head_layer_num (:obj:`int`):
                The num of layers used in the network to compute Q value output for critic's nn.
            - activation (:obj:`Optional[nn.Module]`):
                The type of activation function to use in ``MLP`` after ``layer_fn``,
                if ``None`` then default set to ``nn.ReLU()``.
            - norm_type (:obj:`Optional[str]`):
                The type of normalization to use, see ``ding.torch_utils.fc_block`` for more details.
            - v_min (:obj:`int`): Value of the smallest atom.
            - v_max (:obj:`int`): Value of the largest atom.
            - n_atom (:obj:`int`): Number of atoms in the support.
        """
        super(QACDIST, self).__init__()
        obs_shape: int = squeeze(obs_shape)
        action_shape: int = squeeze(action_shape)
        self.action_space = action_space
        assert self.action_space in ['regression', 'reparameterization']
        if self.action_space == 'regression':
            self.actor = nn.Sequential(
                nn.Linear(obs_shape, actor_head_hidden_size), activation,
                RegressionHead(
                    actor_head_hidden_size,
                    action_shape,
                    actor_head_layer_num,
                    final_tanh=True,
                    activation=activation,
                    norm_type=norm_type
                )
            )
        elif self.action_space == 'reparameterization':
            self.actor = nn.Sequential(
                nn.Linear(obs_shape, actor_head_hidden_size), activation,
                ReparameterizationHead(
                    actor_head_hidden_size,
                    action_shape,
                    actor_head_layer_num,
                    sigma_type='conditioned',
                    activation=activation,
                    norm_type=norm_type
                )
            )
        self.critic_head_type = critic_head_type
        assert self.critic_head_type in ['categorical'], self.critic_head_type
        if self.critic_head_type == 'categorical':
            self.critic = nn.Sequential(
                nn.Linear(obs_shape + action_shape, critic_head_hidden_size), activation,
                DistributionHead(
                    critic_head_hidden_size,
                    1,
                    critic_head_layer_num,
                    n_atom=n_atom,
                    v_min=v_min,
                    v_max=v_max,
                    activation=activation,
                    norm_type=norm_type
                )
            )

    def forward(self, inputs: Union[torch.Tensor, Dict], mode: str) -> Dict:
        """
        Overview:
            Use observation and action tensors to predict output,
            dispatching to ``compute_actor`` or ``compute_critic`` according to ``mode``.
        Arguments:
            Forward with ``'compute_actor'``:
                - inputs (:obj:`torch.Tensor`):
                    The encoded embedding tensor, determined with given ``hidden_size``, i.e. ``(B, N=hidden_size)``.
                    Whether ``actor_head_hidden_size`` or ``critic_head_hidden_size`` is used depends on ``mode``.

            Forward with ``'compute_critic'``, inputs (:obj:`Dict`) Necessary Keys:
                - ``obs``, ``action`` encoded tensors.

            - mode (:obj:`str`): Name of the forward mode.
        Returns:
            - outputs (:obj:`Dict`): Outputs of network forward.

                Forward with ``'compute_actor'``, Necessary Keys (either):
                    - action (:obj:`torch.Tensor`): Action tensor with same size as input ``x``.
                    - logit (:obj:`torch.Tensor`):
                        Logit tensor encoding ``mu`` and ``sigma``, both with same size as input ``x``.

                Forward with ``'compute_critic'``, Necessary Keys:
                    - q_value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
                    - distribution (:obj:`torch.Tensor`): Q value distribution tensor.
        Actor Shapes:
            - inputs (:obj:`torch.Tensor`): :math:`(B, N0)`, where B is batch size and N0 corresponds to ``hidden_size``
            - action (:obj:`torch.Tensor`): :math:`(B, N0)`
            - q_value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size.

        Critic Shapes:
            - obs (:obj:`torch.Tensor`): :math:`(B, N1)`, where B is batch size and N1 is ``obs_shape``
            - action (:obj:`torch.Tensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
            - q_value (:obj:`torch.FloatTensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
            - distribution (:obj:`torch.FloatTensor`): :math:`(B, 1, N3)`, where B is batch size and N3 is ``n_atom``

        Actor Examples:
            >>> # Regression mode
            >>> model = QACDIST(64, 64, 'regression')
            >>> inputs = torch.randn(4, 64)
            >>> actor_outputs = model(inputs, 'compute_actor')
            >>> assert actor_outputs['action'].shape == torch.Size([4, 64])
            >>> # Reparameterization mode
            >>> model = QACDIST(64, 64, 'reparameterization')
            >>> inputs = torch.randn(4, 64)
            >>> actor_outputs = model(inputs, 'compute_actor')
            >>> actor_outputs['logit'][0].shape  # mu
            torch.Size([4, 64])
            >>> actor_outputs['logit'][1].shape  # sigma
            torch.Size([4, 64])

        Critic Examples:
            >>> # Categorical mode
            >>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
            >>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression', \
            ...                 critic_head_type='categorical', n_atom=51)
            >>> q_value = model(inputs, mode='compute_critic')  # q value
            >>> assert q_value['q_value'].shape == torch.Size([4, 1])
            >>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
        """
        assert mode in self.mode, "not support forward mode: {}/{}".format(mode, self.mode)
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to predict output in ``'compute_actor'`` mode.
        Arguments:
            - inputs (:obj:`torch.Tensor`):
                The encoded embedding tensor, determined with given ``hidden_size``, i.e. ``(B, N=hidden_size)``.
                ``hidden_size = actor_head_hidden_size``
        Returns:
            - outputs (:obj:`Dict`): Outputs of forward pass encoder and head.

        ReturnsKeys (either):
            - action (:obj:`torch.Tensor`): Continuous action tensor with same size as ``action_shape``.
            - logit (:obj:`torch.Tensor`):
                Logit tensor encoding ``mu`` and ``sigma``, both with same size as input ``x``.
        Shapes:
            - inputs (:obj:`torch.Tensor`): :math:`(B, N0)`, where B is batch size and N0 corresponds to ``hidden_size``
            - action (:obj:`torch.Tensor`): :math:`(B, N0)`
            - logit (:obj:`list`): 2 elements, ``mu`` and ``sigma``, each with shape :math:`(B, N0)`.
            - q_value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size.
        Examples:
            >>> # Regression mode
            >>> model = QACDIST(64, 64, 'regression')
            >>> inputs = torch.randn(4, 64)
            >>> actor_outputs = model(inputs, 'compute_actor')
            >>> assert actor_outputs['action'].shape == torch.Size([4, 64])
            >>> # Reparameterization mode
            >>> model = QACDIST(64, 64, 'reparameterization')
            >>> inputs = torch.randn(4, 64)
            >>> actor_outputs = model(inputs, 'compute_actor')
            >>> actor_outputs['logit'][0].shape  # mu
            torch.Size([4, 64])
            >>> actor_outputs['logit'][1].shape  # sigma
            torch.Size([4, 64])
        """
        x = self.actor(inputs)
        if self.action_space == 'regression':
            return {'action': x['pred']}
        elif self.action_space == 'reparameterization':
            return {'logit': [x['mu'], x['sigma']]}

    def compute_critic(self, inputs: Dict) -> Dict:
        """
        Overview:
            Use encoded observation and action tensors to predict output in ``'compute_critic'`` mode.
        Arguments:
            - inputs (:obj:`Dict`): Dict with ``obs`` and ``action`` encoded tensors.
        Returns:
            - outputs (:obj:`Dict`): Q-value output and distribution.

        ReturnKeys:
            - q_value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
            - distribution (:obj:`torch.Tensor`): Q value distribution tensor.
        Shapes:
            - obs (:obj:`torch.Tensor`): :math:`(B, N1)`, where B is batch size and N1 is ``obs_shape``
            - action (:obj:`torch.Tensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
            - q_value (:obj:`torch.FloatTensor`): :math:`(B, N2)`, where B is batch size and N2 is ``action_shape``
            - distribution (:obj:`torch.FloatTensor`): :math:`(B, 1, N3)`, where B is batch size and N3 is ``n_atom``

        Examples:
            >>> # Categorical mode
            >>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
            >>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression', \
            ...                 critic_head_type='categorical', n_atom=51)
            >>> q_value = model(inputs, mode='compute_critic')  # q value
            >>> assert q_value['q_value'].shape == torch.Size([4, 1])
            >>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
        """
        obs, action = inputs['obs'], inputs['action']
        assert len(obs.shape) == 2
        if len(action.shape) == 1:  # (B, ) -> (B, 1)
            action = action.unsqueeze(1)
        x = torch.cat([obs, action], dim=1)
        x = self.critic(x)
        return {'q_value': x['logit'], 'distribution': x['distribution']}