
ding.model.template.bc


DiscreteBC

Bases: Module

Overview

The DiscreteBC network.

Interfaces: __init__, forward

__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, strides=None)

Overview

Init the DiscreteBC (encoder + head) Model according to input arguments.

Arguments:

- obs_shape (:obj:`Union[int, SequenceType]`): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:`Union[int, SequenceType]`): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:`SequenceType`): Collection of ``hidden_size`` to pass to ``Encoder``; the last element must match ``head_hidden_size``.
- dueling (:obj:`bool`): Whether to use ``DuelingHead`` or ``DiscreteHead`` (default).
- head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` of the head network.
- head_layer_num (:obj:`int`): The number of layers used in the head network to compute the Q value output.
- activation (:obj:`Optional[nn.Module]`): The type of activation function in networks; if ``None``, defaults to ``nn.ReLU()``.
- norm_type (:obj:`Optional[str]`): The type of normalization in networks; see ``ding.torch_utils.fc_block`` for more details.
- strides (:obj:`Optional[list]`): The strides for each convolution layer, such as [2, 2, 2]. The length of this argument should match ``encoder_hidden_size_list``.

forward(x)

Overview

DiscreteBC forward computation graph, input observation tensor to predict q_value.

Arguments:

- x (:obj:`torch.Tensor`): Observation inputs.

Returns:

- outputs (:obj:`Dict`): DiscreteBC forward outputs, such as q_value.

ReturnsKeys:

- logit (:obj:`torch.Tensor`): Discrete Q-value output of each action dimension.

Shapes:

- x (:obj:`torch.Tensor`): :math:`(B, N)`, where B is batch size and N is ``obs_shape``.
- logit (:obj:`torch.FloatTensor`): :math:`(B, M)`, where B is batch size and M is ``action_shape``.

Examples:

>>> model = DiscreteBC(32, 6)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
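When ``dueling=True``, the head decomposes the Q estimate into a state value and per-action advantages. A minimal sketch of that decomposition in plain PyTorch (the tensor names here are illustrative stand-ins, not part of the ding API):

```python
import torch

def dueling_q(value: torch.Tensor, adv: torch.Tensor) -> torch.Tensor:
    # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean
    # advantage keeps the value/advantage split identifiable.
    return value + adv - adv.mean(dim=-1, keepdim=True)

value = torch.randn(4, 1)   # state value for a batch of 4
adv = torch.randn(4, 6)     # advantages over 6 discrete actions
q = dueling_q(value, adv)   # Q-values, shape (4, 6)
```

Because the advantages are mean-centered, the per-state average of the resulting Q-values equals the state value.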

ContinuousBC

Bases: Module

Overview

The ContinuousBC network.

Interfaces: __init__, forward

__init__(obs_shape, action_shape, action_space, actor_head_hidden_size=64, actor_head_layer_num=1, activation=nn.ReLU(), norm_type=None)

Overview

Initialize the ContinuousBC Model according to input arguments.

Arguments:

- obs_shape (:obj:`Union[int, SequenceType]`): Observation's shape, such as 128, (156, ).
- action_shape (:obj:`Union[int, SequenceType, EasyDict]`): Action's shape, such as 4, (3, ), EasyDict({'action_type_shape': 3, 'action_args_shape': 4}).
- action_space (:obj:`str`): The type of action space, including [``regression``, ``reparameterization``].
- actor_head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` to pass to the actor head.
- actor_head_layer_num (:obj:`int`): The number of layers used in the network to compute the Q value output for the actor head.
- activation (:obj:`Optional[nn.Module]`): The type of activation function to use in ``MLP`` after each FC layer; if ``None``, defaults to ``nn.ReLU()``.
- norm_type (:obj:`Optional[str]`): The type of normalization to use after network layers (FC, Conv); see ``ding.torch_utils.network`` for more details.

forward(inputs)

Overview

The unique execution (forward) method of ContinuousBC.

Arguments:

- inputs (:obj:`torch.Tensor`): Observation data, typically a tensor.

Returns:

- output (:obj:`Dict`): Output dict data, whose key-values vary with the action_space.

ReturnsKeys:

- action (:obj:`torch.Tensor`): Action output of the actor network, with shape :math:`(B, action_shape)` (``regression`` only).
- logit (:obj:`List[torch.Tensor]`): Reparameterized action output ``[mu, sigma]`` of the actor network, each with shape :math:`(B, action_shape)` (``reparameterization`` only).

Shapes:

- inputs (:obj:`torch.Tensor`): :math:`(B, N)`, where B is batch size and N is ``obs_shape``.
- action (:obj:`torch.FloatTensor`): :math:`(B, M)`, where B is batch size and M is ``action_shape``.
- logit (:obj:`List[torch.FloatTensor]`): :math:`(B, M)`, where B is batch size and M is ``action_shape``.

Examples (Regression):

>>> model = ContinuousBC(32, 6, action_space='regression')
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['action'].shape == torch.Size([4, 6])

Examples (Reparameterization):

>>> model = ContinuousBC(32, 6, action_space='reparameterization')
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'][0].shape == torch.Size([4, 6])
>>> assert outputs['logit'][1].shape == torch.Size([4, 6])
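In the ``reparameterization`` case the network returns ``logit = [mu, sigma]`` rather than an action, so a sampling step happens downstream. A sketch of that typical step, not part of this module (the ``mu``/``sigma`` tensors below are stand-ins for the model outputs):

```python
import torch
from torch.distributions import Normal, Independent

# Stand-ins for outputs['logit'] = [mu, sigma] from the reparameterization head.
mu, sigma = torch.zeros(4, 6), torch.ones(4, 6)
dist = Independent(Normal(mu, sigma), 1)  # treat the 6 action dims as one event
raw = dist.rsample()                      # mu + sigma * eps, keeps gradients
action = torch.tanh(raw)                  # squash into (-1, 1)
```

``rsample`` (rather than ``sample``) is what makes the draw differentiable with respect to ``mu`` and ``sigma``, which is the point of the reparameterization trick.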

Full Source Code

../ding/model/template/bc.py

from typing import Union, Optional, Dict
import torch
import torch.nn as nn
from easydict import EasyDict

from ding.utils import MODEL_REGISTRY, SequenceType, squeeze
from ..common import FCEncoder, ConvEncoder, DiscreteHead, DuelingHead, \
    MultiHead, RegressionHead, ReparameterizationHead


@MODEL_REGISTRY.register('discrete_bc')
class DiscreteBC(nn.Module):
    """
    Overview:
        The DiscreteBC network.
    Interfaces:
        ``__init__``, ``forward``
    """

    def __init__(
            self,
            obs_shape: Union[int, SequenceType],
            action_shape: Union[int, SequenceType],
            encoder_hidden_size_list: SequenceType = [128, 128, 64],
            dueling: bool = True,
            head_hidden_size: Optional[int] = None,
            head_layer_num: int = 1,
            activation: Optional[nn.Module] = nn.ReLU(),
            norm_type: Optional[str] = None,
            strides: Optional[list] = None,
    ) -> None:
        """
        Overview:
            Init the DiscreteBC (encoder + head) Model according to input arguments.
        Arguments:
            - obs_shape (:obj:`Union[int, SequenceType]`): Observation space shape, such as 8 or [4, 84, 84].
            - action_shape (:obj:`Union[int, SequenceType]`): Action space shape, such as 6 or [2, 3, 3].
            - encoder_hidden_size_list (:obj:`SequenceType`): Collection of ``hidden_size`` to pass to ``Encoder``, \
                the last element must match ``head_hidden_size``.
            - dueling (:obj:`bool`): Whether to use ``DuelingHead`` or ``DiscreteHead`` (default).
            - head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` of the head network.
            - head_layer_num (:obj:`int`): The number of layers used in the head network to compute the Q value output.
            - activation (:obj:`Optional[nn.Module]`): The type of activation function in networks, \
                if ``None`` then default set it to ``nn.ReLU()``.
            - norm_type (:obj:`Optional[str]`): The type of normalization in networks, see \
                ``ding.torch_utils.fc_block`` for more details.
            - strides (:obj:`Optional[list]`): The strides for each convolution layer, such as [2, 2, 2]. The length \
                of this argument should be the same as ``encoder_hidden_size_list``.
        """
        super(DiscreteBC, self).__init__()
        # For compatibility: 1, (1, ), [4, 32, 32]
        obs_shape, action_shape = squeeze(obs_shape), squeeze(action_shape)
        if head_hidden_size is None:
            head_hidden_size = encoder_hidden_size_list[-1]
        # FC Encoder
        if isinstance(obs_shape, int) or len(obs_shape) == 1:
            self.encoder = FCEncoder(obs_shape, encoder_hidden_size_list, activation=activation, norm_type=norm_type)
        # Conv Encoder
        elif len(obs_shape) == 3:
            if not strides:
                self.encoder = ConvEncoder(
                    obs_shape, encoder_hidden_size_list, activation=activation, norm_type=norm_type
                )
            else:
                self.encoder = ConvEncoder(
                    obs_shape, encoder_hidden_size_list, activation=activation, norm_type=norm_type, stride=strides
                )
        else:
            raise RuntimeError(
                "not support obs_shape for pre-defined encoder: {}, please customize your own BC".format(obs_shape)
            )
        # Head Type
        if dueling:
            head_cls = DuelingHead
        else:
            head_cls = DiscreteHead
        multi_head = not isinstance(action_shape, int)
        if multi_head:
            self.head = MultiHead(
                head_cls,
                head_hidden_size,
                action_shape,
                layer_num=head_layer_num,
                activation=activation,
                norm_type=norm_type
            )
        else:
            self.head = head_cls(
                head_hidden_size, action_shape, head_layer_num, activation=activation, norm_type=norm_type
            )

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            DiscreteBC forward computation graph, input observation tensor to predict q_value.
        Arguments:
            - x (:obj:`torch.Tensor`): Observation inputs.
        Returns:
            - outputs (:obj:`Dict`): DiscreteBC forward outputs, such as q_value.
        ReturnsKeys:
            - logit (:obj:`torch.Tensor`): Discrete Q-value output of each action dimension.
        Shapes:
            - x (:obj:`torch.Tensor`): :math:`(B, N)`, where B is batch size and N is ``obs_shape``.
            - logit (:obj:`torch.FloatTensor`): :math:`(B, M)`, where B is batch size and M is ``action_shape``.
        Examples:
            >>> model = DiscreteBC(32, 6)  # arguments: 'obs_shape' and 'action_shape'
            >>> inputs = torch.randn(4, 32)
            >>> outputs = model(inputs)
            >>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
        """
        x = self.encoder(x)
        x = self.head(x)
        return x


@MODEL_REGISTRY.register('continuous_bc')
class ContinuousBC(nn.Module):
    """
    Overview:
        The ContinuousBC network.
    Interfaces:
        ``__init__``, ``forward``
    """

    def __init__(
            self,
            obs_shape: Union[int, SequenceType],
            action_shape: Union[int, SequenceType, EasyDict],
            action_space: str,
            actor_head_hidden_size: int = 64,
            actor_head_layer_num: int = 1,
            activation: Optional[nn.Module] = nn.ReLU(),
            norm_type: Optional[str] = None,
    ) -> None:
        """
        Overview:
            Initialize the ContinuousBC Model according to input arguments.
        Arguments:
            - obs_shape (:obj:`Union[int, SequenceType]`): Observation's shape, such as 128, (156, ).
            - action_shape (:obj:`Union[int, SequenceType, EasyDict]`): Action's shape, such as 4, (3, ), \
                EasyDict({'action_type_shape': 3, 'action_args_shape': 4}).
            - action_space (:obj:`str`): The type of action space, \
                including [``regression``, ``reparameterization``].
            - actor_head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` to pass to the actor head.
            - actor_head_layer_num (:obj:`int`): The number of layers used in the network to compute the Q value \
                output for the actor head.
            - activation (:obj:`Optional[nn.Module]`): The type of activation function to use in ``MLP`` \
                after each FC layer, if ``None`` then default set to ``nn.ReLU()``.
            - norm_type (:obj:`Optional[str]`): The type of normalization to use after network layers (FC, Conv), \
                see ``ding.torch_utils.network`` for more details.
        """
        super(ContinuousBC, self).__init__()
        obs_shape: int = squeeze(obs_shape)
        action_shape = squeeze(action_shape)
        self.action_shape = action_shape
        self.action_space = action_space
        assert self.action_space in ['regression', 'reparameterization']
        if self.action_space == 'regression':
            self.actor = nn.Sequential(
                nn.Linear(obs_shape, actor_head_hidden_size), activation,
                RegressionHead(
                    actor_head_hidden_size,
                    action_shape,
                    actor_head_layer_num,
                    final_tanh=True,
                    activation=activation,
                    norm_type=norm_type
                )
            )
        elif self.action_space == 'reparameterization':
            self.actor = nn.Sequential(
                nn.Linear(obs_shape, actor_head_hidden_size), activation,
                ReparameterizationHead(
                    actor_head_hidden_size,
                    action_shape,
                    actor_head_layer_num,
                    sigma_type='conditioned',
                    activation=activation,
                    norm_type=norm_type
                )
            )

    def forward(self, inputs: Union[torch.Tensor, Dict[str, torch.Tensor]]) -> Dict:
        """
        Overview:
            The unique execution (forward) method of ContinuousBC.
        Arguments:
            - inputs (:obj:`torch.Tensor`): Observation data, typically a tensor.
        Returns:
            - output (:obj:`Dict`): Output dict data, whose key-values vary with the action_space.
        ReturnsKeys:
            - action (:obj:`torch.Tensor`): Action output of the actor network, \
                with shape :math:`(B, action_shape)`.
            - logit (:obj:`List[torch.Tensor]`): Reparameterized action output of the actor network, \
                with shape :math:`(B, action_shape)`.
        Shapes:
            - inputs (:obj:`torch.Tensor`): :math:`(B, N)`, where B is batch size and N is ``obs_shape``.
            - action (:obj:`torch.FloatTensor`): :math:`(B, M)`, where B is batch size and M is ``action_shape``.
            - logit (:obj:`List[torch.FloatTensor]`): :math:`(B, M)`, where B is batch size and M is ``action_shape``.
        Examples (Regression):
            >>> model = ContinuousBC(32, 6, action_space='regression')
            >>> inputs = torch.randn(4, 32)
            >>> outputs = model(inputs)
            >>> assert isinstance(outputs, dict) and outputs['action'].shape == torch.Size([4, 6])
        Examples (Reparameterization):
            >>> model = ContinuousBC(32, 6, action_space='reparameterization')
            >>> inputs = torch.randn(4, 32)
            >>> outputs = model(inputs)
            >>> assert isinstance(outputs, dict) and outputs['logit'][0].shape == torch.Size([4, 6])
            >>> assert outputs['logit'][1].shape == torch.Size([4, 6])
        """
        if self.action_space == 'regression':
            x = self.actor(inputs)
            return {'action': x['pred']}
        elif self.action_space == 'reparameterization':
            x = self.actor(inputs)
            return {'logit': [x['mu'], x['sigma']]}
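Both models are trained by supervised learning on expert data; for the ``regression`` action space this is typically a mean-squared-error regression onto expert actions. A minimal single-step sketch (the stand-in actor, batch sizes, and learning rate below are illustrative, not taken from this file):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for ContinuousBC(32, 6, action_space='regression').
actor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 6), nn.Tanh())
optim = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(16, 32)                  # batch of expert observations
expert_action = torch.rand(16, 6) * 2 - 1  # matching expert actions in [-1, 1]

pred = actor(obs)                          # predicted actions
loss = nn.functional.mse_loss(pred, expert_action)
optim.zero_grad()
loss.backward()                            # gradients of the imitation loss
optim.step()                               # one behavior-cloning update
```

The ``Tanh`` on the stand-in mirrors the ``final_tanh=True`` used by ``RegressionHead`` above, so predictions stay in the same range as the expert actions.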