
ding.model.template.procedure_cloning


PCTransformer

Bases: Module

Overview

The transformer block for neural network of algorithms related to Procedure cloning (PC).

Interfaces: __init__, forward.

__init__(cnn_hidden, att_hidden, att_heads, drop_p, max_T, n_att, feedforward_hidden, n_feedforward)

Overview

Initialize the procedure cloning transformer model according to corresponding input arguments.

Arguments:

- cnn_hidden (:obj:`int`): The last channel dimension of the CNN encoder, such as 32.
- att_hidden (:obj:`int`): The dimension of attention blocks, such as 32.
- att_heads (:obj:`int`): The number of heads in attention blocks, such as 4.
- drop_p (:obj:`float`): The dropout rate of attention, such as 0.5.
- max_T (:obj:`int`): The sequence length of procedure cloning, such as 4.
- n_att (:obj:`int`): The number of attention layers, such as 4.
- feedforward_hidden (:obj:`int`): The dimension of feedforward layers, such as 32.
- n_feedforward (:obj:`int`): The number of feedforward layers, such as 4.

forward(x)

Overview

The unique execution (forward) method of PCTransformer.

Arguments:

- x (:obj:`torch.Tensor`): Sequential data of several hidden states.

Returns:

- output (:obj:`torch.Tensor`): A tensor with the same shape as the input.

Examples:

>>> model = PCTransformer(128, 128, 8, 0, 16, 2, 128, 2)
>>> h = torch.randn((2, 16, 128))
>>> h = model(h)
>>> assert h.shape == torch.Size([2, 16, 128])
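The max_T argument sizes a causal (lower-triangular) attention mask, so each sequence position can only attend to itself and earlier positions. The module builds this with torch.tril; as a sketch, the same mask pattern in plain Python (illustrative only, not part of the API):

```python
def causal_mask(max_T):
    """Lower-triangular boolean mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(max_T)] for i in range(max_T)]

mask = causal_mask(4)
# Row i allows attention to exactly i + 1 positions.
for i, row in enumerate(mask):
    assert sum(row) == i + 1
assert mask[0] == [True, False, False, False]
assert mask[3] == [True, True, True, True]
```

The actual module reshapes this mask to (1, 1, max_T, max_T) so it broadcasts over batch and head dimensions.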

ProcedureCloningMCTS

Bases: Module

Overview

The neural network of algorithms related to Procedure cloning (PC).

Interfaces: __init__, forward.

__init__(obs_shape, action_dim, cnn_hidden_list=[128, 128, 256, 256, 256], cnn_activation=nn.ReLU(), cnn_kernel_size=[3, 3, 3, 3, 3], cnn_stride=[1, 1, 1, 1, 1], cnn_padding=[1, 1, 1, 1, 1], mlp_hidden_list=[256, 256], mlp_activation=nn.ReLU(), att_heads=8, att_hidden=128, n_att=4, n_feedforward=2, feedforward_hidden=256, drop_p=0.5, max_T=17)

Overview

Initialize the MCTS procedure cloning model according to corresponding input arguments.

Arguments:

- obs_shape (:obj:`SequenceType`): Observation space shape, such as [4, 84, 84].
- action_dim (:obj:`int`): Action space shape, such as 6.
- cnn_hidden_list (:obj:`SequenceType`): The cnn channel dims for each block, such as [128, 128, 256, 256, 256].
- cnn_activation (:obj:`nn.Module`): The activation function for cnn blocks, such as nn.ReLU().
- cnn_kernel_size (:obj:`SequenceType`): The kernel size for each cnn block, such as [3, 3, 3, 3, 3].
- cnn_stride (:obj:`SequenceType`): The stride for each cnn block, such as [1, 1, 1, 1, 1].
- cnn_padding (:obj:`SequenceType`): The padding for each cnn block, such as [1, 1, 1, 1, 1].
- mlp_hidden_list (:obj:`SequenceType`): The mlp hidden dims; the last dim must match the last dim of cnn_hidden_list, such as [256, 256].
- mlp_activation (:obj:`nn.Module`): The activation function for mlp layers, such as nn.ReLU().
- att_heads (:obj:`int`): The number of attention heads in the transformer, such as 8.
- att_hidden (:obj:`int`): The attention dimension in the transformer, such as 128.
- n_att (:obj:`int`): The number of attention blocks in the transformer, such as 4.
- n_feedforward (:obj:`int`): The number of feedforward layers in the transformer, such as 2.
- drop_p (:obj:`float`): The dropout rate of attention, such as 0.5.
- max_T (:obj:`int`): The sequence length of procedure cloning, such as 17.

forward(states, goals, actions)

Overview

ProcedureCloningMCTS forward computation graph: take states and goals tensors as input and calculate the predicted goals and actions.

Arguments:

- states (:obj:`torch.Tensor`): The observation of the current time.
- goals (:obj:`torch.Tensor`): The target observation after a period.
- actions (:obj:`torch.Tensor`): The actions executed during the period.

Returns:

- outputs (:obj:`Tuple[torch.Tensor, torch.Tensor]`): Predicted goals and actions.

Examples:

>>> inputs = {
...     'states': torch.randn(2, 3, 64, 64),
...     'goals': torch.randn(2, 3, 64, 64),
...     'actions': torch.randn(2, 15, 9)
... }
>>> model = ProcedureCloningMCTS(obs_shape=(3, 64, 64), action_dim=9)
>>> goal_preds, action_preds = model(inputs['states'], inputs['goals'], inputs['actions'])
>>> assert goal_preds.shape == (2, 256)
>>> assert action_preds.shape == (2, 16, 9)
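The example shapes follow from how forward builds its sequence: one state embedding and one goal embedding are concatenated with the T action embeddings, giving T + 2 tokens (so 15 actions fit the default max_T = 17). A plain-Python sketch of that shape bookkeeping (function and variable names here are illustrative, not part of the API):

```python
def pc_mcts_shapes(batch, num_actions, hidden_dim, action_dim):
    """Shape bookkeeping mirroring ProcedureCloningMCTS.forward (illustrative only)."""
    seq_len = 1 + 1 + num_actions                    # state token + goal token + T action tokens
    transformer_in = (batch, seq_len, hidden_dim)    # input to the PCTransformer
    goal_preds = (batch, hidden_dim)                 # predicted from the first token
    action_preds = (batch, seq_len - 1, action_dim)  # predicted from the remaining tokens
    return transformer_in, goal_preds, action_preds

tin, g, a = pc_mcts_shapes(batch=2, num_actions=15, hidden_dim=256, action_dim=9)
assert tin == (2, 17, 256)   # 17 tokens matches the default max_T
assert g == (2, 256)
assert a == (2, 16, 9)
```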

BFSConvEncoder

Bases: Module

Overview

The BFSConvolution Encoder used to encode raw 3-dim observations and output a feature map with the same height and width as the input.

Interfaces: __init__, forward.

__init__(obs_shape, hidden_size_list=[32, 64, 64, 128], activation=nn.ReLU(), kernel_size=[8, 4, 3], stride=[4, 2, 1], padding=None)

Overview

Init the BFSConvolution Encoder according to the provided arguments.

Arguments:

- obs_shape (:obj:`SequenceType`): Sequence of in_channel, plus one or more input sizes.
- hidden_size_list (:obj:`SequenceType`): Sequence of hidden_size of subsequent conv layers and the final dense layer.
- activation (:obj:`nn.Module`): Type of activation to use in the conv layers and ResBlock. Default is nn.ReLU().
- kernel_size (:obj:`SequenceType`): Sequence of kernel_size of subsequent conv layers.
- stride (:obj:`SequenceType`): Sequence of stride of subsequent conv layers.
- padding (:obj:`SequenceType`): Padding added to all four sides of the input for each conv layer. See nn.Conv2d for more details. Default is None.

forward(x)

Overview

Return output tensor of the env observation.

Arguments:

- x (:obj:`torch.Tensor`): Env raw observation.

Returns:

- outputs (:obj:`torch.Tensor`): Output embedding tensor.

Examples:

>>> model = BFSConvEncoder([3, 16, 16], [32, 32, 4], kernel_size=[3, 3, 3], stride=[1, 1, 1], padding=[1, 1, 1])
>>> inputs = torch.randn(3, 16, 16).unsqueeze(0)
>>> outputs = model(inputs)
>>> assert outputs.shape == torch.Size([1, 4, 16, 16])
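The "same height and width" behavior only holds for the right kernel/stride/padding combination. The standard Conv2d output-size formula makes this easy to verify; a small plain-Python check, independent of the library:

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of one Conv2d dimension (dilation = 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# kernel 3 / stride 1 / padding 1 preserves spatial size, as in the example above.
assert conv_out_size(16, kernel=3, stride=1, padding=1) == 16

# The defaults kernel_size=[8, 4, 3], stride=[4, 2, 1], padding=None (i.e. 0)
# progressively shrink the map instead: 84 -> 20 -> 9 -> 7.
size = 84
for k, s in zip([8, 4, 3], [4, 2, 1]):
    size = conv_out_size(size, k, s, padding=0)
assert size == 7
```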

ProcedureCloningBFS

Bases: Module

Overview

The neural network introduced in procedure cloning (PC) to process 3-dim observations. Given an input, this model performs several 3x3 convolutions and outputs a feature map with the same height and width as the input. The channel number of the output is action_shape + 1.

Interfaces: __init__, forward.

__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 256, 256])

Overview

Init the ProcedureCloningBFS model according to the provided arguments.

Arguments:

- obs_shape (:obj:`SequenceType`): Sequence of in_channel, plus one or more input sizes, such as [4, 84, 84].
- action_shape (:obj:`int`): Action space shape, such as 6.
- encoder_hidden_size_list (:obj:`SequenceType`): The cnn channel dims for each block, such as [128, 128, 256, 256].

forward(x)

Overview

The computation graph. Given a 3-dim observation, this function returns a tensor with the same height and width. The channel number of the output is action_shape + 1.

Arguments:

- x (:obj:`torch.Tensor`): The input observation tensor data.

Returns:

- outputs (:obj:`Dict`): The output dict of the model's forward computation graph, which only contains a single key logit.

Examples:

>>> model = ProcedureCloningBFS([3, 16, 16], 4)
>>> inputs = torch.randn(16, 16, 3).unsqueeze(0)
>>> outputs = model(inputs)
>>> assert outputs['logit'].shape == torch.Size([1, 16, 16, 5])
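The extra output channel comes from __init__, which appends action_shape + 1 to the hidden-size list before building the encoder (the source comments this as "The output channel equals to action_shape + 1"). A quick plain-Python mirror of that channel bookkeeping:

```python
def bfs_channels(action_shape, encoder_hidden_size_list):
    """Mirror of the channel-list construction in ProcedureCloningBFS.__init__."""
    hidden = list(encoder_hidden_size_list)
    hidden.append(action_shape + 1)  # final conv outputs action_shape + 1 channels
    return hidden

hidden = bfs_channels(4, [128, 128, 256, 256])
assert hidden == [128, 128, 256, 256, 5]
assert hidden[-1] == 4 + 1  # logit channel dim is action_shape + 1
```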

Full Source Code

../ding/model/template/procedure_cloning.py

from typing import Optional, Tuple, Union, Dict

import torch
import torch.nn as nn

from ding.utils import MODEL_REGISTRY, SequenceType
from ding.torch_utils.network.transformer import Attention
from ding.torch_utils.network.nn_module import fc_block, build_normalization
from ..common import FCEncoder, ConvEncoder


class PCTransformer(nn.Module):
    """
    Overview:
        The transformer block for neural network of algorithms related to Procedure cloning (PC).
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self, cnn_hidden: int, att_hidden: int, att_heads: int, drop_p: float, max_T: int, n_att: int,
        feedforward_hidden: int, n_feedforward: int
    ) -> None:
        """
        Overview:
            Initialize the procedure cloning transformer model according to corresponding input arguments.
        Arguments:
            - cnn_hidden (:obj:`int`): The last channel dimension of CNN encoder, such as 32.
            - att_hidden (:obj:`int`): The dimension of attention blocks, such as 32.
            - att_heads (:obj:`int`): The number of heads in attention blocks, such as 4.
            - drop_p (:obj:`float`): The dropout rate of attention, such as 0.5.
            - max_T (:obj:`int`): The sequence length of procedure cloning, such as 4.
            - n_att (:obj:`int`): The number of attention layers, such as 4.
            - feedforward_hidden (:obj:`int`): The dimension of feedforward layers, such as 32.
            - n_feedforward (:obj:`int`): The number of feedforward layers, such as 4.
        """
        super().__init__()
        self.n_att = n_att
        self.n_feedforward = n_feedforward
        self.attention_layer = []

        self.norm_layer = [nn.LayerNorm(att_hidden)] * n_att
        self.attention_layer.append(Attention(cnn_hidden, att_hidden, att_hidden, att_heads, nn.Dropout(drop_p)))
        for i in range(n_att - 1):
            self.attention_layer.append(Attention(att_hidden, att_hidden, att_hidden, att_heads, nn.Dropout(drop_p)))

        self.att_drop = nn.Dropout(drop_p)

        self.fc_blocks = []
        self.fc_blocks.append(fc_block(att_hidden, feedforward_hidden, activation=nn.ReLU()))
        for i in range(n_feedforward - 1):
            self.fc_blocks.append(fc_block(feedforward_hidden, feedforward_hidden, activation=nn.ReLU()))
        self.norm_layer.extend([nn.LayerNorm(feedforward_hidden)] * n_feedforward)
        self.mask = torch.tril(torch.ones((max_T, max_T), dtype=torch.bool)).view(1, 1, max_T, max_T)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Overview:
            The unique execution (forward) method of PCTransformer.
        Arguments:
            - x (:obj:`torch.Tensor`): Sequential data of several hidden states.
        Returns:
            - output (:obj:`torch.Tensor`): A tensor with the same shape as the input.
        Examples:
            >>> model = PCTransformer(128, 128, 8, 0, 16, 2, 128, 2)
            >>> h = torch.randn((2, 16, 128))
            >>> h = model(h)
            >>> assert h.shape == torch.Size([2, 16, 128])
        """
        for i in range(self.n_att):
            x = self.att_drop(self.attention_layer[i](x, self.mask))
            x = self.norm_layer[i](x)
        for i in range(self.n_feedforward):
            x = self.fc_blocks[i](x)
            x = self.norm_layer[i + self.n_att](x)
        return x


@MODEL_REGISTRY.register('pc_mcts')
class ProcedureCloningMCTS(nn.Module):
    """
    Overview:
        The neural network of algorithms related to Procedure cloning (PC).
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        obs_shape: SequenceType,
        action_dim: int,
        cnn_hidden_list: SequenceType = [128, 128, 256, 256, 256],
        cnn_activation: nn.Module = nn.ReLU(),
        cnn_kernel_size: SequenceType = [3, 3, 3, 3, 3],
        cnn_stride: SequenceType = [1, 1, 1, 1, 1],
        cnn_padding: SequenceType = [1, 1, 1, 1, 1],
        mlp_hidden_list: SequenceType = [256, 256],
        mlp_activation: nn.Module = nn.ReLU(),
        att_heads: int = 8,
        att_hidden: int = 128,
        n_att: int = 4,
        n_feedforward: int = 2,
        feedforward_hidden: int = 256,
        drop_p: float = 0.5,
        max_T: int = 17
    ) -> None:
        """
        Overview:
            Initialize the MCTS procedure cloning model according to corresponding input arguments.
        Arguments:
            - obs_shape (:obj:`SequenceType`): Observation space shape, such as [4, 84, 84].
            - action_dim (:obj:`int`): Action space shape, such as 6.
            - cnn_hidden_list (:obj:`SequenceType`): The cnn channel dims for each block, such as \
                [128, 128, 256, 256, 256].
            - cnn_activation (:obj:`nn.Module`): The activation function for cnn blocks, such as ``nn.ReLU()``.
            - cnn_kernel_size (:obj:`SequenceType`): The kernel size for each cnn block, such as [3, 3, 3, 3, 3].
            - cnn_stride (:obj:`SequenceType`): The stride for each cnn block, such as [1, 1, 1, 1, 1].
            - cnn_padding (:obj:`SequenceType`): The padding for each cnn block, such as [1, 1, 1, 1, 1].
            - mlp_hidden_list (:obj:`SequenceType`): The last dim for this must match the last dim of \
                ``cnn_hidden_list``, such as [256, 256].
            - mlp_activation (:obj:`nn.Module`): The activation function for mlp layers, such as ``nn.ReLU()``.
            - att_heads (:obj:`int`): The number of attention heads in transformer, such as 8.
            - att_hidden (:obj:`int`): The attention dimension in transformer, such as 128.
            - n_att (:obj:`int`): The number of attention blocks in transformer, such as 4.
            - n_feedforward (:obj:`int`): The number of feedforward layers in transformer, such as 2.
            - drop_p (:obj:`float`): The dropout rate of attention, such as 0.5.
            - max_T (:obj:`int`): The sequence length of procedure cloning, such as 17.
        """
        super().__init__()

        # Conv Encoder
        self.embed_state = ConvEncoder(
            obs_shape, cnn_hidden_list, cnn_activation, cnn_kernel_size, cnn_stride, cnn_padding
        )
        self.embed_action = FCEncoder(action_dim, mlp_hidden_list, activation=mlp_activation)

        self.cnn_hidden_list = cnn_hidden_list

        assert cnn_hidden_list[-1] == mlp_hidden_list[-1]
        layers = []
        for i in range(n_att):
            if i == 0:
                layers.append(Attention(cnn_hidden_list[-1], att_hidden, att_hidden, att_heads, nn.Dropout(drop_p)))
            else:
                layers.append(Attention(att_hidden, att_hidden, att_hidden, att_heads, nn.Dropout(drop_p)))
            layers.append(build_normalization('LN')(att_hidden))
        for i in range(n_feedforward):
            if i == 0:
                layers.append(fc_block(att_hidden, feedforward_hidden, activation=nn.ReLU()))
            else:
                layers.append(fc_block(feedforward_hidden, feedforward_hidden, activation=nn.ReLU()))
                self.layernorm2 = build_normalization('LN')(feedforward_hidden)

        self.transformer = PCTransformer(
            cnn_hidden_list[-1], att_hidden, att_heads, drop_p, max_T, n_att, feedforward_hidden, n_feedforward
        )

        self.predict_goal = torch.nn.Linear(cnn_hidden_list[-1], cnn_hidden_list[-1])
        self.predict_action = torch.nn.Linear(cnn_hidden_list[-1], action_dim)

    def forward(self, states: torch.Tensor, goals: torch.Tensor,
                actions: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Overview:
            ProcedureCloningMCTS forward computation graph, input states tensor and goals tensor, \
                calculate the predicted goals and actions.
        Arguments:
            - states (:obj:`torch.Tensor`): The observation of current time.
            - goals (:obj:`torch.Tensor`): The target observation after a period.
            - actions (:obj:`torch.Tensor`): The actions executed during the period.
        Returns:
            - outputs (:obj:`Tuple[torch.Tensor, torch.Tensor]`): Predicted goals and actions.
        Examples:
            >>> inputs = { \
                'states': torch.randn(2, 3, 64, 64), \
                'goals': torch.randn(2, 3, 64, 64), \
                'actions': torch.randn(2, 15, 9) \
            }
            >>> model = ProcedureCloningMCTS(obs_shape=(3, 64, 64), action_dim=9)
            >>> goal_preds, action_preds = model(inputs['states'], inputs['goals'], inputs['actions'])
            >>> assert goal_preds.shape == (2, 256)
            >>> assert action_preds.shape == (2, 16, 9)
        """
        B, T, _ = actions.shape

        # shape: (B, h_dim)
        state_embeddings = self.embed_state(states).reshape(B, 1, self.cnn_hidden_list[-1])
        goal_embeddings = self.embed_state(goals).reshape(B, 1, self.cnn_hidden_list[-1])
        # shape: (B, context_len, h_dim)
        actions_embeddings = self.embed_action(actions)

        h = torch.cat((state_embeddings, goal_embeddings, actions_embeddings), dim=1)
        h = self.transformer(h)
        h = h.reshape(B, T + 2, self.cnn_hidden_list[-1])

        goal_preds = self.predict_goal(h[:, 0, :])
        action_preds = self.predict_action(h[:, 1:, :])

        return goal_preds, action_preds


class BFSConvEncoder(nn.Module):
    """
    Overview:
        The ``BFSConvolution Encoder`` used to encode raw 3-dim observations. And output a feature map with the
        same height and width as input. Interfaces: ``__init__``, ``forward``.
    """

    def __init__(
        self,
        obs_shape: SequenceType,
        hidden_size_list: SequenceType = [32, 64, 64, 128],
        activation: Optional[nn.Module] = nn.ReLU(),
        kernel_size: SequenceType = [8, 4, 3],
        stride: SequenceType = [4, 2, 1],
        padding: Optional[SequenceType] = None,
    ) -> None:
        """
        Overview:
            Init the ``BFSConvolution Encoder`` according to the provided arguments.
        Arguments:
            - obs_shape (:obj:`SequenceType`): Sequence of ``in_channel``, plus one or more ``input size``.
            - hidden_size_list (:obj:`SequenceType`): Sequence of ``hidden_size`` of subsequent conv layers \
                and the final dense layer.
            - activation (:obj:`nn.Module`): Type of activation to use in the conv ``layers`` and ``ResBlock``. \
                Default is ``nn.ReLU()``.
            - kernel_size (:obj:`SequenceType`): Sequence of ``kernel_size`` of subsequent conv layers.
            - stride (:obj:`SequenceType`): Sequence of ``stride`` of subsequent conv layers.
            - padding (:obj:`SequenceType`): Padding added to all four sides of the input for each conv layer. \
                See ``nn.Conv2d`` for more details. Default is ``None``.
        """
        super(BFSConvEncoder, self).__init__()
        self.obs_shape = obs_shape
        self.act = activation
        self.hidden_size_list = hidden_size_list
        if padding is None:
            padding = [0 for _ in range(len(kernel_size))]

        layers = []
        input_size = obs_shape[0]  # in_channel
        for i in range(len(kernel_size)):
            layers.append(nn.Conv2d(input_size, hidden_size_list[i], kernel_size[i], stride[i], padding[i]))
            layers.append(self.act)
            input_size = hidden_size_list[i]
        layers = layers[:-1]
        self.main = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Overview:
            Return output tensor of the env observation.
        Arguments:
            - x (:obj:`torch.Tensor`): Env raw observation.
        Returns:
            - outputs (:obj:`torch.Tensor`): Output embedding tensor.
        Examples:
            >>> model = BFSConvEncoder([3, 16, 16], [32, 32, 4], kernel_size=[3, 3, 3], stride=[1, 1, 1], \
                padding=[1, 1, 1])
            >>> inputs = torch.randn(3, 16, 16).unsqueeze(0)
            >>> outputs = model(inputs)
            >>> assert outputs.shape == torch.Size([1, 4, 16, 16])
        """
        return self.main(x)


@MODEL_REGISTRY.register('pc_bfs')
class ProcedureCloningBFS(nn.Module):
    """
    Overview:
        The neural network introduced in procedure cloning (PC) to process 3-dim observations.\
        Given an input, this model will perform several 3x3 convolutions and output a feature map with \
        the same height and width of input. The channel number of output will be ``action_shape + 1``.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        obs_shape: SequenceType,
        action_shape: int,
        encoder_hidden_size_list: SequenceType = [128, 128, 256, 256],
    ):
        """
        Overview:
            Init the ``ProcedureCloningBFS`` model according to the provided arguments.
        Arguments:
            - obs_shape (:obj:`SequenceType`): Sequence of ``in_channel``, plus one or more ``input size``,\
                such as [4, 84, 84].
            - action_shape (:obj:`int`): Action space shape, such as 6.
            - encoder_hidden_size_list (:obj:`SequenceType`): The cnn channel dims for each block, \
                such as [128, 128, 256, 256].
        """
        super().__init__()
        num_layers = len(encoder_hidden_size_list)

        kernel_sizes = (3, ) * (num_layers + 1)
        stride_sizes = (1, ) * (num_layers + 1)
        padding_sizes = (1, ) * (num_layers + 1)
        # The output channel equals to action_shape + 1
        encoder_hidden_size_list.append(action_shape + 1)

        self._encoder = BFSConvEncoder(
            obs_shape=obs_shape,
            hidden_size_list=encoder_hidden_size_list,
            kernel_size=kernel_sizes,
            stride=stride_sizes,
            padding=padding_sizes,
        )

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            The computation graph. Given a 3-dim observation, this function will return a tensor with the same \
                height and width. The channel number of output will be ``action_shape + 1``.
        Arguments:
            - x (:obj:`torch.Tensor`): The input observation tensor data.
        Returns:
            - outputs (:obj:`Dict`): The output dict of model's forward computation graph, \
                only contains a single key ``logit``.
        Examples:
            >>> model = ProcedureCloningBFS([3, 16, 16], 4)
            >>> inputs = torch.randn(16, 16, 3).unsqueeze(0)
            >>> outputs = model(inputs)
            >>> assert outputs['logit'].shape == torch.Size([1, 16, 16, 5])
        """
        x = x.permute(0, 3, 1, 2)
        x = self._encoder(x)
        return {'logit': x.permute(0, 2, 3, 1)}