ding.model.template.pg

PG

Bases: Module

Overview

The neural network and computation graph of algorithms related to Policy Gradient (PG) (https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf). The PG model is composed of two parts: an encoder and a head. The encoder extracts features from various observations, and the head predicts the corresponding action logits.
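The encoder–head split described above can be sketched in plain PyTorch. This is a minimal illustration of the composition pattern, not DI-engine's actual classes: `SimpleEncoder` and `SimpleHead` are hypothetical stand-ins for `FCEncoder` and `DiscreteHead`.

```python
import torch
import torch.nn as nn


class SimpleEncoder(nn.Module):
    """Hypothetical minimal encoder: maps raw observations to a feature vector."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SimpleHead(nn.Module):
    """Hypothetical minimal head: maps features to per-action logits."""

    def __init__(self, hidden: int, action_dim: int):
        super().__init__()
        self.net = nn.Linear(hidden, action_dim)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)


# Compose encoder and head, as the PG model does internally.
encoder = SimpleEncoder(obs_dim=8)
head = SimpleHead(hidden=64, action_dim=4)
obs = torch.randn(2, 8)           # batch of 2 observations
logits = head(encoder(obs))       # (2, 4) action logits
```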

Interface: __init__, forward.

__init__(obs_shape, action_shape, action_space='discrete', encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None)

Overview

Initialize the PG model according to corresponding input arguments.

Arguments:
- obs_shape (:obj:`Union[int, SequenceType]`): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:`Union[int, SequenceType]`): Action space shape, such as 6 or [2, 3, 3].
- action_space (:obj:`str`): The type of action space, one of ['discrete', 'continuous']; the corresponding head (DiscreteHead or ReparameterizationHead) is instantiated accordingly.
- encoder_hidden_size_list (:obj:`SequenceType`): Collection of hidden_size values passed to the Encoder; the last element must match head_hidden_size.
- head_hidden_size (:obj:`Optional[int]`): The hidden_size of the head network; defaults to None, in which case the last element of encoder_hidden_size_list is used.
- head_layer_num (:obj:`int`): The number of layers in the head network used to compute actions.
- activation (:obj:`Optional[nn.Module]`): The activation function used in the networks; if None, it defaults to nn.ReLU().
- norm_type (:obj:`Optional[str]`): The type of normalization in the networks; choose one of ['BN', 'IN', 'SyncBN', 'LN']. See ding.torch_utils.fc_block for more details.

Examples:
>>> model = PG((4, 84, 84), 5)
>>> inputs = torch.randn(8, 4, 84, 84)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == (8, 5)
>>> assert outputs['dist'].sample().shape == (8, )
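For the continuous case, the head produces a dict with 'mu' and 'sigma', which is turned into an independent Normal distribution. The sketch below shows how such output parameterizes the policy distribution using only torch.distributions; it mirrors what independent_normal_dist is used for in the source shown below, but is an assumption about its behavior rather than DI-engine's exact implementation.

```python
import torch
from torch.distributions import Normal, Independent

# Placeholder head output: batch of 8, 3-dimensional continuous action.
mu = torch.zeros(8, 3)
sigma = torch.ones(8, 3)

# Wrap the factorized Normal so the action dimensions form one event:
# log_prob then returns one value per batch element, not per dimension.
dist = Independent(Normal(mu, sigma), 1)
action = dist.sample()              # shape (8, 3)
logp = dist.log_prob(action)        # shape (8,)
```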

forward(x)

Overview

PG forward computation graph, input observation tensor to predict policy distribution.

Arguments:
- x (:obj:`torch.Tensor`): The input observation tensor.

Returns:
- outputs (:obj:`Dict`): The output dict containing the action logit and the policy distribution ('dist'). If the action space is discrete, the distribution is a Categorical distribution; if the action space is continuous, it is an independent Normal distribution.
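The 'dist' entry of the returned dict is what a policy-gradient loss consumes. The sketch below shows a REINFORCE-style update using only torch.distributions; the logits and returns are fabricated placeholders standing in for the model output and collected trajectory returns.

```python
import torch
from torch.distributions import Categorical

# Placeholder for outputs['logit'] from PG.forward (discrete action space).
logits = torch.randn(8, 5, requires_grad=True)
dist = Categorical(logits=logits)        # plays the role of outputs['dist']

actions = dist.sample()                  # (8,) sampled actions
returns = torch.randn(8)                 # placeholder per-sample returns

# REINFORCE objective: maximize E[log pi(a|s) * R], so minimize its negative.
loss = -(dist.log_prob(actions) * returns).mean()
loss.backward()                          # gradients flow back into the logits
```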

Full Source Code

../ding/model/template/pg.py

from typing import Union, Optional, Dict, Callable, List
import torch
import torch.nn as nn
from easydict import EasyDict

from ding.torch_utils import get_lstm
from ding.utils import MODEL_REGISTRY, SequenceType, squeeze
from ..common import FCEncoder, ConvEncoder, DiscreteHead, DuelingHead, \
    MultiHead, RegressionHead, ReparameterizationHead, independent_normal_dist


@MODEL_REGISTRY.register('pg')
class PG(nn.Module):
    """
    Overview:
        The neural network and computation graph of algorithms related to Policy Gradient(PG) \
        (https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf). \
        The PG model is composed of two parts: encoder and head. Encoders are used to extract the feature \
        from various observation. Heads are used to predict corresponding action logit.
    Interface:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        obs_shape: Union[int, SequenceType],
        action_shape: Union[int, SequenceType],
        action_space: str = 'discrete',
        encoder_hidden_size_list: SequenceType = [128, 128, 64],
        head_hidden_size: Optional[int] = None,
        head_layer_num: int = 1,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None
    ) -> None:
        """
        Overview:
            Initialize the PG model according to corresponding input arguments.
        Arguments:
            - obs_shape (:obj:`Union[int, SequenceType]`): Observation space shape, such as 8 or [4, 84, 84].
            - action_shape (:obj:`Union[int, SequenceType]`): Action space shape, such as 6 or [2, 3, 3].
            - action_space (:obj:`str`): The type of different action spaces, including ['discrete', 'continuous'], \
                then will instantiate corresponding head, including ``DiscreteHead`` and ``ReparameterizationHead``.
            - encoder_hidden_size_list (:obj:`SequenceType`): Collection of ``hidden_size`` to pass to ``Encoder``, \
                the last element must match ``head_hidden_size``.
            - head_hidden_size (:obj:`Optional[int]`): The ``hidden_size`` of ``head`` network, defaults \
                to None, it must match the last element of ``encoder_hidden_size_list``.
            - head_layer_num (:obj:`int`): The num of layers used in the ``head`` network to compute action.
            - activation (:obj:`Optional[nn.Module]`): The type of activation function in networks \
                if ``None`` then default set it to ``nn.ReLU()``.
            - norm_type (:obj:`Optional[str]`): The type of normalization in networks, see \
                ``ding.torch_utils.fc_block`` for more details. you can choose one of ['BN', 'IN', 'SyncBN', 'LN']
        Examples:
            >>> model = PG((4, 84, 84), 5)
            >>> inputs = torch.randn(8, 4, 84, 84)
            >>> outputs = model(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == (8, 5)
            >>> assert outputs['dist'].sample().shape == (8, )
        """
        super(PG, self).__init__()
        # For compatibility: 1, (1, ), [4, 32, 32]
        obs_shape, action_shape = squeeze(obs_shape), squeeze(action_shape)
        if head_hidden_size is None:
            head_hidden_size = encoder_hidden_size_list[-1]
        # FC Encoder
        if isinstance(obs_shape, int) or len(obs_shape) == 1:
            self.encoder = FCEncoder(obs_shape, encoder_hidden_size_list, activation=activation, norm_type=norm_type)
        # Conv Encoder
        elif len(obs_shape) == 3:
            self.encoder = ConvEncoder(obs_shape, encoder_hidden_size_list, activation=activation, norm_type=norm_type)
        else:
            raise RuntimeError(
                "not support obs_shape for pre-defined encoder: {}, please customize your own BC".format(obs_shape)
            )
        self.action_space = action_space
        # Head
        if self.action_space == 'discrete':
            self.head = DiscreteHead(
                head_hidden_size, action_shape, head_layer_num, activation=activation, norm_type=norm_type
            )
        elif self.action_space == 'continuous':
            self.head = ReparameterizationHead(
                head_hidden_size,
                action_shape,
                head_layer_num,
                activation=activation,
                norm_type=norm_type,
                sigma_type='independent'
            )
        else:
            raise KeyError("not support action space: {}".format(self.action_space))

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            PG forward computation graph, input observation tensor to predict policy distribution.
        Arguments:
            - x (:obj:`torch.Tensor`): The input observation tensor data.
        Returns:
            - outputs (:obj:`torch.distributions`): The output policy distribution. If action space is \
                discrete, the output is Categorical distribution; if action space is continuous, the output is Normal \
                distribution.
        """
        x = self.encoder(x)
        x = self.head(x)
        if self.action_space == 'discrete':
            x['dist'] = torch.distributions.Categorical(logits=x['logit'])
        elif self.action_space == 'continuous':
            x = {'logit': {'mu': x['mu'], 'sigma': x['sigma']}}
            x['dist'] = independent_normal_dist(x['logit'])
        return x