ding.model.template.bc
DiscreteBC
Bases: Module
Overview
The DiscreteBC network.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, strides=None)
Overview
Initialize the DiscreteBC model (encoder + head) according to the input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Sequence of hidden sizes to pass to the Encoder; the last element must match head_hidden_size.
- dueling (:obj:bool): Whether to use DuelingHead (the default, since dueling=True) or DiscreteHead.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network.
- head_layer_num (:obj:int): The number of layers used in the head network to compute the Q-value output.
- activation (:obj:Optional[nn.Module]): The activation function used in the networks; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.
- strides (:obj:Optional[list]): The strides for each convolution layer, such as [2, 2, 2]. The length of this argument should be the same as that of encoder_hidden_size_list.
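The two constraints stated in the argument list (the last encoder size must match head_hidden_size, and strides must be as long as encoder_hidden_size_list) can be checked before building the model. The helper below is a hypothetical sketch and not part of ding; it additionally assumes that a head_hidden_size of None falls back to the last encoder size, which this page does not state.

```python
from typing import Optional, Sequence


def check_discrete_bc_args(
    encoder_hidden_size_list: Sequence[int],
    head_hidden_size: Optional[int] = None,
    strides: Optional[Sequence[int]] = None,
) -> int:
    """Hypothetical helper (not part of ding): validate the documented constraints.

    Returns the head hidden size that would be used.
    """
    if len(encoder_hidden_size_list) == 0:
        raise ValueError("encoder_hidden_size_list must not be empty")
    last = encoder_hidden_size_list[-1]
    # Assumption: a missing head_hidden_size falls back to the last encoder size.
    if head_hidden_size is None:
        head_hidden_size = last
    # Documented: the last encoder size must match head_hidden_size.
    if head_hidden_size != last:
        raise ValueError(
            f"head_hidden_size={head_hidden_size} does not match "
            f"encoder_hidden_size_list[-1]={last}"
        )
    # Documented: strides has the same length as encoder_hidden_size_list.
    if strides is not None and len(strides) != len(encoder_hidden_size_list):
        raise ValueError(
            f"len(strides)={len(strides)} differs from "
            f"len(encoder_hidden_size_list)={len(encoder_hidden_size_list)}"
        )
    return head_hidden_size
```

Calling `check_discrete_bc_args([128, 128, 64])` with the default encoder sizes passes, while `check_discrete_bc_args([128, 128, 64], head_hidden_size=32)` raises a ValueError.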
forward(x)
Overview
DiscreteBC forward computation graph, which maps an input observation tensor to a per-action logit (Q-value) output.
Arguments:
- x (:obj:torch.Tensor): Observation inputs
Returns:
- outputs (:obj:Dict): DiscreteBC forward outputs, such as logit.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each action dimension.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- logit (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is action_shape
Examples:
>>> model = DiscreteBC(32, 6) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
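For behavior cloning with a discrete action space, the logit output of shape (B, M) is commonly trained with a cross-entropy loss against expert actions and turned into actions with an argmax. The sketch below shows that arithmetic in plain Python so each step is visible; the function names are hypothetical and not part of ding, and with real tensors one would more likely call torch.nn.functional.cross_entropy on outputs['logit'].

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def bc_cross_entropy(
    logit: Sequence[Sequence[float]], expert_action: Sequence[int]
) -> float:
    """Mean negative log-likelihood of the expert actions under softmax(logit)."""
    losses = [-log_softmax(row)[a] for row, a in zip(logit, expert_action)]
    return sum(losses) / len(losses)


def greedy_action(logit: Sequence[Sequence[float]]) -> List[int]:
    """Index of the largest logit in each row (first index wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logit]


# A batch of B=2 rows with M=3 actions, standing in for outputs['logit'].
logit = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
expert_action = [0, 2]
loss = bc_cross_entropy(logit, expert_action)
actions = greedy_action(logit)
```

The loss decreases as the logit of the expert's action grows relative to the others in its row, which is the signal a behavior cloning update would follow.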
ContinuousBC
Bases: Module
Overview
The ContinuousBC network.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, action_space, actor_head_hidden_size=64, actor_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
Overview
Initialize the ContinuousBC Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's shape, such as 128, (156, ).
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action's shape, such as 4, (3, ), EasyDict({'action_type_shape': 3, 'action_args_shape': 4}).
- action_space (:obj:str): The type of action space, one of regression or reparameterization.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor head.
- actor_head_layer_num (:obj:int): The number of layers used in the actor head to compute its output.
- activation (:obj:Optional[nn.Module]): The activation function to use in the MLP after each FC layer; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to apply after each network layer (FC, Conv), see ding.torch_utils.network for more details.
forward(inputs)
Overview
The unique execution (forward) method of ContinuousBC.
Arguments:
- inputs (:obj:torch.Tensor): Observation data as a tensor.
Returns:
- output (:obj:Dict): Output dict whose keys depend on action_space.
ReturnsKeys:
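- action (:obj:torch.Tensor): Action output of the actor network, with shape :math:(B, action_shape); returned when action_space is regression.
- logit (:obj:List[torch.Tensor]): Reparameterized action output of the actor network, a list of two tensors that each have shape :math:(B, action_shape); returned when action_space is reparameterization.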
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- action (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is action_shape
- logit (:obj:List[torch.FloatTensor]): each element has shape :math:(B, M), where B is batch size and M is action_shape
Examples (Regression):
>>> model = ContinuousBC(32, 6, action_space='regression')
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['action'].shape == torch.Size([4, 6])
Examples (Reparameterization):
>>> model = ContinuousBC(32, 6, action_space='reparameterization')
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'][0].shape == torch.Size([4, 6])
>>> assert outputs['logit'][1].shape == torch.Size([4, 6])
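The reparameterization example returns a two-element logit list. Assuming those two entries are the mean and standard deviation of a Gaussian policy (the page itself only shows two tensors of shape (B, M)), an action can be drawn with the reparameterization trick and scored against an expert action. The plain-Python sketch below works on a single row; the function names are hypothetical and not part of ding.

```python
import math
import random
from typing import List, Sequence


def sample_action(
    mu: Sequence[float], sigma: Sequence[float], rng: random.Random
) -> List[float]:
    """Reparameterized sample: a = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def gaussian_log_prob(
    action: Sequence[float], mu: Sequence[float], sigma: Sequence[float]
) -> float:
    """Log-density of an action under independent per-dimension Gaussians."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mu, sigma)
    )


# One row of an assumed [mu, sigma] pair, standing in for outputs['logit'].
mu = [0.1, -0.3, 0.0]
sigma = [0.5, 0.2, 1.0]
rng = random.Random(0)  # seeded for repeatability
action = sample_action(mu, sigma, rng)
# Negative log-likelihood of an expert action, one possible BC objective.
nll = -gaussian_log_prob([0.0, -0.25, 0.1], mu, sigma)
```

Because the noise eps is drawn independently of mu and sigma, the sampled action stays a differentiable function of both, which is the point of the reparameterized form.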
Full Source Code
../ding/model/template/bc.py