ding.model.template.qac_dist
QACDIST
Bases: Module
Overview
The QAC model with distributional Q-value.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(obs_shape, action_shape, action_space='regression', critic_head_type='categorical', actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)
Overview
Init the QAC Distributional Model according to arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's space.
- action_shape (:obj:Union[int, SequenceType]): Action's space.
- action_space (:obj:str): The type of action space, either 'regression' or 'reparameterization'.
- critic_head_type (:obj:str): The type of critic head; only 'categorical' is supported.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor-nn's Head.
- actor_head_layer_num (:obj:int):
The number of layers used in the actor network's head to compute its output.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic-nn's Head.
- critic_head_layer_num (:obj:int):
The number of layers used in the critic network's head to compute the Q-value output.
- activation (:obj:Optional[nn.Module]):
The activation function to use in the MLP after layer_fn;
if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]):
The type of normalization to use, see ding.torch_utils.fc_block for more details.
- v_min (:obj:int): Value of the smallest atom
- v_max (:obj:int): Value of the largest atom
- n_atom (:obj:int): Number of atoms in the support
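The arguments v_min, v_max and n_atom together define a fixed support for the categorical critic. As a minimal sketch, assuming the atoms are evenly spaced between v_min and v_max (the usual convention for categorical distributional critics), the defaults give 51 atoms spaced 0.4 apart. make_support is a hypothetical helper for illustration, not part of ding:

```python
def make_support(v_min=-10.0, v_max=10.0, n_atom=51):
    """Return n_atom evenly spaced atom values from v_min to v_max (inclusive)."""
    # Hypothetical helper; assumes evenly spaced atoms.
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]


support = make_support()
spacing = support[1] - support[0]  # distance between neighbouring atoms
```

A wider [v_min, v_max] range with the same n_atom makes the grid coarser, so the range is usually chosen to cover the returns the task can produce.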
forward(inputs, mode)
Overview
Use the observation tensor (and, in critic mode, the action tensor) to predict the output. The forward pass runs through the QACDIST MLP selected by mode.
Arguments:
Forward with 'compute_actor':
- inputs (:obj:torch.Tensor):
The encoded embedding tensor, determined with given hidden_size, i.e. (B, N=hidden_size).
Which hidden size applies (actor_head_hidden_size or critic_head_hidden_size) depends on mode.
Forward with ``'compute_critic'``, inputs (`Dict`) Necessary Keys:
- ``obs``, ``action`` encoded tensors.
- mode (:obj:`str`): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Outputs of network forward.
Forward with ``'compute_actor'``, Necessary Keys (either):
- action (:obj:`torch.Tensor`): Action tensor with same size as input ``x``.
- logit (:obj:`torch.Tensor`):
Logit tensor encoding ``mu`` and ``sigma``, both with same size as input ``x``.
Forward with ``'compute_critic'``, Necessary Keys:
- q_value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
- distribution (:obj:`torch.Tensor`): Q value distribution tensor.
Actor Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to hidden_size
- action (:obj:torch.Tensor): :math:(B, N0)
- q_value (:obj:torch.FloatTensor): :math:(B, ), where B is batch size.
Critic Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape
- action (:obj:torch.Tensor): :math:(B, N2), where B is batch size and N2 is action_shape
- q_value (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape
- distribution (:obj:torch.FloatTensor): :math:(B, 1, N3), where B is batch size and N3 is n_atom
Actor Examples:
>>> # Regression mode
>>> model = QACDIST(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization mode
>>> model = QACDIST(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
Critic Examples:
>>> # Categorical mode
>>> N = 32  # arbitrary observation size, chosen for illustration
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression',
...                 critic_head_type='categorical', n_atom=51)
>>> q_value = model(inputs, mode='compute_critic')  # q value
>>> assert q_value['q_value'].shape == torch.Size([4, 1])
>>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
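The mode argument selects which sub-network runs. A minimal sketch of this kind of string-based dispatch, assuming forward simply looks up the method named by mode. ToyModel and its return values are hypothetical stand-ins, not the QACDIST implementation:

```python
class ToyModel:
    # Names of the forward modes this toy model accepts.
    mode = ['compute_actor', 'compute_critic']

    def forward(self, inputs, mode):
        # Reject unknown modes early, then call the method with that name.
        assert mode in self.mode, f"unsupported mode: {mode}"
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs):
        # Stand-in for the actor head: returns a dict with an 'action' key.
        return {'action': inputs}

    def compute_critic(self, inputs):
        # Stand-in for the critic head: expects a dict with 'obs' and 'action'.
        toy_q = sum(inputs['obs']) + sum(inputs['action'])
        return {'q_value': toy_q}


toy = ToyModel()
actor_out = toy.forward([0.1, 0.2], 'compute_actor')
critic_out = toy.forward({'obs': [0.1, 0.2], 'action': [0.3]}, 'compute_critic')
```

With this pattern the two modes can take differently shaped inputs (a tensor for the actor, a dict for the critic) behind one forward entry point.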
compute_actor(inputs)
Overview
Run the forward pass in 'compute_actor' mode.
Use the encoded embedding tensor to predict the output.
Arguments:
- inputs (:obj:torch.Tensor):
The encoded embedding tensor, determined with given hidden_size, i.e. (B, N=hidden_size).
hidden_size = actor_head_hidden_size
- mode (:obj:str): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Outputs of forward pass encoder and head.
ReturnKeys (either):
- action (:obj:torch.Tensor): Continuous action tensor with same size as action_shape.
- logit (:obj:torch.Tensor):
Logit tensor encoding mu and sigma, both with same size as input x.
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to hidden_size
- action (:obj:torch.Tensor): :math:(B, N0)
- logit (:obj:list): 2 elements, mu and sigma, each is the shape of :math:(B, N0).
- q_value (:obj:torch.FloatTensor): :math:(B, ), B is batch size.
Examples:
>>> # Regression mode
>>> model = QACDIST(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization Mode
>>> model = QACDIST(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
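In reparameterization mode the actor returns mu and sigma rather than an action. A minimal sketch of how a caller might draw an action from them, assuming the standard reparameterization trick action = mu + sigma * eps with eps drawn from a standard normal. sample_action is a hypothetical helper operating on plain lists, not a ding API:

```python
import random


def sample_action(mu, sigma, rng=random):
    """Sample one action per dimension as mu + sigma * eps, eps ~ N(0, 1)."""
    # Hypothetical helper; mu and sigma are equal-length lists of floats.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


mu = [0.5, -1.0, 2.0]
deterministic = sample_action(mu, [0.0, 0.0, 0.0])  # zero sigma returns mu itself
noisy = sample_action(mu, [0.1, 0.1, 0.1], rng=random.Random(0))  # seeded noise
```

Because the noise enters only through eps, the sampled action stays a differentiable function of mu and sigma when the same formula is written with tensors.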
compute_critic(inputs)
Overview
Run the forward pass in 'compute_critic' mode.
Use the encoded obs and action tensors to predict the Q-value output.
Arguments:
- inputs (:obj:Dict): Must contain the keys obs and action, both encoded tensors.
- mode (:obj:str): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Q-value output and distribution.
ReturnKeys:
- q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
- distribution (:obj:torch.Tensor): Q value distribution tensor.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape
- action (:obj:torch.Tensor): :math:(B, N2), where B is batch size and N2 is action_shape
- q_value (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape
- distribution (:obj:torch.FloatTensor): :math:(B, 1, N3), where B is batch size and N3 is n_atom
Examples:
>>> # Categorical mode
>>> N = 32  # arbitrary observation size, chosen for illustration
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression',
...                 critic_head_type='categorical', n_atom=51)
>>> q_value = model(inputs, mode='compute_critic') # q value
>>> assert q_value['q_value'].shape == torch.Size([4, 1])
>>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
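The q_value and distribution outputs are related: for a categorical critic, the Q-value is conventionally the expectation of the atom values under the predicted distribution. A minimal sketch under that assumption, using plain Python lists. softmax and expected_q are hypothetical helpers, and the evenly spaced support is also an assumption:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_q(distribution, v_min=-10.0, v_max=10.0):
    """Expectation of evenly spaced atoms in [v_min, v_max] under distribution."""
    n_atom = len(distribution)
    delta = (v_max - v_min) / (n_atom - 1)
    return sum(p * (v_min + i * delta) for i, p in enumerate(distribution))


uniform = softmax([0.0] * 51)    # equal logits give a uniform distribution
q_uniform = expected_q(uniform)  # symmetric support, so the mean is near 0
one_hot = [0.0] * 50 + [1.0]     # all probability mass on the largest atom
q_max = expected_q(one_hot)
```

The expectation can never leave [v_min, v_max], which is why returns outside that range cannot be represented by the critic.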
Full Source Code
../ding/model/template/qac_dist.py