ding.model.common.head¶
DiscreteHead
¶
Bases: Module
Overview
The DiscreteHead is used to generate discrete actions logit or Q-value logit, which is often used in q-learning algorithms or actor-critic algorithms for discrete action space.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, activation=nn.ReLU(), norm_type=None, dropout=None, noise=False)
¶
Overview
Init the DiscreteHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to DiscreteHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- dropout (:obj:float): The dropout rate. Default None (no dropout).
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with DiscreteHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
Examples:
>>> head = DiscreteHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 64])
DistributionHead
¶
Bases: Module
Overview
The DistributionHead is used to generate distribution for Q-value.
This module is used in C51 algorithm.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, n_atom=51, v_min=-10, v_max=10, activation=nn.ReLU(), norm_type=None, noise=False, eps=1e-06)
¶
Overview
Init the DistributionHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to DistributionHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value distribution.
- n_atom (:obj:int): The number of atoms (discrete supports). Default is 51.
- v_min (:obj:int): Min value of atoms. Default is -10.
- v_max (:obj:int): Max value of atoms. Default is 10.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
- eps (:obj:float): Small constant used for numerical stability. Default 1e-06.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with DistributionHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor) and distribution (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- distribution: :math:(B, M, n_atom).
Examples:
>>> head = DistributionHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom is 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
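The categorical distribution returned by DistributionHead is typically reduced to scalar Q-values by taking its expectation over the atom support, as in C51. A minimal pure-Python sketch with hypothetical numbers (n_atom shrunk to 5 for readability; not ding code):

```python
# Sketch: recover a scalar Q-value from a C51-style categorical distribution.
# The support is a uniform grid of n_atom points between v_min and v_max.
n_atom, v_min, v_max = 5, -10.0, 10.0
support = [v_min + i * (v_max - v_min) / (n_atom - 1) for i in range(n_atom)]
dist = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical probabilities for one action, sum to 1
q_value = sum(p * z for p, z in zip(dist, support))  # expected return under dist
```

With the numbers above the expectation is 0.0, since the distribution is symmetric around the middle atom.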
BranchingHead
¶
Bases: Module
Overview
The BranchingHead is used to generate Q-value with different branches.
This module is used in Branch DQN.
Interfaces:
__init__, forward.
__init__(hidden_size, num_branches=0, action_bins_per_branch=2, layer_num=1, a_layer_num=None, v_layer_num=None, norm_type=None, activation=nn.ReLU(), noise=False)
¶
Overview
Init the BranchingHead layers according to the provided arguments. This head achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action.
Therefore, this head is suitable for high-dimensional action spaces.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to BranchingHead.
- num_branches (:obj:int): The number of branches, which is equivalent to the action dimension.
- action_bins_per_branch (:obj:int): The number of action bins in each dimension.
- layer_num (:obj:int): The number of layers used in the network to compute Advantage and Value output.
- a_layer_num (:obj:int): The number of layers used in the network to compute Advantage output.
- v_layer_num (:obj:int): The number of layers used in the network to compute Value output.
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with BranchingHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, num_branches, action_bins_per_branch).
Examples:
>>> head = BranchingHead(64, 5, 2)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 5, 2])
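The point of the branching architecture is that the output count grows linearly with the action dimension instead of exponentially. A small pure-Python sketch of the count comparison (hypothetical numbers matching the example above):

```python
# Sketch: output count of a joint discrete head vs. a branching head.
# A single joint head over all action combinations needs bins ** branches outputs;
# a branching head needs branches * bins (one independent set of bins per branch).
num_branches, bins = 5, 2
joint_outputs = bins ** num_branches       # exponential in the action dimension
branching_outputs = num_branches * bins    # linear in the action dimension
```

For 5 branches with 2 bins each, that is 32 outputs for a joint head versus 10 for a branching head, and the gap widens quickly as dimensions are added.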
RainbowHead
¶
Bases: Module
Overview
The RainbowHead is used to generate distribution of Q-value.
This module is used in Rainbow DQN.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, n_atom=51, v_min=-10, v_max=10, activation=nn.ReLU(), norm_type=None, noise=True, eps=1e-06)
¶
Overview
Init the RainbowHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to RainbowHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- n_atom (:obj:int): The number of atoms (discrete supports). Default is 51.
- v_min (:obj:int): Min value of atoms. Default is -10.
- v_max (:obj:int): Max value of atoms. Default is 10.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default True.
- eps (:obj:float): Small constant used for numerical stability. Default 1e-06.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with RainbowHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor) and distribution (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- distribution: :math:(B, M, n_atom).
Examples:
>>> head = RainbowHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom is 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
QRDQNHead
¶
Bases: Module
Overview
The QRDQNHead (Quantile Regression DQN) is used to output action quantiles.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, activation=nn.ReLU(), norm_type=None, noise=False)
¶
Overview
Init the QRDQNHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to QRDQNHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- num_quantiles (:obj:int): The number of quantiles. Default is 32.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with QRDQNHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), and tau (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- q: :math:(B, M, num_quantiles).
- tau: :math:(B, num_quantiles, 1).
Examples:
>>> head = QRDQNHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
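In QR-DQN the scalar Q-value of an action is the uniform mean over its quantile estimates, which is what the logit key above summarizes. A minimal pure-Python sketch with hypothetical quantile values for one action:

```python
# Sketch: QR-DQN's scalar Q-value is the uniform average of the quantile estimates,
# since each quantile carries equal probability mass 1 / num_quantiles.
quantile_values = [1.0, 2.0, 3.0, 6.0]  # hypothetical quantiles for one action
q_value = sum(quantile_values) / len(quantile_values)
```

Here the resulting Q-value is 3.0; in the head itself this averaging is done over the num_quantiles axis of the q tensor.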
QuantileHead
¶
Bases: Module
Overview
The QuantileHead is used to output action quantiles.
This module is used in IQN.
Interfaces:
__init__, forward, quantile_net.
.. note::
The difference between QuantileHead and QRDQNHead is that QuantileHead models the state-action quantile function as a mapping from state-actions and samples from some base distribution, while QRDQNHead approximates random returns by a uniform mixture of Dirac functions.
__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, quantile_embedding_size=128, beta_function_type='uniform', activation=nn.ReLU(), norm_type=None, noise=False)
¶
Overview
Init the QuantileHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to QuantileHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- num_quantiles (:obj:int): The number of quantiles.
- quantile_embedding_size (:obj:int): The embedding size of a quantile.
- beta_function_type (:obj:str): Type of beta function. See ding.rl_utils.beta_function.py for more details. Default is uniform.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
quantile_net(quantiles)
¶
Overview
Deterministic parametric function trained to reparameterize samples from a base distribution. By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.
Arguments:
- quantiles (:obj:torch.Tensor): The encoded embedding tensor of the parametric sample.
Returns:
- quantile_net (:obj:torch.Tensor): Quantile network output tensor after reparameterization.
Shapes:
- quantile_net: :math:(quantile_embedding_size, M), where M = output_size.
Examples:
>>> head = QuantileHead(64, 64)
>>> quantiles = torch.randn(128,1)
>>> qn_output = head.quantile_net(quantiles)
>>> assert isinstance(qn_output, torch.Tensor)
>>> # default quantile_embedding_size is 128
>>> assert qn_output.shape == torch.Size([128, 64])
forward(x, num_quantiles=None)
¶
Overview
Use encoded embedding tensor to run MLP with QuantileHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), and quantiles (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- q: :math:(num_quantiles, B, M).
- quantiles: :math:(quantile_embedding_size, 1).
Examples:
>>> head = QuantileHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])
FQFHead
¶
Bases: Module
Overview
The FQFHead is used to output action quantiles.
This module is used in FQF.
Interfaces:
__init__, forward, quantile_net.
.. note:: The implementation of FQFHead is based on the paper https://arxiv.org/abs/1911.02140. The difference between FQFHead and QuantileHead is that, in FQF, N adjustable quantile values for N adjustable quantile fractions are estimated to approximate the quantile function, and the distribution of the return is approximated by a weighted mixture of N Dirac functions; while in IQN, the state-action quantile function is modeled as a mapping from state-actions and samples from some base distribution.
__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None, noise=False)
¶
Overview
Init the FQFHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to FQFHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- num_quantiles (:obj:int): The number of quantiles.
- quantile_embedding_size (:obj:int): The embedding size of a quantile.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
quantile_net(quantiles)
¶
Overview
Deterministic parametric function trained to reparameterize samples from the quantiles_proposal network. By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.
Arguments:
- quantiles (:obj:torch.Tensor): The encoded embedding tensor of the parametric sample.
Returns:
- quantile_net (:obj:torch.Tensor): Quantile network output tensor after reparameterization.
Examples:
>>> head = FQFHead(64, 64)
>>> quantiles = torch.randn(4,32)
>>> qn_output = head.quantile_net(quantiles)
>>> assert isinstance(qn_output, torch.Tensor)
>>> # default quantile_embedding_size is 128
>>> assert qn_output.shape == torch.Size([4, 32, 64])
forward(x, num_quantiles=None)
¶
Overview
Use encoded embedding tensor to run MLP with FQFHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), quantiles (:obj:torch.Tensor), quantiles_hats (:obj:torch.Tensor), q_tau_i (:obj:torch.Tensor), entropies (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- q: :math:(B, num_quantiles, M).
- quantiles: :math:(B, num_quantiles + 1).
- quantiles_hats: :math:(B, num_quantiles).
- q_tau_i: :math:(B, num_quantiles - 1, M).
- entropies: :math:(B, 1).
Examples:
>>> head = FQFHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([4, 32, 64])
>>> assert outputs['quantiles'].shape == torch.Size([4, 33])
>>> assert outputs['quantiles_hats'].shape == torch.Size([4, 32])
>>> assert outputs['q_tau_i'].shape == torch.Size([4, 31, 64])
>>> assert outputs['entropies'].shape == torch.Size([4, 1])
DuelingHead
¶
Bases: Module
Overview
The DuelingHead is used to output discrete actions logit.
This module is used in Dueling DQN.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, a_layer_num=None, v_layer_num=None, activation=nn.ReLU(), norm_type=None, dropout=None, noise=False)
¶
Overview
Init the DuelingHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to DuelingHead.
- output_size (:obj:int): The number of outputs.
- a_layer_num (:obj:int): The number of layers used in the network to compute action output.
- v_layer_num (:obj:int): The number of layers used in the network to compute value output.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- dropout (:obj:float): The dropout rate of the dropout layer. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with DuelingHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
Examples:
>>> head = DuelingHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
StochasticDuelingHead
¶
Bases: Module
Overview
The Stochastic Dueling Network is proposed in the ACER paper (arXiv 1611.01224); it adapts the dueling network architecture to continuous action spaces.
Interfaces:
__init__, forward.
__init__(hidden_size, action_shape, layer_num=1, a_layer_num=None, v_layer_num=None, activation=nn.ReLU(), norm_type=None, noise=False, last_tanh=True)
¶
Overview
Init the Stochastic DuelingHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to StochasticDuelingHead.
- action_shape (:obj:int): The number of continuous action shape, usually integer value.
- layer_num (:obj:int): The number of default layers used in the network to compute action and value output.
- a_layer_num (:obj:int): The number of layers used in the network to compute action output. Default is layer_num.
- v_layer_num (:obj:int): The number of layers used in the network to compute value output. Default is layer_num.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default False.
- last_tanh (:obj:bool): If True, apply tanh to the actions. Default True.
forward(s, a, mu, sigma, sample_size=10)
¶
Overview
Use encoded embedding tensor to run MLP with StochasticDuelingHead and return the prediction dictionary.
Arguments:
- s (:obj:torch.Tensor): Tensor containing input embedding.
- a (:obj:torch.Tensor): The original continuous behaviour action.
- mu (:obj:torch.Tensor): The mu gaussian reparameterization output of actor head at current timestep.
- sigma (:obj:torch.Tensor): The sigma gaussian reparameterization output of actor head at current timestep.
- sample_size (:obj:int): The number of samples for continuous action when computing the Q value.
Returns:
- outputs (:obj:Dict): Dict containing keywords q_value (:obj:torch.Tensor) and v_value (:obj:torch.Tensor).
Shapes:
- s: :math:(B, N), where B = batch_size and N = hidden_size.
- a: :math:(B, A), where A = action_size.
- mu: :math:(B, A).
- sigma: :math:(B, A).
- q_value: :math:(B, 1).
- v_value: :math:(B, 1).
Examples:
>>> head = StochasticDuelingHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> a = torch.randn(4, 64)
>>> mu = torch.randn(4, 64)
>>> sigma = torch.ones(4, 64)
>>> outputs = head(inputs, a, mu, sigma)
>>> assert isinstance(outputs, dict)
>>> assert outputs['q_value'].shape == torch.Size([4, 1])
>>> assert outputs['v_value'].shape == torch.Size([4, 1])
RegressionHead
¶
Bases: Module
Overview
The RegressionHead is used to regress continuous variables.
This module is used for generating Q-value (DDPG critic) of continuous actions, or state value (A2C/PPO), or directly predicting continuous action (DDPG actor).
Interfaces:
__init__, forward.
__init__(input_size, output_size, layer_num=2, final_tanh=False, activation=nn.ReLU(), norm_type=None, hidden_size=None)
¶
Overview
Init the RegressionHead layers according to the provided arguments.
Arguments:
- input_size (:obj:int): The input size of the MLP connected to RegressionHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- final_tanh (:obj:bool): If True apply tanh to output. Default False.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with RegressionHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword pred (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- pred: :math:(B, M), where M = output_size.
Examples:
>>> head = RegressionHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['pred'].shape == torch.Size([4, 64])
ReparameterizationHead
¶
Bases: Module
Overview
The ReparameterizationHead is used to generate Gaussian distribution of continuous variable, which is parameterized by mu and sigma.
This module is often used in stochastic policies, such as PPO and SAC.
Interfaces:
__init__, forward.
__init__(input_size, output_size, layer_num=2, sigma_type=None, fixed_sigma_value=1.0, activation=nn.ReLU(), norm_type=None, bound_type=None, hidden_size=None)
¶
Overview
Init the ReparameterizationHead layers according to the provided arguments.
Arguments:
- input_size (:obj:int): The input size of the MLP connected to ReparameterizationHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- sigma_type (:obj:str): The sigma type used. Choose among ['fixed', 'independent', 'conditioned']. Default is None.
- fixed_sigma_value (:obj:float): When sigma_type is 'fixed', the output['sigma'] tensor is filled with this value. Default is 1.0.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
- bound_type (:obj:str): Bound type to apply to output mu. Choose among ['tanh', None]. Default is None.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with ReparameterizationHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords mu (:obj:torch.Tensor) and sigma (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- mu: :math:(B, M), where M = output_size.
- sigma: :math:(B, M).
Examples:
>>> head = ReparameterizationHead(64, 64, sigma_type='fixed')
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['mu'].shape == torch.Size([4, 64])
>>> assert outputs['sigma'].shape == torch.Size([4, 64])
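Downstream, the mu and sigma produced by this head are usually plugged into a Gaussian from which actions are drawn with the reparameterization trick, so gradients flow through the sample. A minimal sketch of that typical usage (illustrative shapes; this is not the ding training code):

```python
import torch

# Sketch: typical use of mu/sigma from a reparameterization-style head.
# rsample() draws action = mu + sigma * eps with eps ~ N(0, 1), which keeps
# the sample differentiable w.r.t. mu and sigma (as needed by PPO/SAC-style losses).
mu = torch.zeros(4, 2)        # stand-in for outputs['mu'], shape (B, M)
sigma = torch.ones(4, 2)      # stand-in for outputs['sigma'], shape (B, M)
dist = torch.distributions.Normal(mu, sigma)
action = dist.rsample()                    # differentiable sample, shape (B, M)
log_prob = dist.log_prob(action).sum(-1)   # joint log-prob over action dims, shape (B,)
```

Using rsample() rather than sample() is what makes the "reparameterization" in the head's name pay off during backpropagation.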
PopArtVHead
¶
Bases: Module
Overview
The PopArtVHead is used to generate an adaptively normalized state value. More information can be found in the paper Multi-task Deep Reinforcement Learning with PopArt (https://arxiv.org/abs/1809.04474). This module is used in PPO or IMPALA.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, activation=nn.ReLU(), norm_type=None)
¶
Overview
Init the PopArtVHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to PopArtVHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute Q value output.
- activation (:obj:nn.Module): The type of activation function to use in the MLP; if None, nn.ReLU() is used. Default nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with PopArtVHead and return the normalized prediction and the unnormalized prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword pred (:obj:torch.Tensor) and unnormalized_pred (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- pred: :math:(B, M), where M = output_size.
- unnormalized_pred: :math:(B, M).
Examples:
>>> head = PopArtVHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['pred'].shape == torch.Size([4, 64]) and outputs['unnormalized_pred'].shape == torch.Size([4, 64])
AttentionPolicyHead
¶
Bases: Module
Overview
Cross-attention-type discrete action policy head, which is often used in variable discrete action space.
Interfaces:
__init__, forward.
forward(key, query)
¶
Overview
Use attention-like mechanism to combine key and query tensor to output discrete action logit.
Arguments:
- key (:obj:torch.Tensor): Tensor containing key embedding.
- query (:obj:torch.Tensor): Tensor containing query embedding.
Returns:
- logit (:obj:torch.Tensor): Tensor containing output discrete action logit.
Shapes:
- key: :math:(B, N, K), where B = batch_size, N = possible discrete action choices and K = hidden_size.
- query: :math:(B, K).
- logit: :math:(B, N).
Examples:
>>> head = AttentionPolicyHead()
>>> key = torch.randn(4, 5, 64)
>>> query = torch.randn(4, 64)
>>> logit = head(key, query)
>>> assert logit.shape == torch.Size([4, 5])
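The shapes above are consistent with a batched dot-product between each candidate key and the query. A minimal sketch of that scoring (illustrative; the exact computation inside ding's AttentionPolicyHead may differ in details such as scaling):

```python
import torch

# Sketch: dot-product scoring of N candidate keys against one query per batch item,
# matching the documented shapes key (B, N, K), query (B, K) -> logit (B, N).
B, N, K = 4, 5, 64
key = torch.randn(B, N, K)
query = torch.randn(B, K)
logit = torch.bmm(key, query.unsqueeze(-1)).squeeze(-1)  # (B, N)
```

Each entry logit[b, n] is the similarity of candidate n to the query for batch item b, which is why normalizing key and query (as the note below advises) keeps the logits well-scaled.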
.. note::
In this head, we assume that the key and query tensor are both normalized.
MultiHead
¶
Bases: Module
Overview
The MultiHead is used to generate multiple similar results.
For example, we can combine Distribution and MultiHead to generate multi-discrete action space logit.
Interfaces:
__init__, forward.
__init__(head_cls, hidden_size, output_size_list, **head_kwargs)
¶
Overview
Init the MultiHead layers according to the provided arguments.
Arguments:
- head_cls (:obj:type): The class of the head; choose among [DuelingHead, DistributionHead, QuantileHead, ...].
- hidden_size (:obj:int): The hidden_size of the MLP connected to the Head.
- output_size_list (:obj:List[int]): Sequence of output_size for multi discrete action, e.g. [2, 3, 5].
- head_kwargs (:obj:dict): Dict containing class-specific arguments.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with MultiHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor), a list with one logit tensor per output, accessed as outputs['logit'][i].
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, Mi), where Mi = output_size corresponding to output i.
Examples:
>>> head = MultiHead(DuelingHead, 64, [2, 3, 5], v_layer_num=2)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> # output_size_list is [2, 3, 5] as set
>>> # Therefore each dim of logit is as follows
>>> outputs['logit'][0].shape
>>> torch.Size([4, 2])
>>> outputs['logit'][1].shape
>>> torch.Size([4, 3])
>>> outputs['logit'][2].shape
>>> torch.Size([4, 5])
EnsembleHead
¶
Bases: Module
Overview
The EnsembleHead is used to generate Q-value for Q-ensemble in model-based RL algorithms.
Interfaces:
__init__, forward.
forward(x)
¶
Overview
Use encoded embedding tensor to run MLP with EnsembleHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword pred (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N * ensemble_num, 1), where B = batch_size and N = hidden_size.
- pred: :math:(B, M * ensemble_num, 1), where M = output_size.
Examples:
>>> head = EnsembleHead(64 * 10, 64 * 10)
>>> inputs = torch.randn(4, 64 * 10, 1)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['pred'].shape == torch.Size([4, 64 * 10, 1])
independent_normal_dist(logits)
¶
Overview
Convert logits of different types (list or dict) to an independent normal distribution.
Arguments:
- logits (:obj:Union[List, Dict]): The logits to be converted.
Returns:
- dist (:obj:torch.distributions.Distribution): The converted normal distribution.
Examples:
>>> logits = [torch.randn(4, 5), torch.ones(4, 5)]
>>> dist = independent_normal_dist(logits)
>>> assert isinstance(dist, torch.distributions.Independent)
>>> assert isinstance(dist.base_dist, torch.distributions.Normal)
>>> assert dist.base_dist.loc.shape == torch.Size([4, 5])
>>> assert dist.base_dist.scale.shape == torch.Size([4, 5])
Raises:
- TypeError: If the type of logits is not list or dict.
Full Source Code
../ding/model/common/head.py