
ding.model.common.head


DiscreteHead

Bases: Module

Overview

The DiscreteHead is used to generate discrete actions logit or Q-value logit, which is often used in q-learning algorithms or actor-critic algorithms for discrete action space.

Interfaces: __init__, forward.

__init__(hidden_size, output_size, layer_num=1, activation=nn.ReLU(), norm_type=None, dropout=None, noise=False)

Overview

Init the DiscreteHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``DiscreteHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- dropout (:obj:`float`): The dropout rate. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.

forward(x)

Overview

Use encoded embedding tensor to run MLP with DiscreteHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
Examples:
>>> head = DiscreteHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 64])

DistributionHead

Bases: Module

Overview

The DistributionHead is used to generate distribution for Q-value. This module is used in C51 algorithm.

Interfaces: __init__, forward.

__init__(hidden_size, output_size, layer_num=1, n_atom=51, v_min=-10, v_max=10, activation=nn.ReLU(), norm_type=None, noise=False, eps=1e-06)

Overview

Init the DistributionHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``DistributionHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value distribution.
- n_atom (:obj:`int`): The number of atoms (discrete supports). Default is ``51``.
- v_min (:obj:`float`): Min value of atoms. Default is ``-10``.
- v_max (:obj:`float`): Max value of atoms. Default is ``10``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.
- eps (:obj:`float`): Small constant used for numerical stability.
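As background for ``n_atom``, ``v_min`` and ``v_max``: in C51 the value distribution is supported on evenly spaced "atoms" in ``[v_min, v_max]``, and the scalar Q-value is the expectation under the predicted categorical distribution. Below is a minimal plain-Python sketch of that convention (hypothetical helper names, not DI-engine code):

```python
# Hypothetical sketch of the C51 convention behind DistributionHead:
# n_atom evenly spaced supports in [v_min, v_max]; Q is the expectation over atoms.

def atom_supports(n_atom=51, v_min=-10.0, v_max=10.0):
    """Return the n_atom evenly spaced support values z_i."""
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]

def expected_q(probs, supports):
    """Scalar Q-value for one action: Q = sum_i p_i * z_i."""
    assert abs(sum(probs) - 1.0) < 1e-6, "probs must be a distribution"
    return sum(p * z for p, z in zip(probs, supports))
```

With the defaults, a uniform distribution gives Q close to 0, since the 51 supports are symmetric around zero.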

forward(x)

Overview

Use encoded embedding tensor to run MLP with DistributionHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`) and ``distribution`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
- distribution: :math:`(B, M, n_atom)`.
Examples:
>>> head = DistributionHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom is 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])

BranchingHead

Bases: Module

Overview

The BranchingHead is used to generate Q-value with different branches. This module is used in Branch DQN.

Interfaces: __init__, forward.

__init__(hidden_size, num_branches=0, action_bins_per_branch=2, layer_num=1, a_layer_num=None, v_layer_num=None, norm_type=None, activation=nn.ReLU(), noise=False)

Overview

Init the BranchingHead layers according to the provided arguments. This head achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action. Therefore, this head is suitable for high-dimensional action spaces.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``BranchingHead``.
- num_branches (:obj:`int`): The number of branches, which is equivalent to the action dimension.
- action_bins_per_branch (:obj:`int`): The number of action bins in each dimension.
- layer_num (:obj:`int`): The number of layers used in the network to compute Advantage and Value output.
- a_layer_num (:obj:`int`): The number of layers used in the network to compute Advantage output.
- v_layer_num (:obj:`int`): The number of layers used in the network to compute Value output.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.
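The "linear increase" claim above can be made concrete: a branching head needs ``num_branches * action_bins_per_branch`` outputs, whereas enumerating every joint action would need ``action_bins_per_branch ** num_branches``, and greedy action selection is done independently per branch. A small illustrative sketch (plain Python, hypothetical helper names, not DI-engine code):

```python
def output_counts(num_branches, bins_per_branch):
    """Outputs needed by a branching head vs. a flat head over joint actions."""
    branching = num_branches * bins_per_branch  # linear in action dimensions
    flat = bins_per_branch ** num_branches      # exponential in action dimensions
    return branching, flat

def greedy_joint_action(branch_qs):
    """Pick the argmax bin independently in each branch (one sub-action per branch)."""
    return [max(range(len(q)), key=q.__getitem__) for q in branch_qs]
```

For the documented example (5 branches, 2 bins) the branching head outputs 10 values instead of the 32 a flat enumeration would require.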

forward(x)

Overview

Use encoded embedding tensor to run MLP with BranchingHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, num_branches, action_bins_per_branch)`.
Examples:
>>> head = BranchingHead(64, 5, 2)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 5, 2])

RainbowHead

Bases: Module

Overview

The RainbowHead is used to generate distribution of Q-value. This module is used in Rainbow DQN.

Interfaces: __init__, forward.

__init__(hidden_size, output_size, layer_num=1, n_atom=51, v_min=-10, v_max=10, activation=nn.ReLU(), norm_type=None, noise=True, eps=1e-06)

Overview

Init the RainbowHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``RainbowHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- n_atom (:obj:`int`): The number of atoms (discrete supports). Default is ``51``.
- v_min (:obj:`float`): Min value of atoms. Default is ``-10``.
- v_max (:obj:`float`): Max value of atoms. Default is ``10``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``True``.
- eps (:obj:`float`): Small constant used for numerical stability.

forward(x)

Overview

Use encoded embedding tensor to run MLP with RainbowHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`) and ``distribution`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
- distribution: :math:`(B, M, n_atom)`.
Examples:
>>> head = RainbowHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom is 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])

QRDQNHead

Bases: Module

Overview

The QRDQNHead (Quantile Regression DQN) is used to output action quantiles.

Interfaces: __init__, forward.

__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, activation=nn.ReLU(), norm_type=None, noise=False)

Overview

Init the QRDQNHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``QRDQNHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- num_quantiles (:obj:`int`): The number of quantiles. Default is ``32``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.
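For intuition about how quantile estimates relate to the ``logit`` keyword below: in quantile-regression DQN, the scalar Q-value used for greedy action selection is simply the mean over each action's quantile estimates. A minimal plain-Python sketch (hypothetical helper names, not DI-engine code):

```python
def q_from_quantiles(quantiles_per_action):
    """Mean over each action's quantile estimates gives its scalar Q-value."""
    return [sum(row) / len(row) for row in quantiles_per_action]

def greedy_action(quantiles_per_action):
    """Greedy action under the mean-of-quantiles Q-values."""
    q_values = q_from_quantiles(quantiles_per_action)
    return max(range(len(q_values)), key=q_values.__getitem__)
```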

forward(x)

Overview

Use encoded embedding tensor to run MLP with QRDQNHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`), ``q`` (:obj:`torch.Tensor`), and ``tau`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
- q: :math:`(B, M, num_quantiles)`.
- tau: :math:`(B, num_quantiles, 1)`.
Examples:
>>> head = QRDQNHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])

QuantileHead

Bases: Module

Overview

The QuantileHead is used to output action quantiles. This module is used in IQN.

Interfaces: __init__, forward, quantile_net.

.. note:: The difference between ``QuantileHead`` and ``QRDQNHead`` is that ``QuantileHead`` models the state-action quantile function as a mapping from state-actions and samples from some base distribution, while ``QRDQNHead`` approximates random returns by a uniform mixture of Dirac functions.

__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, quantile_embedding_size=128, beta_function_type='uniform', activation=nn.ReLU(), norm_type=None, noise=False)

Overview

Init the QuantileHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``QuantileHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- num_quantiles (:obj:`int`): The number of quantiles. Default is ``32``.
- quantile_embedding_size (:obj:`int`): The embedding size of a quantile. Default is ``128``.
- beta_function_type (:obj:`str`): Type of beta function. See ``ding.rl_utils.beta_function.py`` for more details. Default is ``uniform``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.

quantile_net(quantiles)

Overview

Deterministic parametric function trained to reparameterize samples from a base distribution. By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.

Arguments:
- quantiles (:obj:`torch.Tensor`): The encoded embedding tensor of parametric sample.
Returns:
- quantile_net (:obj:`torch.Tensor`): Quantile network output tensor after reparameterization.
Shapes:
- quantile_net: :math:`(quantile_embedding_size, M)`, where ``M = output_size``.
Examples:
>>> head = QuantileHead(64, 64)
>>> quantiles = torch.randn(128, 1)
>>> qn_output = head.quantile_net(quantiles)
>>> assert isinstance(qn_output, torch.Tensor)
>>> # default quantile_embedding_size is 128
>>> assert qn_output.shape == torch.Size([128, 64])
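IQN's quantile network is commonly parameterized with a cosine embedding of the sampled fraction tau, i.e. features ``cos(pi * i * tau)`` for ``i = 0 .. n-1``, which are then passed through a learned linear layer and an activation. The learned layer is omitted below; this feature map is a sketch of the common convention, not necessarily the exact DI-engine implementation:

```python
import math

def cosine_features(tau, embedding_size=8):
    """cos(pi * i * tau) for i = 0..embedding_size-1 (i = 0 yields a constant 1 feature)."""
    return [math.cos(math.pi * i * tau) for i in range(embedding_size)]
```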

forward(x, num_quantiles=None)

Overview

Use encoded embedding tensor to run MLP with QuantileHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`), ``q`` (:obj:`torch.Tensor`), and ``quantiles`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
- q: :math:`(num_quantiles, B, M)`.
- quantiles: :math:`(quantile_embedding_size, 1)`.
Examples:
>>> head = QuantileHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])

FQFHead

Bases: Module

Overview

The FQFHead is used to output action quantiles. This module is used in FQF.

Interfaces: __init__, forward, quantile_net.

.. note:: The implementation of ``FQFHead`` is based on the paper https://arxiv.org/abs/1911.02140. The difference between ``FQFHead`` and ``QuantileHead`` is that, in FQF, N adjustable quantile values for N adjustable quantile fractions are estimated to approximate the quantile function, and the distribution of the return is approximated by a weighted mixture of N Dirac functions; while in IQN, the state-action quantile function is modeled as a mapping from state-actions and samples from some base distribution.
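The "N adjustable quantile fractions" in FQF come from a small proposal network: its logits are pushed through a softmax and a cumulative sum, giving monotone fractions ``0 = tau_0 < ... < tau_N = 1``; the midpoints ``tau_hat`` are where the quantile network is evaluated, and the entropy of the softmax serves as a regularizer. A plain-Python sketch of that mechanism (hypothetical names, not DI-engine code):

```python
import math

def propose_fractions(logits):
    """Map unconstrained logits to monotone quantile fractions, midpoints, and entropy."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]                  # softmax
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)                      # cumulative sum, ends at 1.0
    tau_hats = [(taus[i] + taus[i + 1]) / 2 for i in range(len(probs))]
    entropy = -sum(p * math.log(p) for p in probs)     # regularization term
    return taus, tau_hats, entropy
```

This also explains the shapes documented below: ``quantiles`` has ``num_quantiles + 1`` entries while ``quantiles_hats`` has ``num_quantiles``.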

__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None, noise=False)

Overview

Init the FQFHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``FQFHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- num_quantiles (:obj:`int`): The number of quantiles. Default is ``32``.
- quantile_embedding_size (:obj:`int`): The embedding size of a quantile. Default is ``128``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.

quantile_net(quantiles)

Overview

Deterministic parametric function trained to reparameterize samples from the quantiles_proposal network. By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.

Arguments:
- quantiles (:obj:`torch.Tensor`): The encoded embedding tensor of parametric sample.
Returns:
- quantile_net (:obj:`torch.Tensor`): Quantile network output tensor after reparameterization.
Examples:
>>> head = FQFHead(64, 64)
>>> quantiles = torch.randn(4, 32)
>>> qn_output = head.quantile_net(quantiles)
>>> assert isinstance(qn_output, torch.Tensor)
>>> # default quantile_embedding_size is 128
>>> assert qn_output.shape == torch.Size([4, 32, 64])

forward(x, num_quantiles=None)

Overview

Use encoded embedding tensor to run MLP with FQFHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`), ``q`` (:obj:`torch.Tensor`), ``quantiles`` (:obj:`torch.Tensor`), ``quantiles_hats`` (:obj:`torch.Tensor`), ``q_tau_i`` (:obj:`torch.Tensor`), and ``entropies`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
- q: :math:`(B, num_quantiles, M)`.
- quantiles: :math:`(B, num_quantiles + 1)`.
- quantiles_hats: :math:`(B, num_quantiles)`.
- q_tau_i: :math:`(B, num_quantiles - 1, M)`.
- entropies: :math:`(B, 1)`.
Examples:
>>> head = FQFHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([4, 32, 64])
>>> assert outputs['quantiles'].shape == torch.Size([4, 33])
>>> assert outputs['quantiles_hats'].shape == torch.Size([4, 32])
>>> assert outputs['q_tau_i'].shape == torch.Size([4, 31, 64])
>>> assert outputs['entropies'].shape == torch.Size([4, 1])

DuelingHead

Bases: Module

Overview

The DuelingHead is used to output discrete actions logit. This module is used in Dueling DQN.

Interfaces: __init__, forward.

__init__(hidden_size, output_size, layer_num=1, a_layer_num=None, v_layer_num=None, activation=nn.ReLU(), norm_type=None, dropout=None, noise=False)

Overview

Init the DuelingHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``DuelingHead``.
- output_size (:obj:`int`): The number of outputs.
- a_layer_num (:obj:`int`): The number of layers used in the network to compute action output.
- v_layer_num (:obj:`int`): The number of layers used in the network to compute value output.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- dropout (:obj:`float`): The dropout rate of dropout layer. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.
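For reference, the standard dueling aggregation combines the value and advantage streams with a mean-subtracted advantage, which keeps the two streams identifiable. A minimal plain-Python sketch of that formula (assuming ``DuelingHead`` follows the standard Dueling DQN aggregation; not DI-engine code):

```python
def dueling_q(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]
```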

forward(x)

Overview

Use encoded embedding tensor to run MLP with DuelingHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, M)`, where ``M = output_size``.
Examples:
>>> head = DuelingHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])

StochasticDuelingHead

Bases: Module

Overview

The Stochastic Dueling Network is proposed in the ACER paper (arXiv 1611.01224); it adapts the dueling network architecture to continuous action spaces.

Interfaces: __init__, forward.

__init__(hidden_size, action_shape, layer_num=1, a_layer_num=None, v_layer_num=None, activation=nn.ReLU(), norm_type=None, noise=False, last_tanh=True)

Overview

Init the Stochastic DuelingHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``StochasticDuelingHead``.
- action_shape (:obj:`int`): The number of continuous action shape, usually an integer value.
- layer_num (:obj:`int`): The default number of layers used in the network to compute action and value output.
- a_layer_num (:obj:`int`): The number of layers used in the network to compute action output. Default is ``layer_num``.
- v_layer_num (:obj:`int`): The number of layers used in the network to compute value output. Default is ``layer_num``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. Default ``False``.
- last_tanh (:obj:`bool`): If ``True``, apply ``tanh`` to actions. Default ``True``.
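Because the action space is continuous, the advantage baseline cannot be computed by enumerating actions; instead it is estimated from ``sample_size`` actions drawn from the current Gaussian policy. A seeded plain-Python sketch of that estimator (hypothetical names; ``adv_fn`` stands in for the advantage stream; not DI-engine code):

```python
import random

def stochastic_dueling_q(value, adv_fn, action, mu, sigma, sample_size=10, seed=0):
    """Q(s, a) ~= V(s) + A(s, a) - (1/n) * sum_i A(s, u_i), with u_i ~ N(mu, sigma)."""
    rng = random.Random(seed)
    samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    baseline = sum(adv_fn(u) for u in samples) / sample_size
    return value + adv_fn(action) - baseline
```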

forward(s, a, mu, sigma, sample_size=10)

Overview

Use encoded embedding tensor to run MLP with StochasticDuelingHead and return the prediction dictionary.

Arguments:
- s (:obj:`torch.Tensor`): Tensor containing input embedding.
- a (:obj:`torch.Tensor`): The original continuous behaviour action.
- mu (:obj:`torch.Tensor`): The mu gaussian reparameterization output of the actor head at the current timestep.
- sigma (:obj:`torch.Tensor`): The sigma gaussian reparameterization output of the actor head at the current timestep.
- sample_size (:obj:`int`): The number of samples for continuous action when computing the Q value.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``q_value`` (:obj:`torch.Tensor`) and ``v_value`` (:obj:`torch.Tensor`).
Shapes:
- s: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- a: :math:`(B, A)`, where ``A = action_size``.
- mu: :math:`(B, A)`.
- sigma: :math:`(B, A)`.
- q_value: :math:`(B, 1)`.
- v_value: :math:`(B, 1)`.
Examples:
>>> head = StochasticDuelingHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> a = torch.randn(4, 64)
>>> mu = torch.randn(4, 64)
>>> sigma = torch.ones(4, 64)
>>> outputs = head(inputs, a, mu, sigma)
>>> assert isinstance(outputs, dict)
>>> assert outputs['q_value'].shape == torch.Size([4, 1])
>>> assert outputs['v_value'].shape == torch.Size([4, 1])

RegressionHead

Bases: Module

Overview

The RegressionHead is used to regress continuous variables. This module is used for generating Q-value (DDPG critic) of continuous actions, or state value (A2C/PPO), or directly predicting continuous action (DDPG actor).

Interfaces: __init__, forward.

__init__(input_size, output_size, layer_num=2, final_tanh=False, activation=nn.ReLU(), norm_type=None, hidden_size=None)

Overview

Init the RegressionHead layers according to the provided arguments.

Arguments:
- input_size (:obj:`int`): The ``input_size`` of the MLP connected to ``RegressionHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- final_tanh (:obj:`bool`): If ``True``, apply ``tanh`` to output. Default ``False``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.

forward(x)

Overview

Use encoded embedding tensor to run MLP with RegressionHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keyword ``pred`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- pred: :math:`(B, M)`, where ``M = output_size``.
Examples:
>>> head = RegressionHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['pred'].shape == torch.Size([4, 64])

ReparameterizationHead

Bases: Module

Overview

The ReparameterizationHead is used to generate Gaussian distribution of continuous variable, which is parameterized by mu and sigma. This module is often used in stochastic policies, such as PPO and SAC.

Interfaces: __init__, forward.

__init__(input_size, output_size, layer_num=2, sigma_type=None, fixed_sigma_value=1.0, activation=nn.ReLU(), norm_type=None, bound_type=None, hidden_size=None)

Overview

Init the ReparameterizationHead layers according to the provided arguments.

Arguments:
- input_size (:obj:`int`): The ``input_size`` of the MLP connected to ``ReparameterizationHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- sigma_type (:obj:`str`): Sigma type used. Choose among ``['fixed', 'independent', 'conditioned']``. Default is ``None``.
- fixed_sigma_value (:obj:`float`): When the ``fixed`` type is chosen, the tensor ``output['sigma']`` is filled with this input value. Default is ``1.0``.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
- bound_type (:obj:`str`): Bound type to apply to output ``mu``. Choose among ``['tanh', None]``. Default is ``None``.
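The head's name refers to the reparameterization trick: instead of sampling an action directly from N(mu, sigma), one samples ``eps ~ N(0, 1)`` and computes ``a = mu + sigma * eps``, so gradients can flow back through ``mu`` and ``sigma`` as required by stochastic policies such as SAC. A minimal plain-Python sketch (not DI-engine code):

```python
import random

def reparameterized_sample(mu, sigma, rng=None):
    """a = mu + sigma * eps with eps ~ N(0, 1); the randomness sits outside mu/sigma."""
    rng = rng or random.Random()
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps
```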

forward(x)

Overview

Use encoded embedding tensor to run MLP with ReparameterizationHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``mu`` (:obj:`torch.Tensor`) and ``sigma`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- mu: :math:`(B, M)`, where ``M = output_size``.
- sigma: :math:`(B, M)`.
Examples:
>>> head = ReparameterizationHead(64, 64, sigma_type='fixed')
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['mu'].shape == torch.Size([4, 64])
>>> assert outputs['sigma'].shape == torch.Size([4, 64])

PopArtVHead

Bases: Module

Overview

The PopArtVHead is used to generate an adaptively normalized state value. More information can be found in the paper Multi-task Deep Reinforcement Learning with PopArt (https://arxiv.org/abs/1809.04474). This module is used in PPO or IMPALA.

Interfaces: __init__, forward.

__init__(hidden_size, output_size, layer_num=1, activation=nn.ReLU(), norm_type=None)

Overview

Init the PopArtVHead layers according to the provided arguments.

Arguments:
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``PopArtVHead``.
- output_size (:obj:`int`): The number of outputs.
- layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
- activation (:obj:`nn.Module`): The type of activation function to use in MLP. If ``None`` is passed, ``nn.ReLU()`` is used. Default ``nn.ReLU()``.
- norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` for more details. Default ``None``.
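PopArt's key invariant is that when the running mean/std of the value targets move from ``(mu, sigma)`` to ``(mu', sigma')``, the last linear layer is rescaled so the unnormalized prediction ``sigma * normalized_pred + mu`` is unchanged. A plain-Python sketch of that rescaling for a single output unit (hypothetical names, not DI-engine's ``PopArt`` class):

```python
def popart_rescale(weights, bias, old_mu, old_sigma, new_mu, new_sigma):
    """Rescale the last layer so the unnormalized prediction is preserved."""
    new_weights = [w * old_sigma / new_sigma for w in weights]
    new_bias = (old_sigma * bias + old_mu - new_mu) / new_sigma
    return new_weights, new_bias
```

This is why ``forward`` below can return both a normalized ``pred`` (used for the loss) and an ``unnormalized_pred`` (used as the actual value estimate).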

forward(x)

Overview

Use encoded embedding tensor to run MLP with PopArtVHead and return the normalized prediction and the unnormalized prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keywords ``pred`` (:obj:`torch.Tensor`) and ``unnormalized_pred`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- pred: :math:`(B, M)`, where ``M = output_size``.
- unnormalized_pred: :math:`(B, M)`.
Examples:
>>> head = PopArtVHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['pred'].shape == torch.Size([4, 64]) and outputs['unnormalized_pred'].shape == torch.Size([4, 64])

AttentionPolicyHead

Bases: Module

Overview

Cross-attention-type discrete action policy head, which is often used in variable discrete action space.

Interfaces: __init__, forward.

forward(key, query)

Overview

Use attention-like mechanism to combine key and query tensor to output discrete action logit.

Arguments:
- key (:obj:`torch.Tensor`): Tensor containing key embedding.
- query (:obj:`torch.Tensor`): Tensor containing query embedding.
Returns:
- logit (:obj:`torch.Tensor`): Tensor containing output discrete action logit.
Shapes:
- key: :math:`(B, N, K)`, where ``B = batch_size``, ``N = possible discrete action choices`` and ``K = hidden_size``.
- query: :math:`(B, K)`.
- logit: :math:`(B, N)`.
Examples:
>>> head = AttentionPolicyHead()
>>> key = torch.randn(4, 5, 64)
>>> query = torch.randn(4, 64)
>>> logit = head(key, query)
>>> assert logit.shape == torch.Size([4, 5])

.. note:: In this head, we assume that the key and query tensor are both normalized.
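In its simplest form, the attention-like mechanism above reduces to a dot product between the query and each candidate's key, producing one logit per candidate; this is also why the note assumes normalized key and query tensors, since the scale of the logits depends directly on theirs. A minimal plain-Python sketch for a single batch element (not DI-engine code):

```python
def attention_logits(keys, query):
    """logit_n = <key_n, query> for each of the N candidate actions."""
    return [sum(k * q for k, q in zip(key, query)) for key in keys]
```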

MultiHead

Bases: Module

Overview

The MultiHead is used to generate multiple similar results. For example, we can combine Distribution and MultiHead to generate multi-discrete action space logit.

Interfaces: __init__, forward.

__init__(head_cls, hidden_size, output_size_list, **head_kwargs)

Overview

Init the MultiHead layers according to the provided arguments.

Arguments:
- head_cls (:obj:`type`): The class of head, chosen among [``DuelingHead``, ``DistributionHead``, ``QuantileHead``, ...].
- hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to the ``Head``.
- output_size_list (:obj:`SequenceType`): Sequence of ``output_size`` for multi-discrete action, e.g. ``[2, 3, 5]``.
- head_kwargs (:obj:`dict`): Dict containing class-specific arguments.

forward(x)

Overview

Use encoded embedding tensor to run MLP with MultiHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`) corresponding to the logit of each output, each accessed at ``['logit'][i]``.
Shapes:
- x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
- logit: :math:`(B, Mi)`, where ``Mi = output_size`` corresponding to output ``i``.
Examples:
>>> head = MultiHead(DuelingHead, 64, [2, 3, 5], v_layer_num=2)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> # output_size_list is [2, 3, 5] as set
>>> # Therefore each dim of logit is as follows
>>> outputs['logit'][0].shape
torch.Size([4, 2])
>>> outputs['logit'][1].shape
torch.Size([4, 3])
>>> outputs['logit'][2].shape
torch.Size([4, 5])

EnsembleHead

Bases: Module

Overview

The EnsembleHead is used to generate Q-value for Q-ensemble in model-based RL algorithms.

Interfaces: __init__, forward.

forward(x)

Overview

Use encoded embedding tensor to run MLP with EnsembleHead and return the prediction dictionary.

Arguments:
- x (:obj:`torch.Tensor`): Tensor containing input embedding.
Returns:
- outputs (:obj:`Dict`): Dict containing keyword ``pred`` (:obj:`torch.Tensor`).
Shapes:
- x: :math:`(B, N * ensemble_num, 1)`, where ``B = batch_size`` and ``N = hidden_size``.
- pred: :math:`(B, M * ensemble_num, 1)`, where ``M = output_size``.
Examples:
>>> head = EnsembleHead(64 * 10, 64 * 10)
>>> inputs = torch.randn(4, 64 * 10, 1)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['pred'].shape == torch.Size([4, 64 * 10, 1])

independent_normal_dist(logits)

Overview

Convert logits of different types to an independent normal distribution.

Arguments:
- logits (:obj:`Union[List, Dict]`): The logits to be converted.
Returns:
- dist (:obj:`torch.distributions.Distribution`): The converted normal distribution.
Examples:
>>> logits = [torch.randn(4, 5), torch.ones(4, 5)]
>>> dist = independent_normal_dist(logits)
>>> assert isinstance(dist, torch.distributions.Independent)
>>> assert isinstance(dist.base_dist, torch.distributions.Normal)
>>> assert dist.base_dist.loc.shape == torch.Size([4, 5])
>>> assert dist.base_dist.scale.shape == torch.Size([4, 5])
Raises:
- TypeError: If the type of ``logits`` is not ``list`` or ``dict``.

Full Source Code

../ding/model/common/head.py

1from typing import Optional, Dict, Union, List 2 3import math 4import torch 5import torch.nn as nn 6import torch.nn.functional as F 7from torch.distributions import Normal, Independent 8 9from ding.torch_utils import fc_block, noise_block, NoiseLinearLayer, MLP, PopArt, conv1d_block 10from ding.rl_utils import beta_function_map 11from ding.utils import lists_to_dicts, SequenceType 12 13 14class DiscreteHead(nn.Module): 15 """ 16 Overview: 17 The ``DiscreteHead`` is used to generate discrete actions logit or Q-value logit, \ 18 which is often used in q-learning algorithms or actor-critic algorithms for discrete action space. 19 Interfaces: 20 ``__init__``, ``forward``. 21 """ 22 23 def __init__( 24 self, 25 hidden_size: int, 26 output_size: int, 27 layer_num: int = 1, 28 activation: Optional[nn.Module] = nn.ReLU(), 29 norm_type: Optional[str] = None, 30 dropout: Optional[float] = None, 31 noise: Optional[bool] = False, 32 ) -> None: 33 """ 34 Overview: 35 Init the ``DiscreteHead`` layers according to the provided arguments. 36 Arguments: 37 - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``DiscreteHead``. 38 - output_size (:obj:`int`): The number of outputs. 39 - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output. 40 - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \ 41 If ``None``, then default set activation to ``nn.ReLU()``. Default ``None``. 42 - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \ 43 for more details. Default ``None``. 44 - dropout (:obj:`float`): The dropout rate, default set to None. 45 - noise (:obj:`bool`): Whether use ``NoiseLinearLayer`` as ``layer_fn`` in Q networks' MLP. \ 46 Default ``False``. 
47 """ 48 super(DiscreteHead, self).__init__() 49 layer = NoiseLinearLayer if noise else nn.Linear 50 block = noise_block if noise else fc_block 51 self.Q = nn.Sequential( 52 MLP( 53 hidden_size, 54 hidden_size, 55 hidden_size, 56 layer_num, 57 layer_fn=layer, 58 activation=activation, 59 use_dropout=dropout is not None, 60 dropout_probability=dropout, 61 norm_type=norm_type 62 ), block(hidden_size, output_size) 63 ) 64 65 def forward(self, x: torch.Tensor) -> Dict: 66 """ 67 Overview: 68 Use encoded embedding tensor to run MLP with ``DiscreteHead`` and return the prediction dictionary. 69 Arguments: 70 - x (:obj:`torch.Tensor`): Tensor containing input embedding. 71 Returns: 72 - outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`). 73 Shapes: 74 - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``. 75 - logit: :math:`(B, M)`, where ``M = output_size``. 76 Examples: 77 >>> head = DiscreteHead(64, 64) 78 >>> inputs = torch.randn(4, 64) 79 >>> outputs = head(inputs) 80 >>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 64]) 81 """ 82 logit = self.Q(x) 83 return {'logit': logit} 84 85 86class DistributionHead(nn.Module): 87 """ 88 Overview: 89 The ``DistributionHead`` is used to generate distribution for Q-value. 90 This module is used in C51 algorithm. 91 Interfaces: 92 ``__init__``, ``forward``. 93 """ 94 95 def __init__( 96 self, 97 hidden_size: int, 98 output_size: int, 99 layer_num: int = 1, 100 n_atom: int = 51, 101 v_min: float = -10, 102 v_max: float = 10, 103 activation: Optional[nn.Module] = nn.ReLU(), 104 norm_type: Optional[str] = None, 105 noise: Optional[bool] = False, 106 eps: Optional[float] = 1e-6, 107 ) -> None: 108 """ 109 Overview: 110 Init the ``DistributionHead`` layers according to the provided arguments. 111 Arguments: 112 - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``DistributionHead``. 113 - output_size (:obj:`int`): The number of outputs. 
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value distribution.
            - n_atom (:obj:`int`): The number of atoms (discrete supports). Default is ``51``.
            - v_min (:obj:`float`): Min value of atoms. Default is ``-10``.
            - v_max (:obj:`float`): Max value of atoms. Default is ``10``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, no activation is applied. Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. \
                Default ``False``.
            - eps (:obj:`float`): Small constant used for numerical stability.
        """
        super(DistributionHead, self).__init__()
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.Q = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, output_size * n_atom)
        )
        self.output_size = output_size
        self.n_atom = n_atom
        self.v_min = v_min
        self.v_max = v_max
        self.eps = eps  # for numerical stability

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``DistributionHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`) and \
                ``distribution`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, M)`, where ``M = output_size``.
            - distribution: :math:`(B, M, n_atom)`.
        Examples:
            >>> head = DistributionHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == torch.Size([4, 64])
            >>> # default n_atom is 51
            >>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
        """
        q = self.Q(x)
        q = q.view(*q.shape[:-1], self.output_size, self.n_atom)
        dist = torch.softmax(q, dim=-1) + self.eps
        q = dist * torch.linspace(self.v_min, self.v_max, self.n_atom).to(x)
        q = q.sum(-1)
        return {'logit': q, 'distribution': dist}


class BranchingHead(nn.Module):
    """
    Overview:
        The ``BranchingHead`` is used to generate Q-value with different branches.
        This module is used in Branching DQN.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        num_branches: int = 0,
        action_bins_per_branch: int = 2,
        layer_num: int = 1,
        a_layer_num: Optional[int] = None,
        v_layer_num: Optional[int] = None,
        norm_type: Optional[str] = None,
        activation: Optional[nn.Module] = nn.ReLU(),
        noise: Optional[bool] = False,
    ) -> None:
        """
        Overview:
            Init the ``BranchingHead`` layers according to the provided arguments. \
            This head achieves a linear increase of the number of network outputs \
            with the number of degrees of freedom by allowing a level of independence for each individual action.
            Therefore, this head is suitable for high-dimensional action spaces.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``BranchingHead``.
            - num_branches (:obj:`int`): The number of branches, which is equivalent to the action dimension.
            - action_bins_per_branch (:obj:`int`): The number of action bins in each dimension.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Advantage and Value output.
            - a_layer_num (:obj:`int`): The number of layers used in the network to compute Advantage output.
            - v_layer_num (:obj:`int`): The number of layers used in the network to compute Value output.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, no activation is applied. Default ``nn.ReLU()``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. \
                Default ``False``.
        """
        super(BranchingHead, self).__init__()
        if a_layer_num is None:
            a_layer_num = layer_num
        if v_layer_num is None:
            v_layer_num = layer_num
        self.num_branches = num_branches
        self.action_bins_per_branch = action_bins_per_branch

        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        # value network
        self.V = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                v_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        # action branching network
        action_output_dim = action_bins_per_branch
        self.branches = nn.ModuleList(
            [
                nn.Sequential(
                    MLP(
                        hidden_size,
                        hidden_size,
                        hidden_size,
                        a_layer_num,
                        layer_fn=layer,
                        activation=activation,
                        norm_type=norm_type
                    ), block(hidden_size, action_output_dim)
                ) for _ in range(self.num_branches)
            ]
        )

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``BranchingHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, num_branches, action_bins_per_branch)`.
        Examples:
            >>> head = BranchingHead(64, 5, 2)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 5, 2])
        """
        value_out = self.V(x)
        value_out = torch.unsqueeze(value_out, 1)
        action_out = []
        for b in self.branches:
            action_out.append(b(x))
        action_scores = torch.stack(action_out, 1)
        # From the paper, this implementation performs better than both the naive alternative (Q = V + A)
        # and the local maximum reduction method (Q = V + max(A)).
        action_scores = action_scores - torch.mean(action_scores, 2, keepdim=True)
        logits = value_out + action_scores
        return {'logit': logits}


class RainbowHead(nn.Module):
    """
    Overview:
        The ``RainbowHead`` is used to generate distribution of Q-value.
        This module is used in Rainbow DQN.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        output_size: int,
        layer_num: int = 1,
        n_atom: int = 51,
        v_min: float = -10,
        v_max: float = 10,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        noise: Optional[bool] = True,
        eps: Optional[float] = 1e-6,
    ) -> None:
        """
        Overview:
            Init the ``RainbowHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``RainbowHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
            - n_atom (:obj:`int`): The number of atoms (discrete supports). Default is ``51``.
            - v_min (:obj:`float`): Min value of atoms. Default is ``-10``.
            - v_max (:obj:`float`): Max value of atoms. Default is ``10``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, no activation is applied. Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. \
                Default ``True``.
            - eps (:obj:`float`): Small constant used for numerical stability.
        """
        super(RainbowHead, self).__init__()
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.A = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, output_size * n_atom)
        )
        self.Q = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, n_atom)
        )
        self.output_size = output_size
        self.n_atom = n_atom
        self.v_min = v_min
        self.v_max = v_max
        self.eps = eps

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``RainbowHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`) and \
                ``distribution`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, M)`, where ``M = output_size``.
            - distribution: :math:`(B, M, n_atom)`.
        Examples:
            >>> head = RainbowHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == torch.Size([4, 64])
            >>> # default n_atom is 51
            >>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
        """
        a = self.A(x)
        q = self.Q(x)
        a = a.view(*a.shape[:-1], self.output_size, self.n_atom)
        q = q.view(*q.shape[:-1], 1, self.n_atom)
        q = q + a - a.mean(dim=-2, keepdim=True)
        dist = torch.softmax(q, dim=-1) + self.eps
        q = dist * torch.linspace(self.v_min, self.v_max, self.n_atom).to(x)
        q = q.sum(-1)
        return {'logit': q, 'distribution': dist}


class QRDQNHead(nn.Module):
    """
    Overview:
        The ``QRDQNHead`` (Quantile Regression DQN) is used to output action quantiles.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        output_size: int,
        layer_num: int = 1,
        num_quantiles: int = 32,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        noise: Optional[bool] = False,
    ) -> None:
        """
        Overview:
            Init the ``QRDQNHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``QRDQNHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
            - num_quantiles (:obj:`int`): The number of quantiles. Default is ``32``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, no activation is applied. Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. \
                Default ``False``.
        """
        super(QRDQNHead, self).__init__()
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.Q = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, output_size * num_quantiles)
        )
        self.num_quantiles = num_quantiles
        self.output_size = output_size

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``QRDQNHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`), \
                ``q`` (:obj:`torch.Tensor`), and ``tau`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, M)`, where ``M = output_size``.
            - q: :math:`(B, M, num_quantiles)`.
            - tau: :math:`(B, num_quantiles, 1)`.
        Examples:
            >>> head = QRDQNHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == torch.Size([4, 64])
            >>> # default num_quantiles is 32
            >>> assert outputs['q'].shape == torch.Size([4, 64, 32])
            >>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
        """
        q = self.Q(x)
        q = q.view(*q.shape[:-1], self.output_size, self.num_quantiles)

        logit = q.mean(-1)
        tau = torch.linspace(0, 1, self.num_quantiles + 1)
        tau = ((tau[:-1] + tau[1:]) / 2).view(1, -1, 1).repeat(q.shape[0], 1, 1).to(q)
        return {'logit': logit, 'q': q, 'tau': tau}


class QuantileHead(nn.Module):
    """
    Overview:
        The ``QuantileHead`` is used to output action quantiles.
        This module is used in IQN.
    Interfaces:
        ``__init__``, ``forward``, ``quantile_net``.

    .. note::
        The difference between ``QuantileHead`` and ``QRDQNHead`` is that ``QuantileHead`` models the \
        state-action quantile function as a mapping from state-actions and samples from some base distribution, \
        while ``QRDQNHead`` approximates random returns by a uniform mixture of Dirac functions.
    """

    def __init__(
        self,
        hidden_size: int,
        output_size: int,
        layer_num: int = 1,
        num_quantiles: int = 32,
        quantile_embedding_size: int = 128,
        beta_function_type: Optional[str] = 'uniform',
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        noise: Optional[bool] = False,
    ) -> None:
        """
        Overview:
            Init the ``QuantileHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``QuantileHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
            - num_quantiles (:obj:`int`): The number of quantiles.
            - quantile_embedding_size (:obj:`int`): The embedding size of a quantile.
            - beta_function_type (:obj:`str`): Type of beta function. See ``ding.rl_utils.beta_function.py`` \
                for more details. Default is ``uniform``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, no activation is applied. Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. \
                Default ``False``.
        """
        super(QuantileHead, self).__init__()
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.Q = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, output_size)
        )
        self.num_quantiles = num_quantiles
        self.quantile_embedding_size = quantile_embedding_size
        self.output_size = output_size
        self.iqn_fc = nn.Linear(self.quantile_embedding_size, hidden_size)
        self.beta_function = beta_function_map[beta_function_type]

    def quantile_net(self, quantiles: torch.Tensor) -> torch.Tensor:
        """
        Overview:
            Deterministic parametric function trained to reparameterize samples from a base distribution. \
            By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.
        Arguments:
            - quantiles (:obj:`torch.Tensor`): The sampled quantile fractions to be embedded.
        Returns:
            - quantile_net (:obj:`torch.Tensor`): Quantile network output tensor after reparameterization.
        Shapes:
            - quantiles: :math:`(num_samples, 1)`.
            - quantile_net: :math:`(num_samples, hidden_size)`.
        Examples:
            >>> head = QuantileHead(64, 64)
            >>> quantiles = torch.randn(128, 1)
            >>> qn_output = head.quantile_net(quantiles)
            >>> assert isinstance(qn_output, torch.Tensor)
            >>> # default hidden_size is 64
            >>> assert qn_output.shape == torch.Size([128, 64])
        """
        quantile_net = quantiles.repeat([1, self.quantile_embedding_size])
        quantile_net = torch.cos(
            torch.arange(1, self.quantile_embedding_size + 1, 1).to(quantiles) * math.pi * quantile_net
        )
        quantile_net = self.iqn_fc(quantile_net)
        quantile_net = F.relu(quantile_net)
        return quantile_net

    def forward(self, x: torch.Tensor, num_quantiles: Optional[int] = None) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``QuantileHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
            - num_quantiles (:obj:`int`): The number of quantiles to sample. If ``None``, \
                use ``self.num_quantiles``. Default ``None``.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`), \
                ``q`` (:obj:`torch.Tensor`), and ``quantiles`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, M)`, where ``M = output_size``.
            - q: :math:`(num_quantiles, B, M)`.
            - quantiles: :math:`(num_quantiles * B, 1)`.
        Examples:
            >>> head = QuantileHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == torch.Size([4, 64])
            >>> # default num_quantiles is 32
            >>> assert outputs['q'].shape == torch.Size([32, 4, 64])
            >>> assert outputs['quantiles'].shape == torch.Size([128, 1])
        """
        if num_quantiles is None:
            num_quantiles = self.num_quantiles
        batch_size = x.shape[0]

        q_quantiles = torch.FloatTensor(num_quantiles * batch_size, 1).uniform_(0, 1).to(x)
        logit_quantiles = torch.FloatTensor(num_quantiles * batch_size, 1).uniform_(0, 1).to(x)
        logit_quantiles = self.beta_function(logit_quantiles)
        q_quantile_net = self.quantile_net(q_quantiles)
        logit_quantile_net = self.quantile_net(logit_quantiles)

        x = x.repeat(num_quantiles, 1)
        q_x = x * q_quantile_net  # (num_quantiles * batch_size, hidden_size)
        logit_x = x * logit_quantile_net

        q = self.Q(q_x).reshape(num_quantiles, batch_size, -1)
        logit = self.Q(logit_x).reshape(num_quantiles, batch_size, -1).mean(0)

        return {'logit': logit, 'q': q, 'quantiles': q_quantiles}


class FQFHead(nn.Module):
    """
    Overview:
        The ``FQFHead`` is used to output action quantiles.
        This module is used in FQF.
    Interfaces:
        ``__init__``, ``forward``, ``quantile_net``.

    .. note::
        The implementation of FQFHead is based on the paper https://arxiv.org/abs/1911.02140.
        The difference between FQFHead and QuantileHead is that, in FQF, \
        N adjustable quantile values for N adjustable quantile fractions are estimated to approximate \
        the quantile function. The distribution of the return is approximated by a weighted mixture of N \
        Dirac functions. While in IQN, the state-action quantile function is modeled as a mapping from \
        state-actions and samples from some base distribution.
627 """ 628 629 def __init__( 630 self, 631 hidden_size: int, 632 output_size: int, 633 layer_num: int = 1, 634 num_quantiles: int = 32, 635 quantile_embedding_size: int = 128, 636 activation: Optional[nn.Module] = nn.ReLU(), 637 norm_type: Optional[str] = None, 638 noise: Optional[bool] = False, 639 ) -> None: 640 """ 641 Overview: 642 Init the ``FQFHead`` layers according to the provided arguments. 643 Arguments: 644 - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``FQFHead``. 645 - output_size (:obj:`int`): The number of outputs. 646 - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output. 647 - num_quantiles (:obj:`int`): The number of quantiles. 648 - quantile_embedding_size (:obj:`int`): The embedding size of a quantile. 649 - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \ 650 If ``None``, then default set activation to ``nn.ReLU()``. Default ``None``. 651 - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \ 652 for more details. Default ``None``. 653 - noise (:obj:`bool`): Whether use ``NoiseLinearLayer`` as ``layer_fn`` in Q networks' MLP. \ 654 Default ``False``. 
655 """ 656 super(FQFHead, self).__init__() 657 layer = NoiseLinearLayer if noise else nn.Linear 658 block = noise_block if noise else fc_block 659 self.Q = nn.Sequential( 660 MLP( 661 hidden_size, 662 hidden_size, 663 hidden_size, 664 layer_num, 665 layer_fn=layer, 666 activation=activation, 667 norm_type=norm_type 668 ), block(hidden_size, output_size) 669 ) 670 self.num_quantiles = num_quantiles 671 self.quantile_embedding_size = quantile_embedding_size 672 self.output_size = output_size 673 self.fqf_fc = nn.Sequential(nn.Linear(self.quantile_embedding_size, hidden_size), nn.ReLU()) 674 self.register_buffer( 675 'sigma_pi', 676 torch.arange(1, self.quantile_embedding_size + 1, 1).view(1, 1, self.quantile_embedding_size) * math.pi 677 ) 678 # initialize weights_xavier of quantiles_proposal network 679 # NOTE(rjy): quantiles_proposal network mean fraction proposal network 680 quantiles_proposal_fc = nn.Linear(hidden_size, num_quantiles) 681 torch.nn.init.xavier_uniform_(quantiles_proposal_fc.weight, gain=0.01) 682 torch.nn.init.constant_(quantiles_proposal_fc.bias, 0) 683 self.quantiles_proposal = nn.Sequential(quantiles_proposal_fc, nn.LogSoftmax(dim=1)) 684 685 def quantile_net(self, quantiles: torch.Tensor) -> torch.Tensor: 686 """ 687 Overview: 688 Deterministic parametric function trained to reparameterize samples from the quantiles_proposal network. \ 689 By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated. 690 Arguments: 691 - x (:obj:`torch.Tensor`): The encoded embedding tensor of parametric sample. 692 Returns: 693 - quantile_net (:obj:`torch.Tensor`): Quantile network output tensor after reparameterization. 
        Examples:
            >>> head = FQFHead(64, 64)
            >>> quantiles = torch.randn(4, 32)
            >>> qn_output = head.quantile_net(quantiles)
            >>> assert isinstance(qn_output, torch.Tensor)
            >>> # default hidden_size is 64
            >>> assert qn_output.shape == torch.Size([4, 32, 64])
        """
        batch_size, num_quantiles = quantiles.shape[:2]
        quantile_net = torch.cos(self.sigma_pi.to(quantiles) * quantiles.view(batch_size, num_quantiles, 1))
        quantile_net = self.fqf_fc(quantile_net)  # (batch_size, num_quantiles, hidden_size)
        return quantile_net

    def forward(self, x: torch.Tensor, num_quantiles: Optional[int] = None) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``FQFHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords ``logit`` (:obj:`torch.Tensor`), \
                ``q`` (:obj:`torch.Tensor`), ``quantiles`` (:obj:`torch.Tensor`), \
                ``quantiles_hats`` (:obj:`torch.Tensor`), \
                ``q_tau_i`` (:obj:`torch.Tensor`), and ``entropies`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, M)`, where ``M = output_size``.
            - q: :math:`(B, num_quantiles, M)`.
            - quantiles: :math:`(B, num_quantiles + 1)`.
            - quantiles_hats: :math:`(B, num_quantiles)`.
            - q_tau_i: :math:`(B, num_quantiles - 1, M)`.
            - entropies: :math:`(B, 1)`.
        Examples:
            >>> head = FQFHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == torch.Size([4, 64])
            >>> # default num_quantiles is 32
            >>> assert outputs['q'].shape == torch.Size([4, 32, 64])
            >>> assert outputs['quantiles'].shape == torch.Size([4, 33])
            >>> assert outputs['quantiles_hats'].shape == torch.Size([4, 32])
            >>> assert outputs['q_tau_i'].shape == torch.Size([4, 31, 64])
            >>> assert outputs['entropies'].shape == torch.Size([4, 1])
        """
        if num_quantiles is None:
            num_quantiles = self.num_quantiles
        batch_size = x.shape[0]

        # (batch_size, num_quantiles); detach x so the encoder is not updated by the fraction (W1) loss
        log_q_quantiles = self.quantiles_proposal(x.detach())
        q_quantiles = log_q_quantiles.exp()  # NOTE(rjy): e^log_q = q

        # Calculate entropies of value distributions.
        entropies = -(log_q_quantiles * q_quantiles).sum(dim=-1, keepdim=True)  # (batch_size, 1)
        assert entropies.shape == (batch_size, 1)

        # cumulative sum
        # NOTE(rjy): because quantiles are still expressed in the form of their respective proportions,
        # e.g. [0.33, 0.33, 0.33] => [0.33, 0.66, 0.99]
        q_quantiles = torch.cumsum(q_quantiles, dim=1)

        # quantile_hats: find the optimal condition for tau to minimize W1(Z, tau)
        tau_0 = torch.zeros((batch_size, 1)).to(x)
        q_quantiles = torch.cat((tau_0, q_quantiles), dim=1)  # (batch_size, num_quantiles + 1)

        # NOTE(rjy): theta_i = F^(-1)_Z((tau_i + tau_{i+1}) / 2), tau_hat = (tau_i + tau_{i+1}) / 2,
        # q_quantiles_hats is tau_hat
        q_quantiles_hats = (q_quantiles[:, 1:] + q_quantiles[:, :-1]).detach() / 2.  # (batch_size, num_quantiles)

        # NOTE(rjy): reparameterize q_quantiles_hats
        q_quantile_net = self.quantile_net(q_quantiles_hats)  # (batch_size, num_quantiles, hidden_size)
        # x.view: (batch_size, 1, hidden_size)
        q_x = (x.view(batch_size, 1, -1) * q_quantile_net)  # (batch_size, num_quantiles, hidden_size)

        q = self.Q(q_x)  # (batch_size, num_quantiles, output_size)

        logit = q.mean(1)
        with torch.no_grad():
            q_tau_i_net = self.quantile_net(
                q_quantiles[:, 1:-1].detach()
            )  # (batch_size, num_quantiles - 1, hidden_size)
            q_tau_i_x = (x.view(batch_size, 1, -1) * q_tau_i_net)  # (batch_size, num_quantiles - 1, hidden_size)

            q_tau_i = self.Q(q_tau_i_x)  # (batch_size, num_quantiles - 1, output_size)

        return {
            'logit': logit,
            'q': q,
            'quantiles': q_quantiles,
            'quantiles_hats': q_quantiles_hats,
            'q_tau_i': q_tau_i,
            'entropies': entropies
        }


class DuelingHead(nn.Module):
    """
    Overview:
        The ``DuelingHead`` is used to output discrete actions logit.
        This module is used in Dueling DQN.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        output_size: int,
        layer_num: int = 1,
        a_layer_num: Optional[int] = None,
        v_layer_num: Optional[int] = None,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        dropout: Optional[float] = None,
        noise: Optional[bool] = False,
    ) -> None:
        """
        Overview:
            Init the ``DuelingHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``DuelingHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of default layers used in the network to compute action and value \
                output.
            - a_layer_num (:obj:`int`): The number of layers used in the network to compute action output.
            - v_layer_num (:obj:`int`): The number of layers used in the network to compute value output.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, no activation is applied. Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - dropout (:obj:`float`): The dropout rate of dropout layer. Default ``None``.
            - noise (:obj:`bool`): Whether to use ``NoiseLinearLayer`` as ``layer_fn`` in the Q network's MLP. \
                Default ``False``.
        """
        super(DuelingHead, self).__init__()
        if a_layer_num is None:
            a_layer_num = layer_num
        if v_layer_num is None:
            v_layer_num = layer_num
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.A = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                a_layer_num,
                layer_fn=layer,
                activation=activation,
                use_dropout=dropout is not None,
                dropout_probability=dropout,
                norm_type=norm_type
            ), block(hidden_size, output_size)
        )
        self.V = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                v_layer_num,
                layer_fn=layer,
                activation=activation,
                use_dropout=dropout is not None,
                dropout_probability=dropout,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``DuelingHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, M)`, where ``M = output_size``.
        Examples:
            >>> head = DuelingHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['logit'].shape == torch.Size([4, 64])
        """
        a = self.A(x)
        v = self.V(x)
        q_value = a - a.mean(dim=-1, keepdim=True) + v
        return {'logit': q_value}


class StochasticDuelingHead(nn.Module):
    """
    Overview:
        The ``Stochastic Dueling Network`` is proposed in paper ACER (arxiv 1611.01224), \
        i.e., the dueling network architecture applied to continuous action space.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        action_shape: int,
        layer_num: int = 1,
        a_layer_num: Optional[int] = None,
        v_layer_num: Optional[int] = None,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        noise: Optional[bool] = False,
        last_tanh: Optional[bool] = True,
    ) -> None:
        """
        Overview:
            Init the ``StochasticDuelingHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``StochasticDuelingHead``.
            - action_shape (:obj:`int`): The number of continuous action shape, usually integer value.
            - layer_num (:obj:`int`): The number of default layers used in the network to compute action and value \
                output.
            - a_layer_num (:obj:`int`): The number of layers used in the network to compute action output. Default is \
                ``layer_num``.
            - v_layer_num (:obj:`int`): The number of layers used in the network to compute value output. Default is \
                ``layer_num``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, then default set activation to ``nn.ReLU()``. Default ``None``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. \
                Default ``None``.
            - noise (:obj:`bool`): Whether use ``NoiseLinearLayer`` as ``layer_fn`` in Q networks' MLP. \
                Default ``False``.
            - last_tanh (:obj:`bool`): If ``True``, apply ``tanh`` to actions. Default ``True``.
        """
        super(StochasticDuelingHead, self).__init__()
        if a_layer_num is None:
            a_layer_num = layer_num
        if v_layer_num is None:
            v_layer_num = layer_num
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.A = nn.Sequential(
            MLP(
                hidden_size + action_shape,
                hidden_size,
                hidden_size,
                a_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        self.V = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                v_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        if last_tanh:
            self.tanh = nn.Tanh()
        else:
            self.tanh = None

    def forward(
        self,
        s: torch.Tensor,
        a: torch.Tensor,
        mu: torch.Tensor,
        sigma: torch.Tensor,
        sample_size: int = 10,
    ) -> Dict[str, torch.Tensor]:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``StochasticDuelingHead`` and return the prediction \
            dictionary.
        Arguments:
            - s (:obj:`torch.Tensor`): Tensor containing input embedding.
            - a (:obj:`torch.Tensor`): The original continuous behaviour action.
            - mu (:obj:`torch.Tensor`): The ``mu`` gaussian reparameterization output of actor head at current \
                timestep.
            - sigma (:obj:`torch.Tensor`): The ``sigma`` gaussian reparameterization output of actor head at \
                current timestep.
            - sample_size (:obj:`int`): The number of samples for continuous action when computing the Q value.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords \
                ``q_value`` (:obj:`torch.Tensor`) and ``v_value`` (:obj:`torch.Tensor`).
        Shapes:
            - s: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - a: :math:`(B, A)`, where ``A = action_size``.
            - mu: :math:`(B, A)`.
            - sigma: :math:`(B, A)`.
            - q_value: :math:`(B, 1)`.
            - v_value: :math:`(B, 1)`.
        Examples:
            >>> head = StochasticDuelingHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> a = torch.randn(4, 64)
            >>> mu = torch.randn(4, 64)
            >>> sigma = torch.ones(4, 64)
            >>> outputs = head(inputs, a, mu, sigma)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['q_value'].shape == torch.Size([4, 1])
            >>> assert outputs['v_value'].shape == torch.Size([4, 1])
        """

        batch_size = s.shape[0]  # batch_size or batch_size * T
        hidden_size = s.shape[1]
        action_size = a.shape[1]
        state_cat_action = torch.cat((s, a), dim=1)  # size (B, action_size + state_size)
        a_value = self.A(state_cat_action)  # size (B, 1)
        v_value = self.V(s)  # size (B, 1)
        # size (B, sample_size, hidden_size)
        expand_s = (torch.unsqueeze(s, 1)).expand((batch_size, sample_size, hidden_size))

        # in case for gradient back propagation
        dist = Independent(Normal(mu, sigma), 1)
        action_sample = dist.rsample(sample_shape=(sample_size, ))
        if self.tanh:
            action_sample = self.tanh(action_sample)
        # (sample_size, B, action_size) -> (B, sample_size, action_size)
        action_sample = action_sample.permute(1, 0, 2)

        # size (B, sample_size, action_size + hidden_size)
        state_cat_action_sample = torch.cat((expand_s, action_sample), dim=-1)
        a_val_sample = self.A(state_cat_action_sample)  # size (B, sample_size, 1)
        q_value = v_value + a_value - a_val_sample.mean(dim=1)  # size (B, 1)

        return {'q_value': q_value, 'v_value': v_value}


class RegressionHead(nn.Module):
    """
    Overview:
        The ``RegressionHead`` is used to regress continuous variables.
        This module is used for generating Q-value (DDPG critic) of continuous actions, or \
        state value (A2C/PPO), or directly predicting continuous action (DDPG actor).
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        input_size: int,
        output_size: int,
        layer_num: int = 2,
        final_tanh: Optional[bool] = False,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        hidden_size: Optional[int] = None,
    ) -> None:
        """
        Overview:
            Init the ``RegressionHead`` layers according to the provided arguments.
        Arguments:
            - input_size (:obj:`int`): The ``input_size`` of the MLP connected to ``RegressionHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
            - final_tanh (:obj:`bool`): If ``True``, apply ``tanh`` to output. Default ``False``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, then default set activation to ``nn.ReLU()``. Default ``None``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. \
                Default ``None``.
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP. If ``None``, it is set to ``input_size``. \
                Default ``None``.
        """
        super(RegressionHead, self).__init__()
        if hidden_size is None:
            hidden_size = input_size
        self.main = MLP(input_size, hidden_size, hidden_size, layer_num, activation=activation, norm_type=norm_type)
        self.last = nn.Linear(hidden_size, output_size)  # for convenience of special initialization
        self.final_tanh = final_tanh
        if self.final_tanh:
            self.tanh = nn.Tanh()

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``RegressionHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keyword ``pred`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - pred: :math:`(B, M)`, where ``M = output_size``.
        Examples:
            >>> head = RegressionHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['pred'].shape == torch.Size([4, 64])
        """
        x = self.main(x)
        x = self.last(x)
        if self.final_tanh:
            x = self.tanh(x)
        if x.shape[-1] == 1 and len(x.shape) > 1:
            x = x.squeeze(-1)
        return {'pred': x}


class ReparameterizationHead(nn.Module):
    """
    Overview:
        The ``ReparameterizationHead`` is used to generate Gaussian distribution of continuous variable, \
        which is parameterized by ``mu`` and ``sigma``.
        This module is often used in stochastic policies, such as PPO and SAC.
    Interfaces:
        ``__init__``, ``forward``.
    """
    # The "happo" type here is to align with the sigma initialization method of the network in the original HAPPO \
    # paper.
    # The code here needs to be optimized later.
    default_sigma_type = ['fixed', 'independent', 'conditioned', 'happo']
    default_bound_type = ['tanh', None]

    def __init__(
        self,
        input_size: int,
        output_size: int,
        layer_num: int = 2,
        sigma_type: Optional[str] = None,
        fixed_sigma_value: Optional[float] = 1.0,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        bound_type: Optional[str] = None,
        hidden_size: Optional[int] = None
    ) -> None:
        """
        Overview:
            Init the ``ReparameterizationHead`` layers according to the provided arguments.
        Arguments:
            - input_size (:obj:`int`): The ``input_size`` of the MLP connected to ``ReparameterizationHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
            - sigma_type (:obj:`str`): Sigma type used. Choose among \
                ``['fixed', 'independent', 'conditioned', 'happo']``. Default is ``None``.
            - fixed_sigma_value (:obj:`float`): When choosing ``fixed`` type, the tensor ``output['sigma']`` \
                is filled with this input value. Default is ``1.0``.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, then default set activation to ``nn.ReLU()``. Default ``None``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - bound_type (:obj:`str`): Bound type to apply to output ``mu``. Choose among ``['tanh', None]``. \
                Default is ``None``.
        """
        super(ReparameterizationHead, self).__init__()
        if hidden_size is None:
            hidden_size = input_size
        self.sigma_type = sigma_type
        assert sigma_type in self.default_sigma_type, "Please indicate sigma_type as one of {}".format(
            self.default_sigma_type
        )
        self.bound_type = bound_type
        assert bound_type in self.default_bound_type, "Please indicate bound_type as one of {}".format(
            self.default_bound_type
        )
        self.main = MLP(input_size, hidden_size, hidden_size, layer_num, activation=activation, norm_type=norm_type)
        self.mu = nn.Linear(hidden_size, output_size)
        if self.sigma_type == 'fixed':
            self.sigma = torch.full((1, output_size), fixed_sigma_value)
        elif self.sigma_type == 'independent':  # independent parameter
            self.log_sigma_param = nn.Parameter(torch.zeros(1, output_size))
        elif self.sigma_type == 'conditioned':
            self.log_sigma_layer = nn.Linear(hidden_size, output_size)
        elif self.sigma_type == 'happo':
            self.sigma_x_coef = 1.
            self.sigma_y_coef = 0.5
            # This parameter (x_coef, y_coef) refers to the HAPPO paper http://arxiv.org/abs/2109.11251.
            self.log_sigma_param = nn.Parameter(torch.ones(1, output_size) * self.sigma_x_coef)

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``ReparameterizationHead`` and return the prediction \
            dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords ``mu`` (:obj:`torch.Tensor`) and ``sigma`` \
                (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - mu: :math:`(B, M)`, where ``M = output_size``.
            - sigma: :math:`(B, M)`.
        Examples:
            >>> head = ReparameterizationHead(64, 64, sigma_type='fixed')
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['mu'].shape == torch.Size([4, 64])
            >>> assert outputs['sigma'].shape == torch.Size([4, 64])
        """
        x = self.main(x)
        mu = self.mu(x)
        if self.bound_type == 'tanh':
            mu = torch.tanh(mu)
        if self.sigma_type == 'fixed':
            sigma = self.sigma.to(mu.device) + torch.zeros_like(mu)  # addition aims to broadcast shape
        elif self.sigma_type == 'independent':
            log_sigma = self.log_sigma_param + torch.zeros_like(mu)  # addition aims to broadcast shape
            sigma = torch.exp(log_sigma)
        elif self.sigma_type == 'conditioned':
            log_sigma = self.log_sigma_layer(x)
            sigma = torch.exp(torch.clamp(log_sigma, -20, 2))
        elif self.sigma_type == 'happo':
            log_sigma = self.log_sigma_param + torch.zeros_like(mu)
            sigma = torch.sigmoid(log_sigma / self.sigma_x_coef) * self.sigma_y_coef
        return {'mu': mu, 'sigma': sigma}


class PopArtVHead(nn.Module):
    """
    Overview:
        The ``PopArtVHead`` is used to generate adaptive normalized state value. More information can be found in \
        paper Multi-task Deep Reinforcement Learning with PopArt \
        https://arxiv.org/abs/1809.04474. \
        This module is used in PPO or IMPALA.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        output_size: int,
        layer_num: int = 1,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
    ) -> None:
        """
        Overview:
            Init the ``PopArtVHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``PopArtVHead``.
            - output_size (:obj:`int`): The number of outputs.
            - layer_num (:obj:`int`): The number of layers used in the network to compute Q value output.
            - activation (:obj:`nn.Module`): The type of activation function to use in MLP. \
                If ``None``, then default set activation to ``nn.ReLU()``. \
                Default ``None``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
        """
        super(PopArtVHead, self).__init__()
        self.popart = PopArt(hidden_size, output_size)
        self.Q = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                layer_num,
                layer_fn=nn.Linear,
                activation=activation,
                norm_type=norm_type
            ), self.popart
        )

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``PopArtVHead`` and return the normalized prediction and \
            the unnormalized prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keyword ``pred`` (:obj:`torch.Tensor`) \
                and ``unnormalized_pred`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - pred: :math:`(B, M)`, where ``M = output_size``.
        Examples:
            >>> head = PopArtVHead(64, 64)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict) and outputs['pred'].shape == torch.Size([4, 64]) and \
                outputs['unnormalized_pred'].shape == torch.Size([4, 64])
        """
        x = self.Q(x)
        return x


class AttentionPolicyHead(nn.Module):
    """
    Overview:
        Cross-attention-type discrete action policy head, which is often used in variable discrete action space.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(self) -> None:
        super(AttentionPolicyHead, self).__init__()

    def forward(self, key: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        """
        Overview:
            Use attention-like mechanism to combine key and query tensor to output discrete action logit.
        Arguments:
            - key (:obj:`torch.Tensor`): Tensor containing key embedding.
            - query \
                (:obj:`torch.Tensor`): Tensor containing query embedding.
        Returns:
            - logit (:obj:`torch.Tensor`): Tensor containing output discrete action logit.
        Shapes:
            - key: :math:`(B, N, K)`, where ``B = batch_size``, ``N = possible discrete action choices`` and \
                ``K = hidden_size``.
            - query: :math:`(B, K)`.
            - logit: :math:`(B, N)`.
        Examples:
            >>> head = AttentionPolicyHead()
            >>> key = torch.randn(4, 5, 64)
            >>> query = torch.randn(4, 64)
            >>> logit = head(key, query)
            >>> assert logit.shape == torch.Size([4, 5])

        .. note::
            In this head, we assume that the ``key`` and ``query`` tensor are both normalized.
        """
        if len(query.shape) == 2 and len(key.shape) == 3:
            query = query.unsqueeze(1)
        logit = (key * query).sum(-1)
        return logit


class MultiHead(nn.Module):
    """
    Overview:
        The ``MultiHead`` is used to generate multiple similar results.
        For example, we can combine ``DistributionHead`` and ``MultiHead`` to generate multi-discrete action space \
        logit.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(self, head_cls: type, hidden_size: int, output_size_list: SequenceType, **head_kwargs) -> None:
        """
        Overview:
            Init the ``MultiHead`` layers according to the provided arguments.
        Arguments:
            - head_cls (:obj:`type`): The class of head, chosen among [``DuelingHead``, ``DistributionHead``, \
                ``QuantileHead``, ...].
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to the ``Head``.
            - output_size_list (:obj:`SequenceType`): Sequence of ``output_size`` for multi discrete action, e.g. \
                ``[2, 3, 5]``.
            - head_kwargs: (:obj:`dict`): Dict containing class-specific arguments.
        """
        super(MultiHead, self).__init__()
        self.pred = nn.ModuleList()
        for size in output_size_list:
            self.pred.append(head_cls(hidden_size, size, **head_kwargs))

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``MultiHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keyword ``logit`` (:obj:`torch.Tensor`) \
                corresponding to the logit of each output, accessed at ``['logit'][i]``.
        Shapes:
            - x: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - logit: :math:`(B, Mi)`, where ``Mi = output_size`` corresponding to output ``i``.
        Examples:
            >>> head = MultiHead(DuelingHead, 64, [2, 3, 5], v_layer_num=2)
            >>> inputs = torch.randn(4, 64)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> # output_size_list is [2, 3, 5] as set
            >>> # Therefore each dim of logit is as follows
            >>> outputs['logit'][0].shape
            torch.Size([4, 2])
            >>> outputs['logit'][1].shape
            torch.Size([4, 3])
            >>> outputs['logit'][2].shape
            torch.Size([4, 5])
        """
        return lists_to_dicts([m(x) for m in self.pred])


class EnsembleHead(nn.Module):
    """
    Overview:
        The ``EnsembleHead`` is used to generate Q-value for Q-ensemble in model-based RL algorithms.
    Interfaces:
        ``__init__``, ``forward``.
    """

    def __init__(
        self,
        input_size: int,
        output_size: int,
        hidden_size: int,
        layer_num: int,
        ensemble_num: int,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None
    ) -> None:
        super(EnsembleHead, self).__init__()
        d = input_size
        layers = []
        for _ in range(layer_num):
            layers.append(
                conv1d_block(
                    d * ensemble_num,
                    hidden_size * ensemble_num,
                    kernel_size=1,
                    stride=1,
                    groups=ensemble_num,
                    activation=activation,
                    norm_type=norm_type
                )
            )
            d = hidden_size

        # Adding activation for last layer will lead to train fail
        layers.append(
            conv1d_block(
                hidden_size * ensemble_num,
                output_size * ensemble_num,
                kernel_size=1,
                stride=1,
                groups=ensemble_num,
                activation=None,
                norm_type=None
            )
        )
        self.pred = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> Dict:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``EnsembleHead`` and return the prediction dictionary.
        Arguments:
            - x (:obj:`torch.Tensor`): Tensor containing input embedding.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keyword ``pred`` (:obj:`torch.Tensor`).
        Shapes:
            - x: :math:`(B, N * ensemble_num, 1)`, where ``B = batch_size`` and ``N = input_size``.
            - pred: :math:`(B, M * ensemble_num)`, where ``M = output_size``.
        Examples:
            >>> head = EnsembleHead(64, 64, 128, 2, 10)
            >>> inputs = torch.randn(4, 64 * 10, 1)
            >>> outputs = head(inputs)
            >>> assert isinstance(outputs, dict)
            >>> assert outputs['pred'].shape == torch.Size([4, 64 * 10])
        """
        x = self.pred(x).squeeze(-1)
        return {'pred': x}


def independent_normal_dist(logits: Union[List, Dict]) -> torch.distributions.Distribution:
    """
    Overview:
        Convert different types of logits to independent normal distribution.
    Arguments:
        - logits (:obj:`Union[List, Dict]`): The logits to be converted.
    Returns:
        - dist (:obj:`torch.distributions.Distribution`): The converted normal distribution.
    Examples:
        >>> logits = [torch.randn(4, 5), torch.ones(4, 5)]
        >>> dist = independent_normal_dist(logits)
        >>> assert isinstance(dist, torch.distributions.Independent)
        >>> assert isinstance(dist.base_dist, torch.distributions.Normal)
        >>> assert dist.base_dist.loc.shape == torch.Size([4, 5])
        >>> assert dist.base_dist.scale.shape == torch.Size([4, 5])
    Raises:
        - TypeError: If the type of logits is not ``list`` or ``dict``.
    """
    if isinstance(logits, (list, tuple)):
        return Independent(Normal(*logits), 1)
    elif isinstance(logits, dict):
        return Independent(Normal(logits['mu'], logits['sigma']), 1)
    else:
        raise TypeError("invalid logits type: {}".format(type(logits)))


head_cls_map = {
    # discrete
    'discrete': DiscreteHead,
    'dueling': DuelingHead,
    'distribution': DistributionHead,
    'rainbow': RainbowHead,
    'qrdqn': QRDQNHead,
    'quantile': QuantileHead,
    'fqf': FQFHead,
    'branch': BranchingHead,
    'attention_policy': AttentionPolicyHead,
    # continuous
    'regression': RegressionHead,
    'reparameterization': ReparameterizationHead,
    'popart': PopArtVHead,
    'sdn': StochasticDuelingHead,
    # multi
    'multi': MultiHead,
    'ensemble': EnsembleHead,
}
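The dueling aggregation that `DuelingHead.forward` performs (`q = a - a.mean(dim=-1, keepdim=True) + v`) can be illustrated with plain `torch` tensors standing in for the outputs of the `A` and `V` branches. This is a minimal sketch of the identity behind the head, not the DI-engine API; the tensor names are only for illustration:

```python
import torch

# Stand-ins for the advantage branch A(s, .) and value branch V(s) outputs.
batch_size, action_dim = 4, 6
a = torch.randn(batch_size, action_dim)  # per-action advantage estimates
v = torch.randn(batch_size, 1)  # state-value estimate

# Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
# Subtracting the mean advantage makes the V/A decomposition identifiable.
q_value = a - a.mean(dim=-1, keepdim=True) + v  # shape (batch_size, action_dim)

assert q_value.shape == (batch_size, action_dim)
# Because the advantage is mean-centered, the mean of Q over actions equals V.
assert torch.allclose(q_value.mean(dim=-1, keepdim=True), v, atol=1e-5)
```

The same centering trick appears in `StochasticDuelingHead`, where the mean over discrete actions is replaced by a Monte-Carlo mean over `sample_size` actions drawn from the current Gaussian policy.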