ding.model
DiscreteHead
Bases: Module
Overview
The DiscreteHead is used to generate discrete action logits or Q-value logits, and is often used in Q-learning or actor-critic algorithms with discrete action spaces.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, activation=nn.ReLU(), norm_type=None, dropout=None, noise=False)
Overview
Init the DiscreteHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to DiscreteHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the Q-value output.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- dropout (:obj:float): The dropout rate. Default is None (no dropout).
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
forward(x)
Overview
Use encoded embedding tensor to run MLP with DiscreteHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
Examples:
>>> head = DiscreteHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 64])
DuelingHead
Bases: Module
Overview
The DuelingHead is used to output discrete action logits.
This module is used in Dueling DQN.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, a_layer_num=None, v_layer_num=None, activation=nn.ReLU(), norm_type=None, dropout=None, noise=False)
Overview
Init the DuelingHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to DuelingHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The default number of layers used in the networks to compute the action and value outputs.
- a_layer_num (:obj:int): The number of layers used in the network to compute the action output. Default is layer_num.
- v_layer_num (:obj:int): The number of layers used in the network to compute the value output. Default is layer_num.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- dropout (:obj:float): The dropout rate of the dropout layer. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
forward(x)
Overview
Use encoded embedding tensor to run MLP with DuelingHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
Examples:
>>> head = DuelingHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
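The dueling aggregation that DuelingHead performs internally can be sketched as follows (a minimal sketch, assuming the standard mean-subtracted formulation from the Dueling DQN paper; dueling_aggregate is a hypothetical helper, not part of ding):

```python
import torch

def dueling_aggregate(a: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    # Subtracting the mean advantage keeps the V/A decomposition identifiable.
    return v + a - a.mean(dim=-1, keepdim=True)

a = torch.randn(4, 6)  # advantage logits for 6 actions
v = torch.randn(4, 1)  # state value
q = dueling_aggregate(a, v)
assert q.shape == (4, 6)
```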
DistributionHead
Bases: Module
Overview
The DistributionHead is used to generate a distribution over Q-values.
This module is used in the C51 algorithm.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, n_atom=51, v_min=-10, v_max=10, activation=nn.ReLU(), norm_type=None, noise=False, eps=1e-06)
Overview
Init the DistributionHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to DistributionHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the Q-value distribution.
- n_atom (:obj:int): The number of atoms (discrete supports). Default is 51.
- v_min (:obj:int): The minimum value of the atoms. Default is -10.
- v_max (:obj:int): The maximum value of the atoms. Default is 10.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
- eps (:obj:float): Small constant used for numerical stability. Default is 1e-6.
forward(x)
Overview
Use encoded embedding tensor to run MLP with DistributionHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor) and distribution (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- distribution: :math:(B, M, n_atom).
Examples:
>>> head = DistributionHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom is 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
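Given the distribution output above, the scalar Q-value per action is its expectation over the atom support; a minimal sketch (assuming the default v_min=-10, v_max=10 grid; expected_q is a hypothetical helper, not part of ding):

```python
import torch

def expected_q(distribution: torch.Tensor, v_min: float = -10., v_max: float = 10.) -> torch.Tensor:
    # The atom locations form an evenly spaced support over [v_min, v_max].
    support = torch.linspace(v_min, v_max, distribution.shape[-1])
    # Q(s, a) = sum_i p_i(s, a) * z_i, the mean of the categorical distribution.
    return (distribution * support).sum(-1)

dist = torch.softmax(torch.randn(4, 64, 51), dim=-1)  # (B, M, n_atom)
q = expected_q(dist)
assert q.shape == (4, 64)
```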
RainbowHead
Bases: Module
Overview
The RainbowHead is used to generate a distribution of Q-values.
This module is used in Rainbow DQN.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, n_atom=51, v_min=-10, v_max=10, activation=nn.ReLU(), norm_type=None, noise=True, eps=1e-06)
Overview
Init the RainbowHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to RainbowHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the Q-value output.
- n_atom (:obj:int): The number of atoms (discrete supports). Default is 51.
- v_min (:obj:int): The minimum value of the atoms. Default is -10.
- v_max (:obj:int): The maximum value of the atoms. Default is 10.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is True.
- eps (:obj:float): Small constant used for numerical stability. Default is 1e-6.
forward(x)
Overview
Use encoded embedding tensor to run MLP with RainbowHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor) and distribution (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- distribution: :math:(B, M, n_atom).
Examples:
>>> head = RainbowHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom is 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
QRDQNHead
Bases: Module
Overview
The QRDQNHead (Quantile Regression DQN) is used to output action quantiles.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, activation=nn.ReLU(), norm_type=None, noise=False)
Overview
Init the QRDQNHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to QRDQNHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the Q-value output.
- num_quantiles (:obj:int): The number of quantiles. Default is 32.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
forward(x)
Overview
Use encoded embedding tensor to run MLP with QRDQNHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), and tau (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- q: :math:(B, M, num_quantiles).
- tau: :math:(B, M, 1).
Examples:
>>> head = QRDQNHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
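Since outputs['q'] holds num_quantiles equally weighted quantile estimates per action, the scalar Q-value is simply their mean over the quantile dimension; a minimal sketch of this reduction:

```python
import torch

# q has shape (B, M, num_quantiles): one return quantile per action per sample.
q = torch.randn(4, 64, 32)
# The Q-value is the mean of the equal-weight Dirac mixture, i.e. the
# average over the quantile dimension.
q_value = q.mean(dim=-1)
assert q_value.shape == (4, 64)
```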
StochasticDuelingHead
Bases: Module
Overview
The Stochastic Dueling Network was proposed in the ACER paper (arXiv:1611.01224); it adapts the dueling network architecture to continuous action spaces.
Interfaces:
__init__, forward.
__init__(hidden_size, action_shape, layer_num=1, a_layer_num=None, v_layer_num=None, activation=nn.ReLU(), norm_type=None, noise=False, last_tanh=True)
Overview
Init the Stochastic DuelingHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to StochasticDuelingHead.
- action_shape (:obj:int): The shape of the continuous action space, usually an integer value.
- layer_num (:obj:int): The default number of layers used in the networks to compute the action and value outputs.
- a_layer_num (:obj:int): The number of layers used in the network to compute the action output. Default is layer_num.
- v_layer_num (:obj:int): The number of layers used in the network to compute the value output. Default is layer_num.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
- last_tanh (:obj:bool): If True, apply tanh to actions. Default is True.
forward(s, a, mu, sigma, sample_size=10)
Overview
Use encoded embedding tensor to run MLP with StochasticDuelingHead and return the prediction dictionary.
Arguments:
- s (:obj:torch.Tensor): Tensor containing input embedding.
- a (:obj:torch.Tensor): The original continuous behaviour action.
- mu (:obj:torch.Tensor): The mu gaussian reparameterization output of actor head at current timestep.
- sigma (:obj:torch.Tensor): The sigma gaussian reparameterization output of actor head at current timestep.
- sample_size (:obj:int): The number of samples for continuous action when computing the Q value.
Returns:
- outputs (:obj:Dict): Dict containing keywords q_value (:obj:torch.Tensor) and v_value (:obj:torch.Tensor).
Shapes:
- s: :math:(B, N), where B = batch_size and N = hidden_size.
- a: :math:(B, A), where A = action_size.
- mu: :math:(B, A).
- sigma: :math:(B, A).
- q_value: :math:(B, 1).
- v_value: :math:(B, 1).
Examples:
>>> head = StochasticDuelingHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> a = torch.randn(4, 64)
>>> mu = torch.randn(4, 64)
>>> sigma = torch.ones(4, 64)
>>> outputs = head(inputs, a, mu, sigma)
>>> assert isinstance(outputs, dict)
>>> assert outputs['q_value'].shape == torch.Size([4, 1])
>>> assert outputs['v_value'].shape == torch.Size([4, 1])
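The stochastic dueling estimate can be sketched as follows (assumed form, following the ACER paper; stochastic_dueling_q and toy_adv are hypothetical helpers for illustration, not part of ding):

```python
import torch

def stochastic_dueling_q(adv_fn, v, a, mu, sigma, sample_size=10):
    # Q(s, a) ~= V(s) + A(s, a) - (1/n) * sum_i A(s, a_i),  a_i ~ N(mu, sigma)
    noise = torch.randn(mu.shape[0], sample_size, mu.shape[1])
    sampled = mu.unsqueeze(1) + sigma.unsqueeze(1) * noise   # (B, n, A)
    baseline = adv_fn(sampled).mean(dim=1)                   # (B, 1)
    return v + adv_fn(a.unsqueeze(1)).squeeze(1) - baseline  # (B, 1)

toy_adv = lambda actions: actions.sum(dim=-1, keepdim=True)  # stand-in for A(s, a)
v = torch.randn(4, 1)
a, mu, sigma = torch.randn(4, 8), torch.randn(4, 8), torch.ones(4, 8)
q = stochastic_dueling_q(toy_adv, v, a, mu, sigma)
assert q.shape == (4, 1)
```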
QuantileHead
Bases: Module
Overview
The QuantileHead is used to output action quantiles.
This module is used in IQN.
Interfaces:
__init__, forward, quantile_net.
.. note::
The difference between QuantileHead and QRDQNHead is that QuantileHead models the state-action quantile function as a mapping from state-actions and samples from some base distribution, while QRDQNHead approximates random returns by a uniform mixture of Dirac functions.
__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, quantile_embedding_size=128, beta_function_type='uniform', activation=nn.ReLU(), norm_type=None, noise=False)
Overview
Init the QuantileHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to QuantileHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the Q-value output.
- num_quantiles (:obj:int): The number of quantiles. Default is 32.
- quantile_embedding_size (:obj:int): The embedding size of a quantile. Default is 128.
- beta_function_type (:obj:str): The type of beta function. See ding.rl_utils.beta_function.py for more details. Default is uniform.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
quantile_net(quantiles)
Overview
Deterministic parametric function trained to reparameterize samples from a base distribution. By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.
Arguments:
- quantiles (:obj:torch.Tensor): The encoded embedding tensor of the parametric sample.
Returns:
- quantile_net (:obj:torch.Tensor): Quantile network output tensor after reparameterization.
Shapes:
- quantile_net: :math:(quantile_embedding_size, M), where M = output_size.
Examples:
>>> head = QuantileHead(64, 64)
>>> quantiles = torch.randn(128,1)
>>> qn_output = head.quantile_net(quantiles)
>>> assert isinstance(qn_output, torch.Tensor)
>>> # default quantile_embedding_size: int = 128,
>>> assert qn_output.shape == torch.Size([128, 64])
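The reparameterization inside quantile_net typically starts from a cosine basis expansion of the sampled fractions, as in the IQN paper; a hedged sketch of that first step (assumed form, not the exact ding implementation):

```python
import math
import torch

def cosine_embedding(tau: torch.Tensor, embedding_size: int = 128) -> torch.Tensor:
    # Expand each quantile fraction tau in a cosine basis:
    # phi_j(tau) = cos(pi * j * tau), j = 1..embedding_size.
    j = torch.arange(1, embedding_size + 1, dtype=torch.float32)
    return torch.cos(math.pi * j * tau)  # broadcasts to (num_quantiles, embedding_size)

tau = torch.rand(32, 1)  # sampled quantile fractions in [0, 1)
emb = cosine_embedding(tau)
assert emb.shape == (32, 128)
```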
forward(x, num_quantiles=None)
Overview
Use encoded embedding tensor to run MLP with QuantileHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), and quantiles (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- q: :math:(num_quantiles, B, M).
- quantiles: :math:(quantile_embedding_size, 1).
Examples:
>>> head = QuantileHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])
FQFHead
Bases: Module
Overview
The FQFHead is used to output action quantiles.
This module is used in FQF.
Interfaces:
__init__, forward, quantile_net.
.. note:: The implementation of FQFHead is based on the paper https://arxiv.org/abs/1911.02140. The difference between FQFHead and QuantileHead is that, in FQF, N adjustable quantile values for N adjustable quantile fractions are estimated to approximate the quantile function, and the distribution of the return is approximated by a weighted mixture of N Dirac functions; while in IQN, the state-action quantile function is modeled as a mapping from state-actions and samples from some base distribution.
__init__(hidden_size, output_size, layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None, noise=False)
Overview
Init the FQFHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to FQFHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the Q-value output.
- num_quantiles (:obj:int): The number of quantiles. Default is 32.
- quantile_embedding_size (:obj:int): The embedding size of a quantile. Default is 128.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
quantile_net(quantiles)
Overview
Deterministic parametric function trained to reparameterize samples from the quantiles_proposal network. By repeated Bellman update iterations of Q-learning, the optimal action-value function is estimated.
Arguments:
- quantiles (:obj:torch.Tensor): The encoded embedding tensor of the parametric sample.
Returns:
- quantile_net (:obj:torch.Tensor): Quantile network output tensor after reparameterization.
Examples:
>>> head = FQFHead(64, 64)
>>> quantiles = torch.randn(4,32)
>>> qn_output = head.quantile_net(quantiles)
>>> assert isinstance(qn_output, torch.Tensor)
>>> # default quantile_embedding_size: int = 128,
>>> assert qn_output.shape == torch.Size([4, 32, 64])
forward(x, num_quantiles=None)
Overview
Use encoded embedding tensor to run MLP with FQFHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), quantiles (:obj:torch.Tensor), quantiles_hats (:obj:torch.Tensor), q_tau_i (:obj:torch.Tensor), entropies (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, M), where M = output_size.
- q: :math:(B, num_quantiles, M).
- quantiles: :math:(B, num_quantiles + 1).
- quantiles_hats: :math:(B, num_quantiles).
- q_tau_i: :math:(B, num_quantiles - 1, M).
- entropies: :math:(B, 1).
Examples:
>>> head = FQFHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles is 32
>>> assert outputs['q'].shape == torch.Size([4, 32, 64])
>>> assert outputs['quantiles'].shape == torch.Size([4, 33])
>>> assert outputs['quantiles_hats'].shape == torch.Size([4, 32])
>>> assert outputs['q_tau_i'].shape == torch.Size([4, 31, 64])
>>> assert outputs['entropies'].shape == torch.Size([4, 1])
RegressionHead
Bases: Module
Overview
The RegressionHead is used to regress continuous variables.
This module is used for generating Q-value (DDPG critic) of continuous actions, or state value (A2C/PPO), or directly predicting continuous action (DDPG actor).
Interfaces:
__init__, forward.
__init__(input_size, output_size, layer_num=2, final_tanh=False, activation=nn.ReLU(), norm_type=None, hidden_size=None)
Overview
Init the RegressionHead layers according to the provided arguments.
Arguments:
- input_size (:obj:int): The input size of the MLP connected to RegressionHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the output.
- final_tanh (:obj:bool): If True, apply tanh to the output. Default is False.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
forward(x)
Overview
Use encoded embedding tensor to run MLP with RegressionHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword pred (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- pred: :math:(B, M), where M = output_size.
Examples:
>>> head = RegressionHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['pred'].shape == torch.Size([4, 64])
ReparameterizationHead
Bases: Module
Overview
The ReparameterizationHead is used to generate a Gaussian distribution over a continuous variable, parameterized by mu and sigma.
This module is often used in stochastic policies, such as PPO and SAC.
Interfaces:
__init__, forward.
__init__(input_size, output_size, layer_num=2, sigma_type=None, fixed_sigma_value=1.0, activation=nn.ReLU(), norm_type=None, bound_type=None, hidden_size=None)
Overview
Init the ReparameterizationHead layers according to the provided arguments.
Arguments:
- input_size (:obj:int): The input size of the MLP connected to ReparameterizationHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the output.
- sigma_type (:obj:str): The sigma type used. Choose among ['fixed', 'independent', 'conditioned']. Default is None.
- fixed_sigma_value (:obj:float): When sigma_type is 'fixed', the tensor output['sigma'] is filled with this value. Default is 1.0.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- bound_type (:obj:str): The bound type to apply to the output mu. Choose among ['tanh', None]. Default is None.
forward(x)
Overview
Use encoded embedding tensor to run MLP with ReparameterizationHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keywords mu (:obj:torch.Tensor) and sigma (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- mu: :math:(B, M), where M = output_size.
- sigma: :math:(B, M).
Examples:
>>> head = ReparameterizationHead(64, 64, sigma_type='fixed')
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['mu'].shape == torch.Size([4, 64])
>>> assert outputs['sigma'].shape == torch.Size([4, 64])
MultiHead
Bases: Module
Overview
The MultiHead is used to generate multiple similar outputs.
For example, we can combine DistributionHead and MultiHead to generate logits for a multi-discrete action space.
Interfaces:
__init__, forward.
__init__(head_cls, hidden_size, output_size_list, **head_kwargs)
Overview
Init the MultiHead layers according to the provided arguments.
Arguments:
- head_cls (:obj:type): The class of the head, chosen among [DuelingHead, DistributionHead, QuantileHead, ...].
- hidden_size (:obj:int): The hidden_size of the MLP connected to the Head.
- output_size_list (:obj:SequenceType): Sequence of output_size for multi-discrete actions, e.g. [2, 3, 5].
- head_kwargs (:obj:dict): Dict containing class-specific arguments.
forward(x)
Overview
Use encoded embedding tensor to run MLP with MultiHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing the keyword logit (:obj:List[torch.Tensor]), where the logit of the i-th output is accessed at outputs['logit'][i].
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, Mi), where Mi = output_size corresponding to output i.
Examples:
>>> head = MultiHead(DuelingHead, 64, [2, 3, 5], v_layer_num=2)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> # output_size_list is [2, 3, 5] as set
>>> # Therefore each dim of logit is as follows
>>> outputs['logit'][0].shape
torch.Size([4, 2])
>>> outputs['logit'][1].shape
torch.Size([4, 3])
>>> outputs['logit'][2].shape
torch.Size([4, 5])
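The idea behind MultiHead can be sketched with a toy version that keeps one independent sub-head per action dimension (TinyMultiHead is illustrative only; the real MultiHead wraps the head_cls passed to it):

```python
import torch
import torch.nn as nn

class TinyMultiHead(nn.Module):
    # One linear sub-head per entry of output_size_list.
    def __init__(self, hidden_size, output_size_list):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden_size, s) for s in output_size_list)

    def forward(self, x):
        # Collect one logit tensor per action dimension, as MultiHead does.
        return {'logit': [head(x) for head in self.heads]}

head = TinyMultiHead(64, [2, 3, 5])
outputs = head(torch.randn(4, 64))
assert [t.shape for t in outputs['logit']] == [(4, 2), (4, 3), (4, 5)]
```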
BranchingHead
Bases: Module
Overview
The BranchingHead is used to generate Q-value with different branches.
This module is used in Branch DQN.
Interfaces:
__init__, forward.
__init__(hidden_size, num_branches=0, action_bins_per_branch=2, layer_num=1, a_layer_num=None, v_layer_num=None, norm_type=None, activation=nn.ReLU(), noise=False)
Overview
Init the BranchingHead layers according to the provided arguments. This head achieves a linear increase in the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension.
It is therefore suitable for high-dimensional action spaces.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to BranchingHead.
- num_branches (:obj:int): The number of branches, which is equivalent to the action dimension.
- action_bins_per_branch (:obj:int): The number of action bins in each dimension.
- layer_num (:obj:int): The default number of layers used in the networks to compute the advantage and value outputs.
- a_layer_num (:obj:int): The number of layers used in the network to compute the advantage output.
- v_layer_num (:obj:int): The number of layers used in the network to compute the value output.
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default is None.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn in the Q network's MLP. Default is False.
forward(x)
Overview
Use encoded embedding tensor to run MLP with BranchingHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- logit: :math:(B, num_branches, action_bins_per_branch).
Examples:
>>> head = BranchingHead(64, 5, 2)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 5, 2])
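Given the (B, num_branches, action_bins_per_branch) logit above, one discrete action per branch is selected with an argmax over the last dimension; a minimal sketch:

```python
import torch

# Logit for 4 samples, 5 branches (action dimensions), 2 bins per branch.
logit = torch.randn(4, 5, 2)
# Independent argmax per branch yields one bin index per action dimension.
actions = logit.argmax(dim=-1)
assert actions.shape == (4, 5)
```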
AttentionPolicyHead
Bases: Module
Overview
Cross-attention-type discrete action policy head, which is often used in variable discrete action space.
Interfaces:
__init__, forward.
forward(key, query)
Overview
Use attention-like mechanism to combine key and query tensor to output discrete action logit.
Arguments:
- key (:obj:torch.Tensor): Tensor containing key embedding.
- query (:obj:torch.Tensor): Tensor containing query embedding.
Returns:
- logit (:obj:torch.Tensor): Tensor containing output discrete action logit.
Shapes:
- key: :math:(B, N, K), where B = batch_size, N = possible discrete action choices and K = hidden_size.
- query: :math:(B, K).
- logit: :math:(B, N).
Examples:
>>> head = AttentionPolicyHead()
>>> key = torch.randn(4, 5, 64)
>>> query = torch.randn(4, 64)
>>> logit = head(key, query)
>>> assert logit.shape == torch.Size([4, 5])
.. note::
In this head, we assume that the key and query tensor are both normalized.
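The scoring step can be sketched as a batched dot product between each candidate's key and the query (assumed form; the actual head may add scaling or normalization):

```python
import torch

key = torch.randn(4, 5, 64)    # (B, N, K): one key per candidate action
query = torch.randn(4, 64)     # (B, K)
# Score each candidate by its dot product with the query:
# (B, N, K) batched-matmul (B, K, 1) -> (B, N, 1) -> (B, N).
logit = torch.bmm(key, query.unsqueeze(-1)).squeeze(-1)
assert logit.shape == (4, 5)
```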
PopArtVHead
Bases: Module
Overview
The PopArtVHead is used to generate an adaptively normalized state value. More information can be found in the paper Multi-task Deep Reinforcement Learning with PopArt (https://arxiv.org/abs/1809.04474). This module is used in PPO or IMPALA.
Interfaces:
__init__, forward.
__init__(hidden_size, output_size, layer_num=1, activation=nn.ReLU(), norm_type=None)
Overview
Init the PopArtVHead layers according to the provided arguments.
Arguments:
- hidden_size (:obj:int): The hidden_size of the MLP connected to PopArtVHead.
- output_size (:obj:int): The number of outputs.
- layer_num (:obj:int): The number of layers used in the network to compute the value output.
- activation (:obj:nn.Module): The type of activation function to use in the MLP. If None, nn.ReLU() is used. Default is nn.ReLU().
- norm_type (:obj:str): The type of normalization to use. See ding.torch_utils.network.fc_block for more details. Default None.
forward(x)
Overview
Use encoded embedding tensor to run MLP with PopArtVHead and return the normalized prediction and the unnormalized prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword pred (:obj:torch.Tensor) and unnormalized_pred (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B = batch_size and N = hidden_size.
- pred: :math:(B, M), where M = output_size.
- unnormalized_pred: :math:(B, M).
Examples:
>>> head = PopArtVHead(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict) and outputs['pred'].shape == torch.Size([4, 64]) and outputs['unnormalized_pred'].shape == torch.Size([4, 64])
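The relation between the two returned predictions can be sketched as follows (assumed form, following the PopArt paper: the head maintains running statistics mu and sigma and rescales its normalized output; the values below are illustrative):

```python
import torch

pred = torch.randn(4, 64)  # normalized value prediction
mu, sigma = 1.5, 2.0       # running mean / std tracked by the head (illustrative)
# The unnormalized prediction rescales the normalized one.
unnormalized_pred = pred * sigma + mu
assert unnormalized_pred.shape == (4, 64)
```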
EnsembleHead
Bases: Module
Overview
The EnsembleHead is used to generate Q-value for Q-ensemble in model-based RL algorithms.
Interfaces:
__init__, forward.
forward(x)
Overview
Use encoded embedding tensor to run MLP with EnsembleHead and return the prediction dictionary.
Arguments:
- x (:obj:torch.Tensor): Tensor containing input embedding.
Returns:
- outputs (:obj:Dict): Dict containing keyword pred (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N * ensemble_num, 1), where B = batch_size and N = hidden_size.
- pred: :math:(B, M * ensemble_num, 1), where M = output_size.
Examples:
>>> head = EnsembleHead(64 * 10, 64 * 10)
>>> inputs = torch.randn(4, 64 * 10, 1)
>>> outputs = head(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['pred'].shape == torch.Size([4, 64 * 10, 1])
ConvEncoder
Bases: Module
Overview
The Convolution Encoder is used to encode 2-dim image observations.
Interfaces:
__init__, forward.
__init__(obs_shape, hidden_size_list=[32, 64, 64, 128], activation=nn.ReLU(), kernel_size=[8, 4, 3], stride=[4, 2, 1], padding=None, layer_norm=False, norm_type=None)
Overview
Initialize the Convolution Encoder according to the provided arguments.
Arguments:
- obs_shape (:obj:SequenceType): Sequence of in_channel, plus one or more input size.
- hidden_size_list (:obj:SequenceType): Sequence of hidden_size of subsequent conv layers and the final dense layer.
- activation (:obj:nn.Module): Type of activation to use in the conv layers and ResBlock. Default is nn.ReLU().
- kernel_size (:obj:SequenceType): Sequence of kernel_size of subsequent conv layers.
- stride (:obj:SequenceType): Sequence of stride of subsequent conv layers.
- padding (:obj:SequenceType): Padding added to all four sides of the input for each conv layer. See nn.Conv2d for more details. Default is None.
- layer_norm (:obj:bool): Whether to use DreamerLayerNorm, a special normalization trick proposed in DreamerV3.
- norm_type (:obj:str): Type of normalization to use. See ding.torch_utils.network.ResBlock for more details. Default is None.
forward(x)
Overview
Return output 1D embedding tensor of the env's 2D image observation.
Arguments:
- x (:obj:torch.Tensor): Raw 2D observation of the environment.
Returns:
- outputs (:obj:torch.Tensor): Output embedding tensor.
Shapes:
- x : :math:(B, C, H, W), where B is batch size, C is channel, H is height, W is width.
- outputs: :math:(B, N), where N = hidden_size_list[-1].
Examples:
>>> conv = ConvEncoder(
>>> obs_shape=(4, 84, 84),
>>> hidden_size_list=[32, 64, 64, 128],
>>> activation=nn.ReLU(),
>>> kernel_size=[8, 4, 3],
>>> stride=[4, 2, 1],
>>> padding=None,
>>> layer_norm=False,
>>> norm_type=None
>>> )
>>> x = torch.randn(1, 4, 84, 84)
>>> output = conv(x)
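The spatial size after the three conv layers in the example can be checked with the standard no-padding convolution formula, out = floor((in - kernel) / stride) + 1:

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    # Output spatial size of a conv layer with no padding.
    return (size - kernel) // stride + 1

size = 84
for kernel, stride in zip([8, 4, 3], [4, 2, 1]):
    size = conv_out(size, kernel, stride)
# 84 -> 20 -> 9 -> 7, so the final feature map is 7 x 7 before the dense layer.
assert size == 7
```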
FCEncoder
Bases: Module
Overview
The fully connected encoder is used to encode 1-dim input variables.
Interfaces:
__init__, forward.
__init__(obs_shape, hidden_size_list, res_block=False, activation=nn.ReLU(), norm_type=None, dropout=None)
¶
Overview
Initialize the FC Encoder according to arguments.
Arguments:
- obs_shape (:obj:int): Observation shape.
- hidden_size_list (:obj:SequenceType): Sequence of hidden_size of subsequent FC layers.
- res_block (:obj:bool): Whether to use res_block. Default is False.
- activation (:obj:nn.Module): Type of activation to use in ResFCBlock. Default is nn.ReLU().
- norm_type (:obj:str): Type of normalization to use. See ding.torch_utils.network.ResFCBlock for more details. Default is None.
- dropout (:obj:float): Dropout rate of the dropout layer. If None, no dropout layer is used. Default is None.
forward(x)
¶
Overview
Return output embedding tensor of the env observation.
Arguments:
- x (:obj:torch.Tensor): Env raw observation.
Returns:
- outputs (:obj:torch.Tensor): Output embedding tensor.
Shapes:
- x : :math:(B, M), where M = obs_shape.
- outputs: :math:(B, N), where N = hidden_size_list[-1].
Examples:
>>> fc = FCEncoder(
>>> obs_shape=4,
>>> hidden_size_list=[32, 64, 64, 128],
>>> activation=nn.ReLU(),
>>> norm_type=None,
>>> dropout=None
>>> )
>>> x = torch.randn(1, 4)
>>> output = fc(x)
IMPALAConvEncoder
¶
Bases: Module
Overview
IMPALA CNN encoder, used in the IMPALA algorithm. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, https://arxiv.org/pdf/1802.01561.pdf.
Interface:
__init__, forward, output_shape.
__init__(obs_shape, channels=(16, 32, 32), outsize=256, scale_ob=255.0, nblock=2, final_relu=True, **kwargs)
¶
Overview
Initialize the IMPALA CNN encoder according to arguments.
Arguments:
- obs_shape (:obj:SequenceType): 2D image observation shape.
- channels (:obj:SequenceType): The channel numbers of a series of IMPALA CNN blocks. Each element of the sequence is the output channel number of an IMPALA CNN block.
- outsize (:obj:int): The output size of the final linear layer, i.e., the dimension of the 1D embedding vector.
- scale_ob (:obj:float): The scale of the input observation, which is used to normalize the input observation, such as dividing 255.0 for the raw image observation.
- nblock (:obj:int): The number of residual blocks in each IMPALA CNN block.
- final_relu (:obj:bool): Whether to use ReLU activation in the final output of encoder.
- kwargs (:obj:Dict[str, Any]): Other arguments for IMPALACnnDownStack.
forward(x)
¶
Overview
Return the 1D embedding vector of the input 2D observation.
Arguments:
- x (:obj:torch.Tensor): Input 2D observation tensor.
Returns:
- output (:obj:torch.Tensor): Output 1D embedding vector.
Shapes:
- x (:obj:torch.Tensor): :math:(B, C, H, W), where B is batch size, C is channel number, H is height and W is width.
- output (:obj:torch.Tensor): :math:(B, outsize), where B is batch size.
Examples:
>>> encoder = IMPALAConvEncoder(
>>> obs_shape=(4, 84, 84),
>>> channels=(16, 32, 32),
>>> outsize=256,
>>> scale_ob=255.0,
>>> nblock=2,
>>> final_relu=True,
>>> )
>>> x = torch.randn(1, 4, 84, 84)
>>> output = encoder(x)
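Each entry of channels corresponds to one down-sampling stack, so the spatial resolution is roughly halved per stack before the final flatten and linear layer. A standalone sketch of that arithmetic, assuming the classic IMPALA down-sampling (3x3 max-pool, stride 2, padding 1; check IMPALACnnDownStack for the exact settings):

```python
def pool_out_size(size, kernel_size=3, stride=2, padding=1):
    # Max-pool output-size formula; 3x3 / stride-2 / padding-1 is the
    # down-sampling used in the original IMPALA architecture (an assumption here).
    return (size + 2 * padding - kernel_size) // stride + 1

size = 84
for _ in (16, 32, 32):  # one down-sampling stack per entry of `channels`
    size = pool_out_size(size)
print(size)  # each stack roughly halves the resolution: 84 -> 42 -> 21 -> 11
```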
GaussianFourierProjectionTimeEncoder
¶
Bases: Module
Overview
Gaussian random features for encoding time steps. This module is used as the encoder of time in generative models such as diffusion model.
Interfaces:
__init__, forward.
__init__(embed_dim, scale=30.0)
¶
Overview
Initialize the Gaussian Fourier Projection Time Encoder according to arguments.
Arguments:
- embed_dim (:obj:int): The dimension of the output embedding vector.
- scale (:obj:float): The scale of the Gaussian random features.
forward(x)
¶
Overview
Return the output embedding vector of the input time step.
Arguments:
- x (:obj:torch.Tensor): Input time step tensor.
Returns:
- output (:obj:torch.Tensor): Output embedding vector.
Shapes:
- x (:obj:torch.Tensor): :math:(B,), where B is batch size.
- output (:obj:torch.Tensor): :math:(B, embed_dim), where B is batch size, embed_dim is the dimension of the output embedding vector.
Examples:
>>> encoder = GaussianFourierProjectionTimeEncoder(128)
>>> x = torch.randn(100)
>>> output = encoder(x)
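The encoder itself is easy to state: a fixed random frequency vector W ~ N(0, scale^2) maps each scalar time step t to the features [sin(2*pi*t*W), cos(2*pi*t*W)]. A minimal standalone sketch (the real module stores W as a non-trainable parameter, and whether sin/cos are concatenated or interleaved is an implementation detail):

```python
import math
import random

def gaussian_fourier_embed(t, embed_dim, scale=30.0, seed=0):
    # Fixed random frequencies, drawn once: W ~ N(0, scale^2), one per half-dimension.
    rng = random.Random(seed)
    w = [rng.gauss(0.0, scale) for _ in range(embed_dim // 2)]
    proj = [2.0 * math.pi * t * wi for wi in w]
    # We concatenate sin and cos features here (an assumption about the layout).
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]

emb = gaussian_fourier_embed(0.5, embed_dim=128)
assert len(emb) == 128
```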
DQN
¶
Bases: Module
Overview
The neural network structure and computation graph of the Deep Q Network (DQN) algorithm, the most classic value-based RL algorithm for discrete action spaces. The DQN is composed of two parts: encoder and head. The encoder is used to extract features from various observations, and the head is used to compute the Q value of each action dimension.
Interfaces:
__init__, forward.
.. note::
Current DQN supports two types of encoder: FCEncoder and ConvEncoder, two types of head: DiscreteHead and DuelingHead. You can customize your own encoder or head by inheriting this class.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, dropout=None, init_bias=None, noise=False)
¶
Overview
Initialize the DQN (encoder + head) model according to the corresponding input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- dueling (:obj:Optional[bool]): Whether to use DuelingHead (default) or DiscreteHead.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network, defaults to None, then it will be set to the last element of encoder_hidden_size_list.
- head_layer_num (:obj:int): The number of layers used in the head network to compute Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks. If None, then it is set to nn.ReLU() by default.
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
- dropout (:obj:Optional[float]): The dropout rate of the dropout layer. If None, the dropout layer is disabled.
- init_bias (:obj:Optional[float]): The initial value of the last layer bias in the head network.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn to boost exploration in Q networks' MLP. Default to False.
forward(x)
¶
Overview
DQN forward computation graph, input observation tensor to predict q_value.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of DQN's forward, including q_value.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- logit (:obj:torch.Tensor): :math:(B, M), where B is batch size and M is action_shape
Examples:
>>> model = DQN(32, 6) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
.. note::
For consistency and compatibility, we name all the outputs of the network which are related to action selections as logit.
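Because the logit key holds plain Q-values for a discrete action space, greedy action selection at evaluation time reduces to an argmax over the last dimension. A minimal sketch in plain Python (the q_values list is a hypothetical stand-in for one row of outputs['logit']):

```python
def greedy_action(q_values):
    # Pick the index of the largest Q-value; ties resolve to the first maximum.
    return max(range(len(q_values)), key=lambda i: q_values[i])

q_values = [0.1, 2.5, -0.3, 1.7]  # hypothetical Q-values for 4 actions
assert greedy_action(q_values) == 1
```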
RainbowDQN
¶
Bases: Module
Overview
The neural network structure and computation graph of RainbowDQN, which combines distributional RL and DQN. You can refer to paper Rainbow: Combining Improvements in Deep Reinforcement Learning https://arxiv.org/pdf/1710.02298.pdf for more details.
Interfaces:
__init__, forward
.. note:: RainbowDQN contains dueling architecture by default.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)
¶
Overview
Init the Rainbow Model according to arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn. If None, then it is set to nn.ReLU() by default.
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
- v_min (:obj:Optional[float]): The minimum value of the support of the distribution. Default to -10.
- v_max (:obj:Optional[float]): The maximum value of the support of the distribution. Default to 10.
- n_atom (:obj:Optional[int]): Number of atoms in the prediction distribution.
forward(x)
¶
Overview
Use the observation tensor to predict Rainbow's output, i.e., the logit and the value distribution.
Arguments:
- x (:obj:torch.Tensor):
The encoded embedding tensor with (B, N=hidden_size).
Returns:
- outputs (:obj:Dict):
Run MLP with RainbowHead setups and return the result prediction dictionary.
ReturnsKeys:
- logit (:obj:torch.Tensor): Logit tensor of size (B, M), where M is action_shape.
- distribution (:obj:torch.Tensor): Distribution tensor of size (B, M, n_atom).
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is head_hidden_size.
- logit (:obj:torch.FloatTensor): :math:(B, M), where M is action_shape.
- distribution(:obj:torch.FloatTensor): :math:(B, M, P), where P is n_atom.
Examples:
>>> model = RainbowDQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int =51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
QRDQN
¶
Bases: Module
Overview
The neural network structure and computation graph of QRDQN, which combines distributional RL and DQN. You can refer to Distributional Reinforcement Learning with Quantile Regression https://arxiv.org/pdf/1710.10044.pdf for more details.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, num_quantiles=32, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the QRDQN Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's space.
- action_shape (:obj:Union[int, SequenceType]): Action's space.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- num_quantiles (:obj:int): Number of quantiles in the prediction distribution.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn. If None, then it is set to nn.ReLU() by default.
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
¶
Overview
Use the observation tensor to predict QRDQN's output.
Arguments:
- x (:obj:torch.Tensor): The encoded embedding tensor with (B, N=hidden_size).
Returns:
- outputs (:obj:Dict): Run with encoder and head. Return the result prediction dictionary.
ReturnsKeys:
- logit (:obj:torch.Tensor): Logit tensor with same size as input x.
- q (:obj:torch.Tensor): Q-value tensor of size (B, M, num_quantiles), where M is action_shape.
- tau (:obj:torch.Tensor): tau tensor of size (B, num_quantiles, 1).
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is head_hidden_size.
- logit (:obj:torch.FloatTensor): :math:(B, M), where M is action_shape.
- tau (:obj:torch.Tensor): :math:(B, num_quantiles, 1).
Examples:
>>> model = QRDQN(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles : int = 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
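In QRDQN, the scalar Q-value behind logit is simply the mean of the per-action quantile estimates. A standalone sketch of that reduction (nested lists stand in for the (B, M, num_quantiles) q tensor):

```python
def quantiles_to_q(q):
    # q: nested list of shape (B, M, num_quantiles); the Q-value of each action
    # is the uniform average of its quantile estimates.
    return [[sum(quants) / len(quants) for quants in actions] for actions in q]

q = [[[0.0, 1.0, 2.0, 3.0], [4.0, 4.0, 4.0, 4.0]]]  # B=1, M=2, num_quantiles=4
print(quantiles_to_q(q))  # [[1.5, 4.0]]
```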
IQN
¶
Bases: Module
Overview
The neural network structure and computation graph of IQN, which combines distributional RL and DQN. You can refer to paper Implicit Quantile Networks for Distributional Reinforcement Learning https://arxiv.org/pdf/1806.06923.pdf for more details.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the IQN Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- num_quantiles (:obj:int): Number of quantiles in the prediction distribution.
- quantile_embedding_size (:obj:int): The embedding size of a quantile.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn. If None, then it is set to nn.ReLU() by default.
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
¶
Overview
Use the encoded embedding tensor to predict IQN's output.
Arguments:
- x (:obj:torch.Tensor): The encoded embedding tensor with (B, N=hidden_size).
Returns:
- outputs (:obj:Dict): Run with encoder and head. Return the result prediction dictionary.
ReturnsKeys:
- logit (:obj:torch.Tensor): Logit tensor with same size as input x.
- q (:obj:torch.Tensor): Q-value tensor of size (num_quantiles, B, M), where M is action_shape.
- quantiles (:obj:torch.Tensor): quantiles tensor of size (quantile_embedding_size, 1)
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is head_hidden_size.
- logit (:obj:torch.FloatTensor): :math:(B, M), where M is action_shape
- quantiles (:obj:torch.Tensor): :math:(P, 1), where P is quantile_embedding_size.
Examples:
>>> model = IQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> # default quantile_embedding_size: int = 128
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])
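IQN embeds each sampled quantile fraction tau with cosine features, phi_j(tau) = cos(pi * j * tau) for j = 0 .. n-1, before mixing it into the state embedding. A standalone sketch of just that embedding (the linear layer and ReLU that follow in the paper are omitted):

```python
import math

def cosine_quantile_embedding(tau, embedding_size=128):
    # cos(pi * j * tau) features for j = 0 .. embedding_size - 1, as in the IQN paper.
    return [math.cos(math.pi * j * tau) for j in range(embedding_size)]

phi = cosine_quantile_embedding(0.5)
assert len(phi) == 128
assert phi[0] == 1.0  # the j = 0 term is always cos(0) = 1
```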
FQF
¶
Bases: Module
Overview
The neural network structure and computation graph of FQF, which combines distributional RL and DQN. You can refer to paper Fully Parameterized Quantile Function for Distributional Reinforcement Learning https://arxiv.org/pdf/1911.02140.pdf for more details.
Interface:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the FQF Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- num_quantiles (:obj:int): Number of quantiles in the prediction distribution.
- activation (:obj:Optional[nn.Module]):
The type of activation function to use in MLP the after layer_fn,
if None then default set to nn.ReLU()
- norm_type (:obj:Optional[str]):
The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
¶
Overview
Use the encoded embedding tensor to predict FQF's output.
Arguments:
- x (:obj:torch.Tensor): The encoded embedding tensor with (B, N=hidden_size).
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), quantiles (:obj:torch.Tensor), quantiles_hats (:obj:torch.Tensor), q_tau_i (:obj:torch.Tensor), entropies (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B is batch size and N is head_hidden_size.
- logit: :math:(B, M), where M is action_shape.
- q: :math:(B, num_quantiles, M).
- quantiles: :math:(B, num_quantiles + 1).
- quantiles_hats: :math:(B, num_quantiles).
- q_tau_i: :math:(B, num_quantiles - 1, M).
- entropies: :math:(B, 1).
Examples:
>>> model = FQF(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([4, 32, 64])
>>> assert outputs['quantiles'].shape == torch.Size([4, 33])
>>> assert outputs['quantiles_hats'].shape == torch.Size([4, 32])
>>> assert outputs['q_tau_i'].shape == torch.Size([4, 31, 64])
>>> assert outputs['entropies'].shape == torch.Size([4, 1])
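The quantiles and quantiles_hats keys come from FQF's fraction-proposal step: proposal logits are softmaxed, cumulatively summed into monotone fractions 0 = tau_0 < ... < tau_N = 1, and the midpoints tau_hat_i = (tau_i + tau_{i+1}) / 2 are where the quantile function is evaluated. A standalone sketch for one sample:

```python
import math

def propose_fractions(logits):
    # Softmax the proposal logits, then cumulative-sum into monotone fractions in [0, 1].
    exps = [math.exp(l - max(logits)) for l in logits]
    probs = [e / sum(exps) for e in exps]
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)
    # Midpoints between consecutive fractions: where the quantile function is evaluated.
    tau_hats = [(a + b) / 2.0 for a, b in zip(taus[:-1], taus[1:])]
    return taus, tau_hats

taus, tau_hats = propose_fractions([0.0, 0.0, 0.0, 0.0])  # uniform logits -> uniform fractions
print(taus)      # [0.0, 0.25, 0.5, 0.75, 1.0]
print(tau_hats)  # [0.125, 0.375, 0.625, 0.875]
```

This matches the documented shapes: num_quantiles proposal logits yield num_quantiles + 1 fractions and num_quantiles midpoints.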
DRQN
¶
Bases: Module
Overview
The DRQN (Deep Recurrent Q-Network) is a neural network model combining DQN with RNN to handle sequential
data and partially observable environments. It consists of three main components: encoder, rnn,
and head.
- Encoder: Extracts features from various observation inputs.
- RNN: Processes sequential observations and other data.
- Head: Computes Q-values for each action dimension.
Interfaces
__init__, forward.
.. note::
The current implementation supports:
- Two encoder types: FCEncoder and ConvEncoder.
- Two head types: DiscreteHead and DuelingHead.
- Three RNN types: normal (LSTM with LayerNorm), pytorch (PyTorch's native LSTM), and gru.
You can extend the model by customizing your own encoder, RNN, or head by inheriting this class.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, lstm_type='normal', activation=nn.ReLU(), norm_type=None, res_link=False)
¶
Overview
Initialize the DRQN model with specified parameters.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Shape of the observation space, e.g., 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Shape of the action space, e.g., 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): List of hidden sizes for the encoder. The last element must match head_hidden_size.
- dueling (:obj:Optional[bool]): Use DuelingHead if True, otherwise use DiscreteHead.
- head_hidden_size (:obj:Optional[int]): Hidden size for the head network. Defaults to the last element of encoder_hidden_size_list if None.
- head_layer_num (:obj:int): Number of layers in the head network to compute Q-value outputs.
- lstm_type (:obj:Optional[str]): Type of RNN module. Supported types are normal, pytorch, and gru.
- activation (:obj:Optional[nn.Module]): Activation function used in the network. Defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): Normalization type for the networks. Supported types are: ['BN', 'IN', 'SyncBN', 'LN']. See ding.torch_utils.fc_block for more details.
- res_link (:obj:bool): Enables residual connections between single-frame data and sequential data. Defaults to False.
forward(inputs, inference=False, saved_state_timesteps=None)
¶
Overview
Defines the forward pass of the DRQN model. Takes observation and previous RNN states as inputs and predicts Q-values.
Arguments:
- inputs (:obj:Dict): Input data dictionary containing observation and previous RNN state.
- inference (:obj:bool): If True, unrolls one timestep (used during evaluation). If False, unrolls the entire sequence (used during training).
- saved_state_timesteps (:obj:Optional[list]): When inference is False, specifies the timesteps whose hidden states are saved and returned.
ArgumentsKeys:
- obs (:obj:torch.Tensor): Raw observation tensor.
- prev_state (:obj:list): Previous RNN state tensor, structure depends on lstm_type.
Returns:
- outputs (:obj:Dict): The output of DRQN's forward, including logit (q_value) and next state.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output for each action dimension.
- next_state (:obj:list): Next RNN state tensor.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N) where B is batch size and N is obs_shape.
- logit (:obj:torch.Tensor): :math:(B, M) where B is batch size and M is action_shape.
Examples:
>>> # Initialize input keys
>>> prev_state = [[torch.randn(1, 1, 64) for __ in range(2)] for _ in range(4)] # B=4
>>> obs = torch.randn(4,64)
>>> model = DRQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> outputs = model({'obs': obs, 'prev_state': prev_state}, inference=True)
>>> # Validate output keys and shapes
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == (4, 64)
>>> assert len(outputs['next_state']) == 4
>>> assert all([len(t) == 2 for t in outputs['next_state']])
>>> assert all([t[0].shape == (1, 1, 64) for t in outputs['next_state']])
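The inference flag only changes how the recurrent state is advanced: with inference=True the model consumes one timestep and returns the state for the next call, while inference=False unrolls a whole training sequence internally. A toy standalone sketch with a scalar one-unit "RNN" (standing in for the real LSTM/GRU) shows the two paths compute the same thing:

```python
import math

def rnn_step(h, x, w=0.5, u=0.3):
    # A toy one-unit recurrent cell; hypothetical weights, not the real module.
    return math.tanh(w * h + u * x)

xs = [0.1, -0.4, 0.7]

# inference=False: the whole sequence is unrolled inside one forward call.
h = 0.0
for x in xs:
    h = rnn_step(h, x)
full_unroll = h

# inference=True: one timestep per call, carrying prev_state between calls.
state = 0.0
for x in xs:
    state = rnn_step(state, x)  # each iteration models one forward() call
step_by_step = state

assert full_unroll == step_by_step
```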
C51DQN
¶
Bases: Module
Overview
The neural network structure and computation graph of C51DQN, which combines distributional RL and DQN. You can refer to https://arxiv.org/pdf/1707.06887.pdf for more details. The C51DQN is composed of encoder and head. encoder is used to extract the feature of observation, and head is used to compute the distribution of Q-value.
Interfaces:
__init__, forward
.. note::
Current C51DQN supports two types of encoder: FCEncoder and ConvEncoder.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)
¶
Overview
Initialize the C51 model according to the corresponding input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network, defaults to None, then it will be set to the last element of encoder_hidden_size_list.
- head_layer_num (:obj:int): The number of layers used in the head network to compute Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks. If None, then it is set to nn.ReLU() by default.
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
- v_min (:obj:Optional[float]): The minimum value of the support of the distribution, which is related to the value (discounted sum of reward) scale of the specific environment. Defaults to -10.
- v_max (:obj:Optional[float]): The maximum value of the support of the distribution, which is related to the value (discounted sum of reward) scale of the specific environment. Defaults to 10.
- n_atom (:obj:Optional[int]): The number of atoms in the prediction distribution, 51 is the default value in the paper, you can also try other values such as 301.
forward(x)
¶
Overview
C51DQN forward computation graph, input observation tensor to predict q_value and its distribution.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of C51DQN's forward, including q_value and distribution.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
- distribution (:obj:torch.Tensor): Q-Value discretized distribution, i.e., probability of each uniformly spaced atom Q-value, such as dividing [-10, 10] into 51 uniform spaces.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is head_hidden_size.
- logit (:obj:torch.Tensor): :math:(B, M), where M is action_shape.
- distribution(:obj:torch.Tensor): :math:(B, M, P), where P is n_atom.
Examples:
>>> model = C51DQN(128, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 128)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> # default head_hidden_size: int = 64,
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
.. note::
For consistency and compatibility, we name all the outputs of the network which are related to action selections as logit.
.. note:: For convenience, we recommend using an odd number of atoms, so that the middle atom sits exactly at the midpoint of the support.
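The distribution key can be turned back into the scalar Q-values reported in logit by taking the expectation over the fixed support atoms z_i, uniformly spaced on [v_min, v_max]. A standalone sketch for a single action:

```python
def atom_support(v_min=-10.0, v_max=10.0, n_atom=51):
    # n_atom uniformly spaced atoms; with an odd n_atom the middle atom
    # sits exactly at the midpoint of the support.
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]

def expected_q(probs, support):
    # Q = sum_i p_i * z_i, the mean of the discretized value distribution.
    return sum(p * z for p, z in zip(probs, support))

support = atom_support()
assert abs(support[25]) < 1e-9  # middle atom of the default 51-atom support

# A distribution with all mass on the middle atom has a Q-value of (almost exactly) 0.
probs = [0.0] * 51
probs[25] = 1.0
assert abs(expected_q(probs, support)) < 1e-9
```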
BDQ
¶
Bases: Module
__init__(obs_shape, num_branches=0, action_bins_per_branch=2, layer_num=3, a_layer_num=None, v_layer_num=None, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, norm_type=None, activation=nn.ReLU())
¶
Overview
Initialize the BDQ (encoder + head) model according to input arguments. Reference paper: Action Branching Architectures for Deep Reinforcement Learning https://arxiv.org/pdf/1711.08946.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- num_branches (:obj:int): The number of branches, which is equivalent to the action dimension, such as 6 in mujoco's halfcheetah environment.
- action_bins_per_branch (:obj:int): The number of actions in each dimension.
- layer_num (:obj:int): The number of layers used in the network to compute Advantage and Value output.
- a_layer_num (:obj:int): The number of layers used in the network to compute Advantage output.
- v_layer_num (:obj:int): The number of layers used in the network to compute Value output.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network.
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks. If None, then it is set to nn.ReLU() by default.
forward(x)
¶
Overview
BDQ forward computation graph, input observation tensor to predict q_value.
Arguments:
- x (:obj:torch.Tensor): Observation inputs
Returns:
- outputs (:obj:Dict): BDQ forward outputs, such as q_value.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each action dimension.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- logit (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is num_branches * action_bins_per_branch.
Examples:
>>> model = BDQ(8, 5, 2) # arguments: 'obs_shape', 'num_branches' and 'action_bins_per_branch'.
>>> inputs = torch.randn(4, 8)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 5, 2])
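BDQ's branched head combines a shared state value with per-branch advantages, so every branch gets its own action_bins_per_branch-way Q-vector. A standalone sketch of the aggregation (the per-branch mean subtraction follows the dueling convention used in the paper; treat the exact reduction as an assumption):

```python
def branch_q_values(value, advantages):
    # value: scalar V(s); advantages: list of per-branch advantage lists.
    # Q_b(s, a) = V(s) + A_b(s, a) - mean_a' A_b(s, a') for each branch b.
    out = []
    for branch in advantages:
        mean_adv = sum(branch) / len(branch)
        out.append([value + a - mean_adv for a in branch])
    return out

value = 1.0
advantages = [[2.0, 0.0], [1.0, 1.0]]  # num_branches=2, action_bins_per_branch=2
print(branch_q_values(value, advantages))  # [[2.0, 0.0], [1.0, 1.0]]
```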
GTrXLDQN
¶
Bases: Module
Overview
The neural network structure and computation graph of the Gated Transformer-XL DQN algorithm, an enhanced version of DRQN that uses Transformer-XL to improve long-term sequential modelling ability. The GTrXL-DQN is composed of three parts: encoder, core and head. The encoder is used to extract features from various observations, the core is used to process the sequential observation and other data, and the head is used to compute the Q value of each action dimension.
Interfaces:
__init__, forward, reset_memory, get_memory .
__init__(obs_shape, action_shape, head_layer_num=1, att_head_dim=16, hidden_size=16, att_head_num=2, att_mlp_num=2, att_layer_num=3, memory_len=64, activation=nn.ReLU(), head_norm_type=None, dropout=0.0, gru_gating=True, gru_bias=2.0, dueling=True, encoder_hidden_size_list=[128, 128, 256], encoder_norm_type=None)
¶
Overview
Initialize the GTrXLDQN model according to the corresponding input arguments.
.. tip::
You can refer to GTrXl class in ding.torch_utils.network.gtrxl for more details about the input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Used by Transformer. Observation's space.
- action_shape (:obj:Union[int, SequenceType]): Used by Head. Action's space.
- head_layer_num (:obj:int): Used by Head. Number of layers.
- att_head_dim (:obj:int): Used by Transformer.
- hidden_size (:obj:int): Used by Transformer.
- att_head_num (:obj:int): Used by Transformer.
- att_mlp_num (:obj:int): Used by Transformer.
- att_layer_num (:obj:int): Used by Transformer.
- memory_len (:obj:int): Used by Transformer.
- activation (:obj:Optional[nn.Module]): Used by Transformer and Head. If None, then it is set to nn.ReLU() by default.
- head_norm_type (:obj:Optional[str]): Used by Head. The type of normalization to use, see ding.torch_utils.fc_block for more details.
- dropout (:obj:float): Dropout ratio of the attention.
- gru_gating (:obj:bool): If False, replace the GRU gates with residual connections.
- gru_bias (:obj:float): GRU gate bias.
- dueling (:obj:bool): If True, use DuelingHead, otherwise use DiscreteHead.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder.
- encoder_norm_type (:obj:Optional[str]): Used by Encoder. The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
¶
Overview
Let input tensor go through GTrXl and the Head sequentially.
Arguments:
- x (:obj:torch.Tensor): input tensor of shape (seq_len, bs, obs_shape).
Returns:
- out (:obj:Dict): run GTrXL with DiscreteHead setups and return the result prediction dictionary.
ReturnKeys:
- logit (:obj:torch.Tensor): discrete Q-value output of each action dimension, shape is (B, action_space).
- memory (:obj:torch.Tensor): memory tensor of size (bs, layer_num + 1, memory_len, embedding_dim).
- transformer_out (:obj:torch.Tensor): output tensor of transformer with same size as input x.
Examples:
>>> # Init input's Keys:
>>> obs_dim, seq_len, bs, action_dim = 128, 64, 32, 4
>>> obs = torch.rand(seq_len, bs, obs_dim)
>>> model = GTrXLDQN(obs_dim, action_dim)
>>> outputs = model(obs)
>>> assert isinstance(outputs, dict)
reset_memory(batch_size=None, state=None)
¶
Overview
Clear or reset the memory of GTrXL.
Arguments:
- batch_size (:obj:Optional[int]): The number of samples in a training batch.
- state (:obj:Optional[torch.Tensor]): The input memory data, whose shape is (layer_num, memory_len, bs, embedding_dim).
get_memory()
¶
Overview
Return the memory of GTrXL.
Returns:
- memory: (:obj:Optional[torch.Tensor]): output memory or None if memory has not been initialized, whose shape is (layer_num, memory_len, bs, embedding_dim).
DiscreteQAC
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to discrete-action Q-value Actor-Critic (QAC), such as DiscreteSAC. This model only supports discrete action spaces. The DiscreteQAC is composed of four parts: actor_encoder, critic_encoder, actor_head and critic_head. Encoders are used to extract features from various observations. Heads are used to predict the corresponding Q-value or action logit. In a high-dimensional observation space like a 2D image, we often use a shared encoder for both actor_encoder and critic_encoder. In a low-dimensional observation space like a 1D vector, we often use separate encoders.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(obs_shape, action_shape, twin_critic=False, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, encoder_hidden_size_list=None, share_encoder=False)
¶
Overview
Initialize the DiscreteQAC model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's shape, such as 128, (156, ).
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action's shape, such as 4, (3, ).
- twin_critic (:obj:bool): Whether to use twin critic.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor head.
- actor_head_layer_num (:obj:int): The num of layers used in the actor network to compute action.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic head.
- critic_head_layer_num (:obj:int): The num of layers used in the critic network to compute Q-value.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each FC layer; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use after network layers (FC, Conv), see ding.torch_utils.network for more details.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size, this argument is only used in image observation.
- share_encoder (:obj:Optional[bool]): Whether to share encoder between actor and critic.
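To make the encoder/head split concrete, here is a minimal torch-only sketch of the DiscreteQAC structure. This is not ding's implementation; the class name `TinyDiscreteQAC` and all layer sizes are illustrative, and the real model adds options such as twin critics and configurable heads.

```python
import torch
import torch.nn as nn

class TinyDiscreteQAC(nn.Module):
    """Illustrative sketch: one shared encoder feeding an actor head and a critic head."""

    def __init__(self, obs_shape: int, action_shape: int, hidden: int = 64):
        super().__init__()
        # A single shared encoder, as with share_encoder=True.
        self.encoder = nn.Sequential(nn.Linear(obs_shape, hidden), nn.ReLU())
        self.actor_head = nn.Linear(hidden, action_shape)   # action logit
        self.critic_head = nn.Linear(hidden, action_shape)  # Q-value per discrete action

    def compute_actor(self, obs: torch.Tensor) -> dict:
        return {'logit': self.actor_head(self.encoder(obs))}

    def compute_critic(self, obs: torch.Tensor) -> dict:
        return {'q_value': self.critic_head(self.encoder(obs))}

model = TinyDiscreteQAC(64, 6)
obs = torch.randn(4, 64)
logit = model.compute_actor(obs)['logit']      # shape (4, 6)
q_value = model.compute_critic(obs)['q_value'] # shape (4, 6)
```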
forward(inputs, mode)
¶
Overview
QAC forward computation graph, input observation tensor to predict Q-value or action logit. Different mode will forward with different network modules to get different outputs and save computation.
Arguments:
- inputs (:obj:torch.Tensor): The input observation tensor data.
- mode (:obj:str): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of QAC forward computation graph, whose key-values vary in different forward modes.
Examples (Actor):
>>> model = DiscreteQAC(64, 6)
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 6])
Examples (Critic):
>>> model = DiscreteQAC(64, 6, twin_critic=False)
>>> obs = torch.randn(4, 64)
>>> critic_outputs = model(obs, 'compute_critic')
>>> assert critic_outputs['q_value'].shape == torch.Size([4, 6])
compute_actor(inputs)
¶
Overview
QAC forward computation graph for actor part, input observation tensor to predict action or action logit.
Arguments:
- inputs (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict[str, torch.Tensor]): The output dict of QAC forward computation graph for actor, including discrete action logit.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted discrete action type logit, it will be the same dimension as action_shape, i.e., all the possible discrete action choices.
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to obs_shape.
- logit (:obj:torch.Tensor): :math:(B, N2), B is batch size and N2 corresponds to action_shape.
Examples:
>>> model = DiscreteQAC(64, 6)
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 6])
compute_critic(inputs)
¶
Overview
QAC forward computation graph for critic part, input observation to predict Q-value for each possible discrete action choices.
Arguments:
- inputs (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict[str, torch.Tensor]): The output dict of QAC forward computation graph for critic, including q_value for each possible discrete action choices.
ReturnKeys:
- q_value (:obj:torch.Tensor): The predicted Q-value for each possible discrete action choices, it will be the same dimension as action_shape and used to calculate the loss.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape.
- q_value (:obj:torch.Tensor): :math:(B, N2), where B is batch size and N2 is action_shape.
Examples:
>>> model = DiscreteQAC(64, 6, twin_critic=False)
>>> obs = torch.randn(4, 64)
>>> critic_outputs = model(obs, 'compute_critic')
>>> assert critic_outputs['q_value'].shape == torch.Size([4, 6])
ContinuousQAC
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to Q-value Actor-Critic (QAC), such as DDPG/TD3/SAC. This model now supports continuous and hybrid action spaces. The ContinuousQAC is composed of four parts: actor_encoder, critic_encoder, actor_head and critic_head. Encoders are used to extract features from various observations. Heads are used to predict the corresponding Q-value or action logit. In high-dimensional observation spaces like 2D images, we often use a shared encoder for both actor_encoder and critic_encoder. In low-dimensional observation spaces like 1D vectors, we often use different encoders.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(obs_shape, action_shape, action_space, twin_critic=False, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, encoder_hidden_size_list=None, share_encoder=False)
¶
Overview
Initialize the ContinuousQAC model according to the input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's shape, such as 128, (156, ).
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action's shape, such as 4, (3, ), EasyDict({'action_type_shape': 3, 'action_args_shape': 4}).
- action_space (:obj:str): The type of action space, including [regression, reparameterization, hybrid], regression is used for DDPG/TD3, reparameterization is used for SAC and hybrid for PADDPG.
- twin_critic (:obj:bool): Whether to use twin critic, one of tricks in TD3.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor head.
- actor_head_layer_num (:obj:int): The num of layers used in the actor network to compute action.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic head.
- critic_head_layer_num (:obj:int): The num of layers used in the critic network to compute Q-value.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each FC layer; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use after network layers (FC, Conv), see ding.torch_utils.network for more details.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size, this argument is only used in image observation.
- share_encoder (:obj:Optional[bool]): Whether to share encoder between actor and critic.
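The three action_space options shape the actor output differently. The sketch below illustrates the difference with plain tensors; it is not ding's head code, and the tanh bound and log-sigma parameterization are common conventions rather than the exact implementation.

```python
import torch

mu = torch.zeros(4, 6)         # stands in for the head's mean output
log_sigma = torch.zeros(4, 6)  # stands in for the head's log-std output

# 'regression' (DDPG/TD3): the head regresses the action directly,
# often bounded into [-1, 1] with tanh.
action = torch.tanh(mu)

# 'reparameterization' (SAC): the head outputs (mu, sigma) and the action
# is sampled with the reparameterization trick, so gradients flow
# through mu and sigma.
sigma = log_sigma.exp()
sampled = mu + sigma * torch.randn_like(mu)  # differentiable w.r.t. mu, sigma
```

In the 'hybrid' case (PADDPG), the actor instead produces a discrete action-type logit plus continuous action_args, matching the EasyDict action_shape above.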
forward(inputs, mode)
¶
Overview
QAC forward computation graph, input observation tensor to predict Q-value or action logit. Different mode will forward with different network modules to get different outputs and save computation.
Arguments:
- inputs (:obj:Union[torch.Tensor, Dict[str, torch.Tensor]]): The input data for forward computation graph, for compute_actor, it is the observation tensor, for compute_critic, it is the dict data including obs and action tensor.
- mode (:obj:str): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of QAC forward computation graph, whose key-values vary in different forward modes.
Examples (Actor):
>>> # Regression mode
>>> model = ContinuousQAC(64, 6, 'regression')
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 6])
>>> # Reparameterization Mode
>>> model = ContinuousQAC(64, 6, 'reparameterization')
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['logit'][0].shape == torch.Size([4, 6]) # mu
>>> assert actor_outputs['logit'][1].shape == torch.Size([4, 6]) # sigma
Examples (Critic):
>>> inputs = {'obs': torch.randn(4, 8), 'action': torch.randn(4, 1)}
>>> model = ContinuousQAC(obs_shape=(8, ), action_shape=1, action_space='regression')
>>> assert model(inputs, mode='compute_critic')['q_value'].shape == (4, )  # q value
compute_actor(obs)
¶
Overview
QAC forward computation graph for actor part, input observation tensor to predict action or action logit.
Arguments:
- obs (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]): Actor output dict varying from action_space: regression, reparameterization, hybrid.
ReturnsKeys (regression):
- action (:obj:torch.Tensor): Continuous action with same size as action_shape, usually in DDPG/TD3.
ReturnsKeys (reparameterization):
- logit (:obj:List[torch.Tensor]): The predicted reparameterization action logit, usually in SAC. It is a list containing two tensors: mu and sigma. The former is the mean of the Gaussian distribution, the latter is the standard deviation of the Gaussian distribution.
ReturnsKeys (hybrid):
- logit (:obj:torch.Tensor): The predicted discrete action type logit, it will be the same dimension as action_type_shape, i.e., all the possible discrete action types.
- action_args (:obj:torch.Tensor): Continuous action arguments with same size as action_args_shape.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to obs_shape.
- action (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to action_shape.
- logit.mu (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to action_shape.
- logit.sigma (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to action_shape.
- logit (:obj:torch.Tensor): :math:(B, N2), B is batch size and N2 corresponds to action_shape.action_type_shape.
- action_args (:obj:torch.Tensor): :math:(B, N3), B is batch size and N3 corresponds to action_shape.action_args_shape.
Examples:
>>> # Regression mode
>>> model = ContinuousQAC(64, 6, 'regression')
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 6])
>>> # Reparameterization Mode
>>> model = ContinuousQAC(64, 6, 'reparameterization')
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['logit'][0].shape == torch.Size([4, 6]) # mu
>>> assert actor_outputs['logit'][1].shape == torch.Size([4, 6]) # sigma
compute_critic(inputs)
¶
Overview
QAC forward computation graph for critic part, input observation and action tensor to predict Q-value.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The dict of input data, including obs and action tensor, also contains logit and action_args tensor in hybrid action_space.
ArgumentsKeys:
- obs (:obj:torch.Tensor): Observation tensor data, now supports a batch of 1-dim vector data.
- action (:obj:Union[torch.Tensor, Dict]): Continuous action with same size as action_shape.
- logit (:obj:torch.Tensor): Discrete action logit, only in hybrid action_space.
- action_args (:obj:torch.Tensor): Continuous action arguments, only in hybrid action_space.
Returns:
- outputs (:obj:Dict[str, torch.Tensor]): The output dict of QAC's forward computation graph for critic, including q_value.
ReturnKeys:
- q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape.
- logit (:obj:torch.Tensor): :math:(B, N2), B is batch size and N2 corresponds to action_shape.action_type_shape.
- action_args (:obj:torch.Tensor): :math:(B, N3), B is batch size and N3 corresponds to action_shape.action_args_shape.
- action (:obj:torch.Tensor): :math:(B, N4), where B is batch size and N4 is action_shape.
- q_value (:obj:torch.Tensor): :math:(B, ), where B is batch size.
Examples:
>>> inputs = {'obs': torch.randn(4, 8), 'action': torch.randn(4, 1)}
>>> model = ContinuousQAC(obs_shape=(8, ),action_shape=1, action_space='regression')
>>> assert model(inputs, mode='compute_critic')['q_value'].shape == (4, ) # q value
PDQN
¶
Bases: Module
Overview
The neural network and computation graph of PDQN(https://arxiv.org/abs/1810.06394v1) and MPDQN(https://arxiv.org/abs/1905.04388) algorithms for parameterized action space. This model supports parameterized action space with discrete action_type and continuous action_arg. In principle, PDQN consists of x network (continuous action parameter network) and Q network (discrete action type network). But for simplicity, the code is split into encoder and actor_head, which contain the encoder and head of the above two networks respectively.
Interface:
__init__, forward, compute_discrete, compute_continuous.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, multi_pass=False, action_mask=None)
¶
Overview
Init the PDQN (encoder + head) Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:EasyDict): Action space shape in dict type, such as EasyDict({'action_type_shape': 3, 'action_args_shape': 5}).
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- dueling (:obj:bool): Whether to use DuelingHead (True) or DiscreteHead (False).
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network.
- head_layer_num (:obj:int): The number of layers used in the head network to compute Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.
- multi_pass (:obj:Optional[bool]): Whether to use multi pass version.
- action_mask (:obj:Optional[list]): An action mask indicating how action args are associated with each discrete action. For example, if there are 3 discrete actions and 4 continuous action args, where the first discrete action uses the first arg, the second uses the second arg, and the third uses the remaining two args, the action mask will be [[1,0,0,0],[0,1,0,0],[0,0,1,1]] with shape 3*4.
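The masking described above can be sketched with plain tensors: broadcasting the mask against the predicted args keeps, for each discrete action, only the args associated with it. This is an illustrative snippet, not ding's internal masking code.

```python
import torch

# 3 discrete actions, 4 continuous action args; rows follow the mask
# layout described in the argument documentation above.
action_mask = torch.tensor([[1, 0, 0, 0],
                            [0, 1, 0, 0],
                            [0, 0, 1, 1]], dtype=torch.float32)

action_args = torch.randn(4)       # all predicted continuous args
masked = action_mask * action_args # row i keeps only action i's args (shape (3, 4))
```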
forward(inputs, mode)
¶
Overview
PDQN forward computation graph, input observation tensor to predict q_value for discrete actions and values for continuous action_args.
Arguments:
- inputs (:obj:Union[torch.Tensor, Dict, EasyDict]): Inputs including observation and other info according to mode.
- mode (:obj:str): Name of the forward mode.
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape.
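At inference time PDQN chains its two modes: first predict the continuous action args from the observation, then concatenate [obs, action_args] as input to the discrete Q network. The sketch below uses bare linear layers as stand-ins for the two networks; it is illustrative, not ding's model.

```python
import torch
import torch.nn as nn

obs = torch.randn(64, 4)       # batch of observations, obs_shape=4
x_net = nn.Linear(4, 5)        # stand-in for the continuous action-args network
q_net = nn.Linear(4 + 5, 3)    # stand-in for the discrete Q network over [obs, args]

# mode='compute_continuous': obs -> action_args
action_args = x_net(obs)                                # shape (64, 5)
# mode='compute_discrete': [obs, action_args] -> q_logit
q_logit = q_net(torch.cat([obs, action_args], dim=1))   # shape (64, 3)
```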
compute_continuous(inputs)
¶
Overview
Use observation tensor to predict continuous action args.
Arguments:
- inputs (:obj:torch.Tensor): Observation inputs.
Returns:
- outputs (:obj:Dict): A dict with key 'action_args'.
- 'action_args' (:obj:torch.Tensor): The continuous action args.
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape.
- action_args (:obj:torch.Tensor): :math:(B, M), where M is action_args_shape.
Examples:
>>> act_shape = EasyDict({'action_type_shape': (3, ), 'action_args_shape': (5, )})
>>> model = PDQN(4, act_shape)
>>> inputs = torch.randn(64, 4)
>>> outputs = model.forward(inputs, mode='compute_continuous')
>>> assert outputs['action_args'].shape == torch.Size([64, 5])
compute_discrete(inputs)
¶
Overview
Use observation tensor and continuous action args to predict discrete action types.
Arguments:
- inputs (:obj:Union[Dict, EasyDict]): A dict with keys 'state', 'action_args'.
- state (:obj:torch.Tensor): Observation inputs.
- action_args (:obj:torch.Tensor): Action parameters are used to concatenate with the observation and serve as input to the discrete action type network.
Returns:
- outputs (:obj:Dict): A dict with keys 'logit', 'action_args'.
- 'logit': The logit value for each discrete action.
- 'action_args': The continuous action args (same as inputs['action_args']) for later usage.
Examples:
>>> act_shape = EasyDict({'action_type_shape': (3, ), 'action_args_shape': (5, )})
>>> model = PDQN(4, act_shape)
>>> inputs = {'state': torch.randn(64, 4), 'action_args': torch.randn(64, 5)}
>>> outputs = model.forward(inputs, mode='compute_discrete')
>>> assert outputs['logit'].shape == torch.Size([64, 3])
>>> assert outputs['action_args'].shape == torch.Size([64, 5])
VAC
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to (state) Value Actor-Critic (VAC), such as A2C/PPO/IMPALA. This model now supports discrete, continuous and hybrid action spaces. The VAC is composed of four parts: actor_encoder, critic_encoder, actor_head and critic_head. Encoders are used to extract features from various observations. Heads are used to predict the corresponding value or action logit. In high-dimensional observation spaces like 2D images, we often use a shared encoder for both actor_encoder and critic_encoder. In low-dimensional observation spaces like 1D vectors, we often use different encoders.
Interfaces:
__init__, forward, compute_actor, compute_critic, compute_actor_critic.
__init__(obs_shape, action_shape, action_space='discrete', share_encoder=True, encoder_hidden_size_list=[128, 128, 64], actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, sigma_type='independent', fixed_sigma_value=0.3, bound_type=None, encoder=None, impala_cnn_encoder=False)
¶
Overview
Initialize the VAC model according to corresponding input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- action_space (:obj:str): The type of different action spaces, including ['discrete', 'continuous', 'hybrid'], then will instantiate corresponding head, including DiscreteHead, ReparameterizationHead, and hybrid heads.
- share_encoder (:obj:bool): Whether to share observation encoders between actor and critic.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element is used as the input size of actor_head and critic_head.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size of actor_head network, defaults to 64, it is the hidden size of the last layer of the actor_head network.
- actor_head_layer_num (:obj:int): The num of layers used in the actor_head network to compute action.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size of critic_head network, defaults to 64, it is the hidden size of the last layer of the critic_head network.
- critic_head_layer_num (:obj:int): The num of layers used in the critic_head network.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
- sigma_type (:obj:Optional[str]): The type of sigma in continuous action space, see ding.torch_utils.network.dreamer.ReparameterizationHead for more details, in A2C/PPO, it defaults to independent, which means state-independent sigma parameters.
- fixed_sigma_value (:obj:Optional[int]): If sigma_type is fixed, then use this value as sigma.
- bound_type (:obj:Optional[str]): The type of action bound methods in continuous action space, defaults to None, which means no bound.
- encoder (:obj:Optional[torch.nn.Module]): The encoder module, defaults to None, you can define your own encoder module and pass it into VAC to deal with different observation space.
- impala_cnn_encoder (:obj:bool): Whether to use IMPALA CNN encoder, defaults to False.
forward(x, mode)
¶
Overview
VAC forward computation graph, input observation tensor to predict state value or action logit. Different mode will forward with different network modules to get different outputs and save computation.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
- mode (:obj:str): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- outputs (:obj:Dict): The output dict of VAC's forward computation graph, whose key-values vary from different mode.
Examples (Actor):
>>> model = VAC(64, 128)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 128])
Examples (Critic):
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> critic_outputs = model(inputs, 'compute_critic')
>>> assert critic_outputs['value'].shape == torch.Size([4])
Examples (Actor-Critic):
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs, 'compute_actor_critic')
>>> assert outputs['value'].shape == torch.Size([4])
>>> assert outputs['logit'].shape == torch.Size([4, 64])
compute_actor(x)
¶
Overview
VAC forward computation graph for actor part, input observation tensor to predict action logit.
Arguments:
- x (:obj:Union[torch.Tensor, Dict]): The input observation tensor data. If a dictionary is provided, it should contain keys 'observation' and optionally 'action_mask'.
Returns:
- outputs (:obj:Dict): The output dict of VAC's forward computation graph for actor, including logit and optionally action_mask if the input is a dictionary.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted action logit tensor, for discrete action space, it will be the same dimension real-value ranged tensor of possible action choices, and for continuous action space, it will be the mu and sigma of the Gaussian distribution, and the number of mu and sigma is the same as the number of continuous actions. Hybrid action space is a kind of combination of discrete and continuous action space, so the logit will be a dict with action_type and action_args.
- action_mask (:obj:Optional[torch.Tensor]): The action mask tensor, included if the input is a dictionary containing 'action_mask'.
Shapes:
- logit (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is action_shape
Examples:
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 64])
compute_critic(x)
¶
Overview
VAC forward computation graph for critic part, input observation tensor to predict state value.
Arguments:
- x (:obj:Union[torch.Tensor, Dict]): The input observation tensor data. If a dictionary is provided, it should contain the key 'observation'.
Returns:
- outputs (:obj:Dict): The output dict of VAC's forward computation graph for critic, including value.
ReturnsKeys:
- value (:obj:torch.Tensor): The predicted state value tensor.
Shapes:
- value (:obj:torch.Tensor): :math:(B, ), where B is batch size, (B, 1) is squeezed to (B, ).
Examples:
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> critic_outputs = model(inputs,'compute_critic')
>>> assert critic_outputs['value'].shape == torch.Size([4])
compute_actor_critic(x)
¶
Overview
VAC forward computation graph for both actor and critic part, input observation tensor to predict action logit and state value.
Arguments:
- x (:obj:Union[torch.Tensor, Dict]): The input observation tensor data. If a dictionary is provided, it should contain keys 'observation' and optionally 'action_mask'.
Returns:
- outputs (:obj:Dict): The output dict of VAC's forward computation graph for both actor and critic, including logit, value, and optionally action_mask if the input is a dictionary.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted action logit tensor, for discrete action space, it will be the same dimension real-value ranged tensor of possible action choices, and for continuous action space, it will be the mu and sigma of the Gaussian distribution, and the number of mu and sigma is the same as the number of continuous actions. Hybrid action space is a kind of combination of discrete and continuous action space, so the logit will be a dict with action_type and action_args.
- value (:obj:torch.Tensor): The predicted state value tensor.
- action_mask (:obj:torch.Tensor, optional): The action mask tensor, included if the input is a dictionary containing 'action_mask'.
Shapes:
- logit (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is action_shape
- value (:obj:torch.Tensor): :math:(B, ), where B is batch size, (B, 1) is squeezed to (B, ).
Examples:
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs,'compute_actor_critic')
>>> assert outputs['value'].shape == torch.Size([4])
>>> assert outputs['logit'].shape == torch.Size([4, 64])
.. note::
The compute_actor_critic interface aims to save computation when the encoder is shared, returning the combined dict output.
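The saving mentioned in the note can be sketched directly: with a shared encoder, the combined mode runs the encoder once and feeds both heads, instead of encoding the observation twice. The layers below are illustrative stand-ins, not ding's modules.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(64, 32)      # shared observation encoder
actor_head = nn.Linear(32, 6)    # predicts action logit
critic_head = nn.Linear(32, 1)   # predicts state value

x = torch.randn(4, 64)
feat = encoder(x)                     # single encoder pass, reused below
logit = actor_head(feat)              # shape (4, 6)
value = critic_head(feat).squeeze(1)  # shape (4, ), (B, 1) squeezed to (B, )
```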
DREAMERVAC
¶
Bases: Module
Overview
The neural network and computation graph of DreamerV3 (state) Value Actor-Critic (VAC). This model now supports discrete and continuous action spaces.
Interfaces:
__init__, forward.
__init__(action_shape, dyn_stoch=32, dyn_deter=512, dyn_discrete=32, actor_layers=2, value_layers=2, units=512, act='SiLU', norm='LayerNorm', actor_dist='normal', actor_init_std=1.0, actor_min_std=0.1, actor_max_std=1.0, actor_temp=0.1, action_unimix_ratio=0.01)
¶
Overview
Initialize the DREAMERVAC model according to arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
DiscreteBC
¶
Bases: Module
Overview
The DiscreteBC network.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, strides=None)
¶
Overview
Init the DiscreteBC (encoder + head) Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- dueling (:obj:bool): Whether to use DuelingHead (True) or DiscreteHead (False).
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network.
- head_layer_num (:obj:int): The number of layers used in the head network to compute Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.
- strides (:obj:Optional[list]): The strides for each convolution layers, such as [2, 2, 2]. The length of this argument should be the same as encoder_hidden_size_list.
forward(x)
¶
Overview
DiscreteBC forward computation graph, input observation tensor to predict q_value.
Arguments:
- x (:obj:torch.Tensor): Observation inputs
Returns:
- outputs (:obj:Dict): DiscreteBC forward outputs, such as q_value.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each action dimension.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- logit (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is action_shape
Examples:
>>> model = DiscreteBC(32, 6) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
ContinuousBC
¶
Bases: Module
Overview
The ContinuousBC network.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, action_space, actor_head_hidden_size=64, actor_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the ContinuousBC Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's shape, such as 128, (156, ).
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action's shape, such as 4, (3, ), EasyDict({'action_type_shape': 3, 'action_args_shape': 4}).
- action_space (:obj:str): The type of action space, including [regression, reparameterization].
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor head.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each FC layer; if None, it defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use after network layers (FC, Conv), see ding.torch_utils.network for more details.
forward(inputs)
¶
Overview
The unique execution (forward) method of ContinuousBC.
Arguments:
- inputs (:obj:torch.Tensor): Observation tensor data.
Returns:
- output (:obj:Dict): Output dict data, including different key-values among distinct action_space.
ReturnsKeys:
- action (:obj:torch.Tensor): action output of actor network, with shape :math:(B, action_shape).
- logit (:obj:List[torch.Tensor]): reparameterized action output of actor network, with shape :math:(B, action_shape).
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- action (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is action_shape
- logit (:obj:List[torch.FloatTensor]): :math:(B, M), where B is batch size and M is action_shape
Examples (Regression):
>>> model = ContinuousBC(32, 6, action_space='regression')
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['action'].shape == torch.Size([4, 6])
Examples (Reparameterization):
>>> model = ContinuousBC(32, 6, action_space='reparameterization')
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'][0].shape == torch.Size([4, 6])
>>> assert outputs['logit'][1].shape == torch.Size([4, 6])
LanguageTransformer
¶
Bases: Module
Overview
The LanguageTransformer network. Download a pre-trained language model and add a head on top of it. In the default case, we use a BERT model as the text encoder, whose bidirectional nature is well-suited for obtaining the embedding of a whole sentence.
Interfaces:
__init__, forward
__init__(model_name='bert-base-uncased', add_linear=False, embedding_size=128, freeze_encoder=True, hidden_dim=768, norm_embedding=False)
¶
Overview
Init the LanguageTransformer Model according to input arguments.
Arguments:
- model_name (:obj:str): The base language model name in huggingface, such as "bert-base-uncased".
- add_linear (:obj:bool): Whether to add a linear layer on top of the language model, defaults to False.
- embedding_size (:obj:int): The embedding size of the added linear layer, such as 128.
- freeze_encoder (:obj:bool): Whether to freeze the encoder language model while training, defaults to True.
- hidden_dim (:obj:int): The embedding dimension of the encoding model (e.g. BERT). This value should correspond to the model you use. For bert-base-uncased, this value is 768.
- norm_embedding (:obj:bool): Whether to normalize the embedding vectors, defaults to False.
forward(train_samples, candidate_samples=None, mode='compute_actor')
¶
Overview
LanguageTransformer forward computation graph, input two lists of strings and predict their matching scores.
Different mode will forward with different network modules to get different outputs.
Arguments:
- train_samples (:obj:List[str]): One list of strings.
- candidate_samples (:obj:Optional[List[str]]): The other list of strings to calculate matching scores.
- mode (:obj:str): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- output (:obj:Dict): Output dict data, including the logit of matching scores and the corresponding torch.distributions.Categorical object.
Examples:
>>> test_pids = [1]
>>> cand_pids = [0, 2, 4]
>>> problems = [ "This is problem 0", "This is the first question", "Second problem is here", "Another problem", "This is the last problem" ]
>>> ctxt_list = [problems[pid] for pid in test_pids]
>>> cands_list = [problems[pid] for pid in cand_pids]
>>> model = LanguageTransformer(model_name="bert-base-uncased", add_linear=True, embedding_size=256)
>>> scores = model(ctxt_list, cands_list)
>>> assert scores.shape == (1, 3)
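The (1, 3) score shape in the example can be understood as a similarity matrix between one query embedding and three candidate embeddings. The sketch below uses random tensors as stand-ins for the BERT sentence embeddings that LanguageTransformer actually produces; the dot-product scoring here is illustrative, not necessarily the exact scoring ding uses.

```python
import torch

ctxt_emb = torch.randn(1, 256)    # 1 query sentence embedding (embedding_size=256)
cand_emb = torch.randn(3, 256)    # 3 candidate sentence embeddings

# One score per (query, candidate) pair via dot products.
scores = ctxt_emb @ cand_emb.t()  # shape (1, 3)
```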
PG
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to Policy Gradient (PG) (https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf). The PG model is composed of two parts: encoder and head. Encoders are used to extract features from various observations. Heads are used to predict the corresponding action logit.
Interface:
__init__, forward.
__init__(obs_shape, action_shape, action_space='discrete', encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the PG model according to corresponding input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- action_space (:obj:str): The type of different action spaces, including ['discrete', 'continuous'], then will instantiate corresponding head, including DiscreteHead and ReparameterizationHead.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network, defaults to None, it must match the last element of encoder_hidden_size_list.
- head_layer_num (:obj:int): The num of layers used in the head network to compute action.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks, if None then default set to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
Examples:
>>> model = PG((4, 84, 84), 5)
>>> inputs = torch.randn(8, 4, 84, 84)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == (8, 5)
>>> assert outputs['dist'].sample().shape == (8, )
forward(x)
¶
Overview
PG forward computation graph, input observation tensor to predict policy distribution.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:torch.distributions): The output policy distribution. If action space is discrete, the output is Categorical distribution; if action space is continuous, the output is Normal distribution.
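As a concrete illustration of the discrete branch, the returned Categorical distribution is built from the head's logit via a softmax. Below is a minimal pure-Python sketch of that final step (stable softmax, then categorical sampling) with made-up logit values; it does not depend on torch or ding:

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax: convert raw logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_categorical(probs, rng):
    """Draw one action index according to the categorical probabilities."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

logits = [0.1, 2.0, -1.0, 0.5]      # hypothetical head output for one sample
probs = softmax(logits)
action = sample_categorical(probs, random.Random(0))
```

In the real model, torch.distributions.Categorical(logits=logit) performs the same normalization and sampling in a batched, differentiable way.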
PPG
¶
Bases: Module
Overview
Phasic Policy Gradient (PPG) model from the paper Phasic Policy Gradient (https://arxiv.org/abs/2009.04416). This module contains a VAC module and an auxiliary critic module.
Interfaces:
forward, compute_actor, compute_critic, compute_actor_critic
__init__(obs_shape, action_shape, action_space='discrete', share_encoder=True, encoder_hidden_size_list=[128, 128, 64], actor_head_hidden_size=64, actor_head_layer_num=2, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, impala_cnn_encoder=False)
¶
Overview
Initialize the PPG model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's shape, such as 128, (156, ).
- action_shape (:obj:Union[int, SequenceType]): Action's shape, such as 4, (3, ).
- action_space (:obj:str): The action space type, such as 'discrete', 'continuous'.
- share_encoder (:obj:bool): Whether to share encoder.
- encoder_hidden_size_list (:obj:SequenceType): The hidden size list of encoder.
- actor_head_hidden_size (:obj:int): The hidden_size to pass to actor head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor head.
- critic_head_hidden_size (:obj:int): The hidden_size to pass to critic head.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic head.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each FC layer, if None then default set to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use after each network layer (FC, Conv), see ding.torch_utils.network for more details.
- impala_cnn_encoder (:obj:bool): Whether to use impala cnn encoder.
forward(inputs, mode)
¶
Overview
Compute action logits or value according to mode being compute_actor, compute_critic or compute_actor_critic.
Arguments:
- inputs (:obj:torch.Tensor): The input observation tensor data.
- mode (:obj:str): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- outputs (:obj:Dict): The output dict of PPG's forward computation graph, whose key-values vary from different mode.
compute_actor(x)
¶
Overview
Use actor to compute action logits.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- output (:obj:Dict): The output data containing action logits.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted action logit tensor. For discrete action space, it is a real-valued tensor with one dimension per possible action; for continuous action space, it is the mu and sigma of a Gaussian distribution, with one mu and sigma per continuous action dimension. A hybrid action space is a combination of discrete and continuous action spaces, so the logit is a dict with action_type and action_args.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is the input feature size.
- output (:obj:Dict): logit: :math:(B, A), where B is batch size and A is the action space size.
compute_critic(x)
¶
Overview
Use critic to compute value.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- output (:obj:Dict): The output dict of VAC's forward computation graph for critic, including value.
ReturnsKeys:
- necessary: value
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is the input feature size.
- output (:obj:Dict): value: :math:(B, 1), where B is batch size.
compute_actor_critic(x)
¶
Overview
Use actor and critic to compute action logits and value.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output dict of PPG's forward computation graph for both actor and critic, including logit and value.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted action logit tensor. For discrete action space, it is a real-valued tensor with one dimension per possible action; for continuous action space, it is the mu and sigma of a Gaussian distribution, with one mu and sigma per continuous action dimension. A hybrid action space is a combination of discrete and continuous action spaces, so the logit is a dict with action_type and action_args.
- value (:obj:torch.Tensor): The predicted state value tensor.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is the input feature size.
- output (:obj:Dict): value: :math:(B, 1), where B is batch size.
- output (:obj:Dict): logit: :math:(B, A), where B is batch size and A is the action space size.
.. note::
The compute_actor_critic interface saves computation when the encoder is shared.
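To make the note concrete, here is a toy stand-in (hypothetical encoder and head functions, not the real PPG modules) that counts encoder forward passes: the joint interface runs the shared encoder once, while separate actor and critic calls would run it twice:

```python
calls = {"encoder": 0}

def encoder(obs):                      # stand-in shared feature extractor
    calls["encoder"] += 1
    return [o * 0.5 for o in obs]

def actor_head(feat):                  # stand-in policy head -> logits
    return [f + 1.0 for f in feat]

def critic_head(feat):                 # stand-in value head -> scalar value
    return sum(feat)

def compute_actor(obs):
    return {"logit": actor_head(encoder(obs))}

def compute_critic(obs):
    return {"value": critic_head(encoder(obs))}

def compute_actor_critic(obs):
    feat = encoder(obs)                # the encoder forward runs only once
    return {"logit": actor_head(feat), "value": critic_head(feat)}

obs = [1.0, 2.0, 3.0]
out = compute_actor_critic(obs)        # 1 encoder call
separate = (compute_actor(obs), compute_critic(obs))  # 2 more encoder calls
```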
Mixer
¶
Bases: Module
Overview
Mixer network in QMIX, which mixes up the independent q_value of each agent into a total q_value. The weights (but not the biases) of the Mixer network are restricted to be non-negative and produced by separate hypernetworks. Each hypernetwork takes the global state s as input and generates the weights of one layer of the Mixer network.
Interface:
__init__, forward.
__init__(agent_num, state_dim, mixing_embed_dim, hypernet_embed=64, activation=nn.ReLU())
¶
Overview
Initialize mixer network proposed in QMIX according to arguments. Each hypernetwork consists of linear layers, followed by an absolute activation function, to ensure that the Mixer network weights are non-negative.
Arguments:
- agent_num (:obj:int): The number of agent, such as 8.
- state_dim (:obj:int): The dimension of global observation state, such as 16.
- mixing_embed_dim (:obj:int): The dimension of mixing state embedding, such as 128.
- hypernet_embed (:obj:int): The dimension of hypernet embedding, default to 64.
- activation (:obj:nn.Module): Activation function in network, defaults to nn.ReLU().
forward(agent_qs, states)
¶
Overview
Forward computation graph of pymarl mixer network. Mix up the input independent q_value of each agent to a total q_value with weights generated by hypernetwork according to global states.
Arguments:
- agent_qs (:obj:torch.FloatTensor): The independent q_value of each agent.
- states (:obj:torch.FloatTensor): The embedding vector of global state.
Returns:
- q_tot (:obj:torch.FloatTensor): The total mixed q_value.
Shapes:
- agent_qs (:obj:torch.FloatTensor): :math:(B, N), where B is batch size and N is agent_num.
- states (:obj:torch.FloatTensor): :math:(B, M), where M is embedding_size.
- q_tot (:obj:torch.FloatTensor): :math:(B, ).
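The monotonicity mechanism can be sketched in a few lines of pure Python: a toy linear hypernetwork (made-up coefficients, not the real ding modules) produces weights from the state, an absolute value keeps them non-negative, and q_tot is therefore non-decreasing in every agent's q_value:

```python
def hyper_weights(state, agent_num):
    """Toy hypernetwork: a linear map of the global state per agent,
    passed through abs() so every mixing weight is non-negative."""
    raw = [sum(state) * (i + 1) * 0.1 - 0.5 for i in range(agent_num)]
    return [abs(w) for w in raw]

def mix(agent_qs, state):
    """One-layer mixer: q_tot = sum_i |w_i(s)| * q_i + b(s)."""
    weights = hyper_weights(state, len(agent_qs))
    bias = sum(state) * 0.01            # the bias is not constrained
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias

state = [0.2, -0.1, 0.4]
q_tot_low = mix([1.0, 1.0, 1.0], state)
q_tot_high = mix([2.0, 1.0, 1.0], state)  # raising one agent's Q cannot lower q_tot
```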
QMix
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to QMIX (https://arxiv.org/abs/1803.11485). The QMIX model is composed of two parts: agent Q network and mixer (optional). The QMIX paper mentions that all agents share local Q network parameters, so only one Q network is initialized here. Summation or the Mixer network is then used to process the local Q values, according to the mixer setting, to obtain the global Q value.
Interface:
__init__, forward.
__init__(agent_num, obs_shape, global_obs_shape, action_shape, hidden_size_list, mixer=True, lstm_type='gru', activation=nn.ReLU(), dueling=False)
¶
Overview
Initialize QMIX neural network according to arguments, i.e. agent Q network and mixer.
Arguments:
- agent_num (:obj:int): The number of agent, such as 8.
- obs_shape (:obj:int): The dimension of each agent's observation state, such as 8 or [4, 84, 84].
- global_obs_shape (:obj:int): The dimension of global observation state, such as 8 or [4, 84, 84].
- action_shape (:obj:int): The dimension of action shape, such as 6 or [2, 3, 3].
- hidden_size_list (:obj:list): The list of hidden size for q_network, the last element must match mixer's mixing_embed_dim.
- mixer (:obj:bool): Use mixer net or not, default to True. If it is false, the final local Q is added to obtain the global Q.
- lstm_type (:obj:str): The type of RNN module in q_network, now support ['normal', 'pytorch', 'gru'], default to gru.
- activation (:obj:nn.Module): The type of activation function to use in the MLP after each layer_fn, if None then default set to nn.ReLU().
- dueling (:obj:bool): Whether choose DuelingHead (True) or DiscreteHead (False), default to False.
forward(data, single_step=True)
¶
Overview
QMIX forward computation graph, input dict including time series observation and related data to predict total q_value and each agent q_value.
Arguments:
- data (:obj:dict): Input data dict with keys ['obs', 'prev_state', 'action'].
- agent_state (:obj:torch.Tensor): Time series local observation data of each agent.
- global_state (:obj:torch.Tensor): Time series global observation data.
- prev_state (:obj:list): Previous rnn state for q_network.
- action (:obj:torch.Tensor or None): The actions of each agent given outside the function. If action is None, use argmax q_value index as action to calculate agent_q_act.
- single_step (:obj:bool): Whether single_step forward, if so, add timestep dim before forward and remove it after forward.
Returns:
- ret (:obj:dict): Output data dict with keys [total_q, logit, next_state].
ReturnsKeys:
- total_q (:obj:torch.Tensor): Total q_value, which is the result of mixer network.
- agent_q (:obj:torch.Tensor): Each agent q_value.
- next_state (:obj:list): Next rnn state for q_network.
Shapes:
- agent_state (:obj:torch.Tensor): :math:(T, B, A, N), where T is timestep, B is batch_size, A is agent_num, N is obs_shape.
- global_state (:obj:torch.Tensor): :math:(T, B, M), where M is global_obs_shape.
- prev_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A.
- action (:obj:torch.Tensor): :math:(T, B, A).
- total_q (:obj:torch.Tensor): :math:(T, B).
- agent_q (:obj:torch.Tensor): :math:(T, B, A, P), where P is action_shape.
- next_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A.
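As described above, when action is None the greedy argmax index is used, and when mixer=False the total Q is just the sum of each agent's executed-action Q. A pure-Python sketch of that selection-and-sum step (toy Q values, not the real ding modules):

```python
def select_agent_q(agent_q, action=None):
    """Pick each agent's executed-action Q: use the given actions if provided,
    otherwise fall back to the greedy argmax index per agent."""
    if action is None:
        action = [max(range(len(q)), key=lambda a: q[a]) for q in agent_q]
    return [q[a] for q, a in zip(agent_q, action)], action

def total_q_by_sum(agent_q, action=None):
    """mixer=False branch: the global Q is the plain sum of local executed Qs."""
    executed, action = select_agent_q(agent_q, action)
    return sum(executed), action

agent_q = [[0.1, 0.9], [0.7, 0.3]]       # two agents, two actions each
q_tot, greedy = total_q_by_sum(agent_q)  # greedy selection: argmax per agent
```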
CollaQ
¶
Bases: Module
Overview
The network of the CollaQ (Collaborative Q-learning) algorithm, proposed in Multi-Agent Collaboration via Reward Attribution Decomposition (https://arxiv.org/abs/2010.08531). It includes two parts: q_network and q_alone_network. The q_network computes the q_value from the agent's observation together with the observation features of the allies the agent attends to. The q_alone_network computes the q_value from the agent's observation alone, without ally features.
Interface:
__init__, forward, _setup_global_encoder
__init__(agent_num, obs_shape, alone_obs_shape, global_obs_shape, action_shape, hidden_size_list, attention=False, self_feature_range=None, ally_feature_range=None, attention_size=32, mixer=True, lstm_type='gru', activation=nn.ReLU(), dueling=False)
¶
Overview
Initialize the CollaQ network.
Arguments:
- agent_num (:obj:int): the number of agent
- obs_shape (:obj:int): the dimension of each agent's observation state
- alone_obs_shape (:obj:int): the dimension of each agent's observation state without other agents
- global_obs_shape (:obj:int): the dimension of global observation state
- action_shape (:obj:int): the dimension of action shape
- hidden_size_list (:obj:list): the list of hidden size
- attention (:obj:bool): use attention module or not, default to False
- self_feature_range (:obj:Union[List[int], None]): the agent's feature range
- ally_feature_range (:obj:Union[List[int], None]): the agent ally's feature range
- attention_size (:obj:int): the size of attention net layer
- mixer (:obj:bool): use mixer net or not, default to True
- lstm_type (:obj:str): use lstm or gru, default to gru
- activation (:obj:nn.Module): Activation function in network, defaults to nn.ReLU().
- dueling (:obj:bool): use dueling head or not, default to False.
forward(data, single_step=True)
¶
Overview
The forward method calculates the q_value of each agent and the total q_value of all agents. The q_value of each agent is calculated by the q_network, and the total q_value is calculated by the mixer.
Arguments:
- data (:obj:dict): input data dict with keys ['obs', 'prev_state', 'action']
- agent_state (:obj:torch.Tensor): each agent local state(obs)
- agent_alone_state (:obj:torch.Tensor): each agent's local state alone, which in the smac setting is without ally features (obs_alone)
- global_state (:obj:torch.Tensor): global state(obs)
- prev_state (:obj:list): previous rnn state, should include 3 parts: one hidden state of q_network, and two hidden states of q_alone_network for the obs and obs_alone inputs
- action (:obj:torch.Tensor or None): if action is None, use argmax q_value index as action to calculate agent_q_act
- single_step (:obj:bool): whether single_step forward, if so, add timestep dim before forward and remove it after forward
Return:
- ret (:obj:dict): output data dict with keys ['total_q', 'logit', 'next_state']
- total_q (:obj:torch.Tensor): total q_value, which is the result of mixer network
- agent_q (:obj:torch.Tensor): each agent q_value
- next_state (:obj:list): next rnn state
Shapes:
- agent_state (:obj:torch.Tensor): :math:(T, B, A, N), where T is timestep, B is batch_size, A is agent_num, N is obs_shape
- global_state (:obj:torch.Tensor): :math:(T, B, M), where M is global_obs_shape
- prev_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A
- action (:obj:torch.Tensor): :math:(T, B, A)
- total_q (:obj:torch.Tensor): :math:(T, B)
- agent_q (:obj:torch.Tensor): :math:(T, B, A, P), where P is action_shape
- next_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A
Examples:
>>> collaQ_model = CollaQ(
>>> agent_num=4,
>>> obs_shape=32,
>>> alone_obs_shape=24,
>>> global_obs_shape=32 * 4,
>>> action_shape=9,
>>> hidden_size_list=[128, 64],
>>> self_feature_range=[8, 10],
>>> ally_feature_range=[10, 16],
>>> attention_size=64,
>>> mixer=True,
>>> activation=torch.nn.Tanh()
>>> )
>>> data={
>>> 'obs': {
>>> 'agent_state': torch.randn(8, 4, 4, 32),
>>> 'agent_alone_state': torch.randn(8, 4, 4, 24),
>>> 'agent_alone_padding_state': torch.randn(8, 4, 4, 32),
>>> 'global_state': torch.randn(8, 4, 32 * 4),
>>> 'action_mask': torch.randint(0, 2, size=(8, 4, 4, 9))
>>> },
>>> 'prev_state': [[[None for _ in range(4)] for _ in range(3)] for _ in range(4)],
>>> 'action': torch.randint(0, 9, size=(8, 4, 4))
>>> }
>>> output = collaQ_model(data, single_step=False)
WQMix
¶
Bases: Module
Overview
The WQMIX (https://arxiv.org/abs/2006.10800) network. There are two components: 1) Q_tot, which is the same as the QMIX network and composed of an agent Q network and a mixer network. 2) An unrestricted joint-action Q_star, which is composed of an agent Q network and a mixer_star network. The QMIX paper mentions that all agents share local Q network parameters, so only one Q network is initialized for each of Q_tot and Q_star.
Interface:
__init__, forward.
__init__(agent_num, obs_shape, global_obs_shape, action_shape, hidden_size_list, lstm_type='gru', dueling=False)
¶
Overview
Initialize WQMIX neural network according to arguments, i.e. agent Q network and mixer, Q_star network and mixer_star.
Arguments:
- agent_num (:obj:int): The number of agent, such as 8.
- obs_shape (:obj:int): The dimension of each agent's observation state, such as 8.
- global_obs_shape (:obj:int): The dimension of global observation state, such as 8.
- action_shape (:obj:int): The dimension of action shape, such as 6.
- hidden_size_list (:obj:list): The list of hidden size for q_network, the last element must match mixer's mixing_embed_dim.
- lstm_type (:obj:str): The type of RNN module in q_network, now support ['normal', 'pytorch', 'gru'], default to gru.
- dueling (:obj:bool): Whether choose DuelingHead (True) or DiscreteHead (False), default to False.
forward(data, single_step=True, q_star=False)
¶
Overview
Forward computation graph of the WQMIX network. Input dict including time series observation and related data to predict total q_value and each agent q_value. Determine whether to calculate Q_tot or Q_star based on the q_star parameter.
Arguments:
- data (:obj:dict): Input data dict with keys ['obs', 'prev_state', 'action'].
- agent_state (:obj:torch.Tensor): Time series local observation data of each agents.
- global_state (:obj:torch.Tensor): Time series global observation data.
- prev_state (:obj:list): Previous rnn state for q_network or _q_network_star.
- action (:obj:torch.Tensor or None): If action is None, use argmax q_value index as action to calculate agent_q_act.
- single_step (:obj:bool): Whether single_step forward, if so, add timestep dim before forward and remove it after forward.
- q_star (:obj:bool): Whether to forward the Q_star network. If True, use the Q_star network, where the agent networks have the same architecture as the Q network but do not share parameters, and the mixing network is a feedforward network with 3 hidden layers of 256 dims; if False, use the Q network, the same as the Q network in the QMIX paper.
Returns:
- ret (:obj:dict): Output data dict with keys [total_q, logit, next_state].
- total_q (:obj:torch.Tensor): Total q_value, which is the result of mixer network.
- agent_q (:obj:torch.Tensor): Each agent q_value.
- next_state (:obj:list): Next rnn state.
Shapes:
- agent_state (:obj:torch.Tensor): :math:(T, B, A, N), where T is timestep, B is batch_size, A is agent_num, N is obs_shape.
- global_state (:obj:torch.Tensor): :math:(T, B, M), where M is global_obs_shape.
- prev_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A.
- action (:obj:torch.Tensor): :math:(T, B, A).
- total_q (:obj:torch.Tensor): :math:(T, B).
- agent_q (:obj:torch.Tensor): :math:(T, B, A, P), where P is action_shape.
- next_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A.
COMA
¶
Bases: Module
Overview
The network of the COMA algorithm, which is a QAC-type actor-critic.
Interface:
__init__, forward
Properties:
- mode (:obj:list): The list of forward mode, including compute_actor and compute_critic
__init__(agent_num, obs_shape, action_shape, actor_hidden_size_list)
¶
Overview
initialize the COMA network
Arguments:
- agent_num (:obj:int): the number of agent
- obs_shape (:obj:Dict): the observation information, including agent_state and global_state
- action_shape (:obj:Union[int, SequenceType]): the dimension of action shape
- actor_hidden_size_list (:obj:SequenceType): the list of hidden size
forward(inputs, mode)
¶
Overview
forward computation graph of COMA network
Arguments:
- inputs (:obj:dict): input data dict with keys ['obs', 'prev_state', 'action']
- agent_state (:obj:torch.Tensor): each agent local state(obs)
- global_state (:obj:torch.Tensor): global state(obs)
- action (:obj:torch.Tensor): the masked action
ArgumentsKeys:
- necessary: obs { agent_state, global_state, action_mask }, action, prev_state
ReturnsKeys:
- necessary:
- compute_critic: q_value
- compute_actor: logit, next_state, action_mask
Shapes:
- obs (:obj:dict): agent_state: :math:(T, B, A, N, D), action_mask: :math:(T, B, A, N, A)
- prev_state (:obj:list): :math:[[[h, c] for _ in range(A)] for _ in range(B)]
- logit (:obj:torch.Tensor): :math:(T, B, A, N, A)
- next_state (:obj:list): :math:[[[h, c] for _ in range(A)] for _ in range(B)]
- action_mask (:obj:torch.Tensor): :math:(T, B, A, N, A)
- q_value (:obj:torch.Tensor): :math:(T, B, A, N, A)
Examples:
>>> agent_num, bs, T = 4, 3, 8
>>> obs_dim, global_obs_dim, action_dim = 32, 32 * 4, 9
>>> coma_model = COMA(
>>> agent_num=agent_num,
>>> obs_shape=dict(agent_state=(obs_dim, ), global_state=(global_obs_dim, )),
>>> action_shape=action_dim,
>>> actor_hidden_size_list=[128, 64],
>>> )
>>> prev_state = [[None for _ in range(agent_num)] for _ in range(bs)]
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(T, bs, agent_num, obs_dim),
>>> 'action_mask': None,
>>> },
>>> 'prev_state': prev_state,
>>> }
>>> output = coma_model(data, mode='compute_actor')
>>> data= {
>>> 'obs': {
>>> 'agent_state': torch.randn(T, bs, agent_num, obs_dim),
>>> 'global_state': torch.randn(T, bs, global_obs_dim),
>>> },
>>> 'action': torch.randint(0, action_dim, size=(T, bs, agent_num)),
>>> }
>>> output = coma_model(data, mode='compute_critic')
ATOC
¶
Bases: Module
Overview
The QAC network of ATOC, an extension of DDPG for MARL, proposed in Learning Attentional Communication for Multi-Agent Cooperation (https://arxiv.org/abs/1805.07733).
Interface:
__init__, forward, compute_critic, compute_actor, optimize_actor_attention
__init__(obs_shape, action_shape, thought_size, n_agent, communication=True, agent_per_group=2, actor_1_embedding_size=None, actor_2_embedding_size=None, critic_head_hidden_size=64, critic_head_layer_num=2, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the ATOC QAC network.
Arguments:
- obs_shape (:obj:Union[Tuple, int]): the observation space shape
- action_shape (:obj:int): the action space shape
- thought_size (:obj:int): the size of thoughts
- n_agent (:obj:int): the num of agents
- agent_per_group (:obj:int): the num of agents in each group
compute_actor(obs, get_delta_q=False)
¶
Overview
compute the action according to inputs, calling the _compute_delta_q function to compute delta_q
Arguments:
- obs (:obj:torch.Tensor): observation
- get_delta_q (:obj:bool) : whether need to get delta_q
Returns:
- outputs (:obj:Dict): the output of actor network and delta_q
ReturnsKeys:
- necessary: action
- optional: group, initiator_prob, is_initiator, new_thoughts, old_thoughts, delta_q
Shapes:
- obs (:obj:torch.Tensor): :math:(B, A, N), where B is batch size, A is agent num, N is obs size
- action (:obj:torch.Tensor): :math:(B, A, M), where M is action size
- group (:obj:torch.Tensor): :math:(B, A, A)
- initiator_prob (:obj:torch.Tensor): :math:(B, A)
- is_initiator (:obj:torch.Tensor): :math:(B, A)
- new_thoughts (:obj:torch.Tensor): :math:(B, A, M)
- old_thoughts (:obj:torch.Tensor): :math:(B, A, M)
- delta_q (:obj:torch.Tensor): :math:(B, A)
Examples:
>>> net = ATOC(64, 64, 64, 3)
>>> obs = torch.randn(2, 3, 64)
>>> net.compute_actor(obs)
compute_critic(inputs)
¶
Overview
compute the q_value according to inputs
Arguments:
- inputs (:obj:Dict): the inputs contain the obs and action
Returns:
- outputs (:obj:Dict): the output of critic network
ArgumentsKeys:
- necessary: obs, action
ReturnsKeys:
- necessary: q_value
Shapes:
- obs (:obj:torch.Tensor): :math:(B, A, N), where B is batch size, A is agent num, N is obs size
- action (:obj:torch.Tensor): :math:(B, A, M), where M is action size
- q_value (:obj:torch.Tensor): :math:(B, A)
Examples:
>>> net = ATOC(64, 64, 64, 3)
>>> obs = torch.randn(2, 3, 64)
>>> action = torch.randn(2, 3, 64)
>>> net.compute_critic({'obs': obs, 'action': action})
optimize_actor_attention(inputs)
¶
Overview
return the actor attention loss
Arguments:
- inputs (:obj:Dict): the inputs contain the delta_q, initiator_prob, and is_initiator
Returns:
- loss (:obj:Dict): the loss of actor attention unit
ArgumentsKeys:
- necessary: delta_q, initiator_prob, is_initiator
ReturnsKeys:
- necessary: loss
Shapes:
- delta_q (:obj:torch.Tensor): :math:(B, A)
- initiator_prob (:obj:torch.Tensor): :math:(B, A)
- is_initiator (:obj:torch.Tensor): :math:(B, A)
- loss (:obj:torch.Tensor): :math:(1)
Examples:
>>> net = ATOC(64, 64, 64, 3)
>>> delta_q = torch.randn(2, 3)
>>> initiator_prob = torch.randn(2, 3)
>>> is_initiator = torch.randn(2, 3)
>>> net.optimize_actor_attention(
>>> {'delta_q': delta_q,
>>> 'initiator_prob': initiator_prob,
>>> 'is_initiator': is_initiator})
ACER
¶
Bases: Module
Overview
The model of the ACER (Actor-Critic with Experience Replay) algorithm, proposed in Sample Efficient Actor-Critic with Experience Replay (https://arxiv.org/abs/1611.01224).
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
¶
Overview
Init the ACER Model according to arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's space.
- action_shape (:obj:Union[int, SequenceType]): Action's space.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor-nn's Head.
- actor_head_layer_num (:obj:int):
The num of layers used in the network to compute Q value output for actor's nn.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic-nn's Head.
- critic_head_layer_num (:obj:int):
The num of layers used in the network to compute Q value output for critic's nn.
- activation (:obj:Optional[nn.Module]):
The type of activation function to use in the MLP after each layer_fn,
if None then default set to nn.ReLU()
- norm_type (:obj:Optional[str]):
The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(inputs, mode)
¶
Overview
Use observation to predict output, forwarding through ACER's actor or critic MLPs according to mode.
Arguments:
- mode (:obj:str): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Outputs of network forward.
Shapes (Actor):
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape
- logit (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape
Shapes (Critic):
- inputs (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to obs_shape
- q_value (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape
compute_actor(inputs)
¶
Overview
Use the encoded embedding tensor to predict the action logit with the compute_actor mode.
Arguments:
- inputs (:obj:torch.Tensor): The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size), where hidden_size = actor_head_hidden_size.
Returns:
- outputs (:obj:Dict): Outputs of forward pass encoder and head.
ReturnsKeys (either):
- logit (:obj:torch.FloatTensor): :math:(B, N1), where B is batch size and N1 is action_shape
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to hidden_size
- logit (:obj:torch.FloatTensor): :math:(B, N1), where B is batch size and N1 is action_shape
Examples:
>>> # Regression mode
>>> model = ACER(64, 64)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 64])
compute_critic(inputs)
¶
Overview
Use the encoded embedding tensor to predict the Q value with the compute_critic mode.
Arguments:
- inputs (:obj:torch.Tensor): The encoded observation tensor.
Returns:
- outputs (:obj:Dict): Q-value output.
ReturnsKeys:
- q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape
- q_value (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape.
Examples:
>>> inputs = torch.randn(4, N)
>>> model = ACER(obs_shape=(N, ),action_shape=5)
>>> model(inputs, mode='compute_critic')['q_value']
QTran
¶
Bases: Module
Overview
QTRAN network
Interface: __init__, forward
__init__(agent_num, obs_shape, global_obs_shape, action_shape, hidden_size_list, embedding_size, lstm_type='gru', dueling=False)
¶
Overview
initialize QTRAN network
Arguments:
- agent_num (:obj:int): the number of agent
- obs_shape (:obj:int): the dimension of each agent's observation state
- global_obs_shape (:obj:int): the dimension of global observation state
- action_shape (:obj:int): the dimension of action shape
- hidden_size_list (:obj:list): the list of hidden size
- embedding_size (:obj:int): the dimension of embedding
- lstm_type (:obj:str): use lstm or gru, default to gru
- dueling (:obj:bool): use dueling head or not, default to False.
forward(data, single_step=True)
¶
Overview
forward computation graph of qtran network
Arguments:
- data (:obj:dict): input data dict with keys ['obs', 'prev_state', 'action']
- agent_state (:obj:torch.Tensor): each agent local state(obs)
- global_state (:obj:torch.Tensor): global state(obs)
- prev_state (:obj:list): previous rnn state
- action (:obj:torch.Tensor or None): if action is None, use argmax q_value index as action to calculate agent_q_act
- single_step (:obj:bool): whether single_step forward, if so, add timestep dim before forward and remove it after forward
Return:
- ret (:obj:dict): output data dict with keys ['total_q', 'logit', 'next_state']
- total_q (:obj:torch.Tensor): total q_value, which is the result of mixer network
- agent_q (:obj:torch.Tensor): each agent q_value
- next_state (:obj:list): next rnn state
Shapes:
- agent_state (:obj:torch.Tensor): :math:(T, B, A, N), where T is timestep, B is batch_size, A is agent_num, N is obs_shape
- global_state (:obj:torch.Tensor): :math:(T, B, M), where M is global_obs_shape
- prev_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A
- action (:obj:torch.Tensor): :math:(T, B, A)
- total_q (:obj:torch.Tensor): :math:(T, B)
- agent_q (:obj:torch.Tensor): :math:(T, B, A, P), where P is action_shape
- next_state (:obj:list): :math:(B, A), a list of length B, and each element is a list of length A
MAVAC
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to (state) Value Actor-Critic (VAC) for multi-agent settings, such as MAPPO (https://arxiv.org/abs/2103.01955). This model now supports discrete and continuous action spaces. The MAVAC is composed of four parts: actor_encoder, critic_encoder, actor_head and critic_head. Encoders are used to extract features from various observations. Heads are used to predict the corresponding value or action logit.
Interfaces:
__init__, forward, compute_actor, compute_critic, compute_actor_critic.
__init__(agent_obs_shape, global_obs_shape, action_shape, agent_num, actor_head_hidden_size=256, actor_head_layer_num=2, critic_head_hidden_size=512, critic_head_layer_num=1, action_space='discrete', activation=nn.ReLU(), norm_type=None, sigma_type='independent', bound_type=None, encoder=None)
¶
Overview
Init the MAVAC Model according to arguments.
Arguments:
- agent_obs_shape (:obj:Union[int, SequenceType]): Observation's space for single agent, such as 8 or [4, 84, 84].
- global_obs_shape (:obj:Union[int, SequenceType]): Global observation's space, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape for single agent, such as 6 or [2, 3, 3].
- agent_num (:obj:int): This argument is temporarily reserved; it may be required by future changes to the model.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size of actor_head network, defaults to 256; it must match the last element of agent_obs_shape.
- actor_head_layer_num (:obj:int): The num of layers used in the actor_head network to compute action.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size of critic_head network, defaults to 512; it must match the last element of global_obs_shape.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic's nn.
- action_space (:obj:str): The type of action space, chosen from ['discrete', 'continuous']; the corresponding head, DiscreteHead or ReparameterizationHead, will be instantiated.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
- sigma_type (:obj:Optional[str]): The type of sigma in continuous action space, see ding.torch_utils.network.dreamer.ReparameterizationHead for more details, in MAPPO, it defaults to independent, which means state-independent sigma parameters.
- bound_type (:obj:Optional[str]): The type of action bound methods in continuous action space, defaults to None, which means no bound.
- encoder (:obj:Optional[Tuple[torch.nn.Module, torch.nn.Module]]): The encoder module list, defaults to None, you can define your own actor and critic encoder module and pass it into MAVAC to deal with different observation space.
forward(inputs, mode)
¶
Overview
MAVAC forward computation graph, input observation tensor to predict state value or action logit. mode includes compute_actor, compute_critic, compute_actor_critic.
Different mode will forward with different network modules to get different outputs and save computation.
Arguments:
- inputs (:obj:Dict): The input dict including observation and related info, whose key-values vary from different mode.
- mode (:obj:str): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- outputs (:obj:Dict): The output dict of MAVAC's forward computation graph, whose key-values vary from different mode.
Examples (Actor):
>>> model = MAVAC(agent_obs_shape=64, global_obs_shape=128, action_shape=14)
>>> inputs = {
        'agent_state': torch.randn(10, 8, 64),
        'global_state': torch.randn(10, 8, 128),
        'action_mask': torch.randint(0, 2, size=(10, 8, 14))
    }
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([10, 8, 14])
Examples (Critic):
>>> model = MAVAC(agent_obs_shape=64, global_obs_shape=128, action_shape=14)
>>> inputs = {
        'agent_state': torch.randn(10, 8, 64),
        'global_state': torch.randn(10, 8, 128),
        'action_mask': torch.randint(0, 2, size=(10, 8, 14))
    }
>>> critic_outputs = model(inputs, 'compute_critic')
>>> assert critic_outputs['value'].shape == torch.Size([10, 8])
Examples (Actor-Critic):
>>> model = MAVAC(agent_obs_shape=64, global_obs_shape=128, action_shape=14)
>>> inputs = {
        'agent_state': torch.randn(10, 8, 64),
        'global_state': torch.randn(10, 8, 128),
        'action_mask': torch.randint(0, 2, size=(10, 8, 14))
    }
>>> outputs = model(inputs, 'compute_actor_critic')
>>> assert outputs['logit'].shape == torch.Size([10, 8, 14])
>>> assert outputs['value'].shape == torch.Size([10, 8])
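The mode argument simply selects which bound method runs the forward pass. This dispatch pattern can be sketched with a toy class (a simplified illustration, not the actual MAVAC implementation; the stub methods and dummy inputs are hypothetical):

```python
# Sketch: mode-based forward dispatch, as used by MAVAC.forward.
# The class attribute lists the legal modes; forward looks up the
# matching method by name and calls it, saving computation by only
# running the requested part of the network.
class ToyVAC:
    mode = ['compute_actor', 'compute_critic', 'compute_actor_critic']

    def forward(self, inputs, mode):
        assert mode in self.mode, "unknown forward mode: {}".format(mode)
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs):
        return {'logit': inputs['agent_state']}       # stub actor output

    def compute_critic(self, inputs):
        return {'value': inputs['global_state']}      # stub critic output

    def compute_actor_critic(self, inputs):
        return {**self.compute_actor(inputs), **self.compute_critic(inputs)}

model = ToyVAC()
out = model.forward({'agent_state': 1, 'global_state': 2}, 'compute_actor_critic')
print(out)  # {'logit': 1, 'value': 2}
```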
compute_actor(x)
¶
Overview
MAVAC forward computation graph for actor part, predicting action logit with agent observation tensor in x.
Arguments:
- x (:obj:Dict): Input data dict with keys ['agent_state', 'action_mask'(optional)].
- agent_state (:obj:torch.Tensor): Each agent's local state (obs).
- action_mask (optional) (:obj:torch.Tensor): When action_space is discrete, action_mask needs to be provided to mask illegal actions.
Returns:
- outputs (:obj:Dict): The output dict of the forward computation graph for actor, including logit.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted action logit tensor. For discrete action space, it is a real-valued tensor with the same dimension as the number of possible actions; for continuous action space, it is the mu and sigma of the Gaussian distribution, each with the same dimension as the number of continuous actions.
Shapes:
- logit (:obj:torch.FloatTensor): :math:(B, M, N), where B is batch size, M is agent_num and N is action_shape.
Examples:
>>> model = MAVAC(agent_obs_shape=64, global_obs_shape=128, action_shape=14)
>>> inputs = {
'agent_state': torch.randn(10, 8, 64),
'global_state': torch.randn(10, 8, 128),
'action_mask': torch.randint(0, 2, size=(10, 8, 14))
}
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([10, 8, 14])
compute_critic(x)
¶
Overview
MAVAC forward computation graph for critic part. Predict state value with global observation tensor in x.
Arguments:
- x (:obj:Dict): Input data dict with keys ['global_state'].
- global_state: (:obj:torch.Tensor): Global state(obs).
Returns:
- outputs (:obj:Dict): The output dict of MAVAC's forward computation graph for critic, including value.
ReturnsKeys:
- value (:obj:torch.Tensor): The predicted state value tensor.
Shapes:
- value (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is agent_num.
Examples:
>>> model = MAVAC(agent_obs_shape=64, global_obs_shape=128, action_shape=14)
>>> inputs = {
'agent_state': torch.randn(10, 8, 64),
'global_state': torch.randn(10, 8, 128),
'action_mask': torch.randint(0, 2, size=(10, 8, 14))
}
>>> critic_outputs = model(inputs,'compute_critic')
>>> assert critic_outputs['value'].shape == torch.Size([10, 8])
compute_actor_critic(x)
¶
Overview
MAVAC forward computation graph for both actor and critic part, input observation to predict action logit and state value.
Arguments:
- x (:obj:Dict): The input dict contains agent_state, global_state and other related info.
Returns:
- outputs (:obj:Dict): The output dict of MAVAC's forward computation graph for both actor and critic, including logit and value.
ReturnsKeys:
- logit (:obj:torch.Tensor): The predicted action logit tensor.
- value (:obj:torch.Tensor): The predicted state value tensor.
Shapes:
- logit (:obj:torch.FloatTensor): :math:(B, M, N), where B is batch size, M is agent_num and N is action_shape.
- value (:obj:torch.FloatTensor): :math:(B, M), where B is batch size and M is agent_num.
Examples:
>>> model = MAVAC(agent_obs_shape=64, global_obs_shape=128, action_shape=14)
>>> inputs = {
'agent_state': torch.randn(10, 8, 64),
'global_state': torch.randn(10, 8, 128),
'action_mask': torch.randint(0, 2, size=(10, 8, 14))
}
>>> outputs = model(inputs,'compute_actor_critic')
>>> assert outputs['value'].shape == torch.Size([10, 8])
>>> assert outputs['logit'].shape == torch.Size([10, 8, 14])
NGU
¶
Bases: Module
Overview
The recurrent Q model for the NGU(https://arxiv.org/pdf/2002.06038.pdf) policy, modified from the class DRQN in q_learning.py. As the original paper describes, the implementation 'adapts the R2D2 agent that uses the dueling network architecture with an LSTM layer after a convolutional neural network'. The NGU network includes encoder, LSTM core (rnn) and head.
Interface:
__init__, forward.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], collector_env_num=1, dueling=True, head_hidden_size=None, head_layer_num=1, lstm_type='normal', activation=nn.ReLU(), norm_type=None)
¶
Overview
Init the DRQN Model for NGU according to arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's space, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action's space, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder.
- collector_env_num (:obj:Optional[int]): The number of environments used to collect data simultaneously.
- dueling (:obj:bool): Whether choose DuelingHead (True) or DiscreteHead (False), default to True.
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head, should match the last element of encoder_hidden_size_list.
- head_layer_num (:obj:int): The number of layers in head network.
- lstm_type (:obj:Optional[str]): Version of rnn cell, now support ['normal', 'pytorch', 'hpc', 'gru'], default is 'normal'.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(inputs, inference=False, saved_state_timesteps=None)
¶
Overview
Forward computation graph of the NGU R2D2 network. Input observation, prev_action and prev_reward_extrinsic to predict NGU Q output and the next rnn state.
Arguments:
- inputs (:obj:Dict):
- obs (:obj:torch.Tensor): Encoded observation.
- prev_state (:obj:list): Previous state's tensor of size (B, N).
- inference (:obj:bool): If inference is True, unroll only one timestep transition; otherwise, unroll the whole sequence of transitions.
- saved_state_timesteps (:obj:Optional[list]): When inference is False, while unrolling the sequence of transitions, save the rnn hidden states at the timesteps listed in saved_state_timesteps.
Returns:
- outputs (:obj:Dict):
Run MLP with DRQN setups and return the result prediction dictionary.
ReturnsKeys:
- logit (:obj:torch.Tensor): Logit tensor with same size as input obs.
- next_state (:obj:list): Next state's tensor of size (B, N).
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N=obs_space), where B is batch size.
- prev_state (:obj:torch.FloatTensor list): :math:[(B, N)].
- logit (:obj:torch.FloatTensor): :math:(B, N).
- next_state (:obj:torch.FloatTensor list): :math:[(B, N)].
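The saved_state_timesteps mechanism can be sketched with a toy recurrent loop: unroll the whole sequence, and record the hidden state whenever the current timestep appears in the list. The state update below is a hypothetical stand-in (the real network runs an LSTM cell):

```python
# Sketch: unroll a sequence of transitions and save hidden states at
# the timesteps listed in `saved_state_timesteps`, as NGU does when
# inference=False.
def unroll(obs_seq, init_state, saved_state_timesteps):
    state = init_state
    outputs, saved_states = [], []
    for t, obs in enumerate(obs_seq):
        state = state + obs          # toy rnn cell; real code runs an LSTM
        outputs.append(state)
        if t in saved_state_timesteps:
            saved_states.append(state)   # snapshot hidden state at step t
    return outputs, state, saved_states

outputs, last, saved = unroll([1, 2, 3, 4], 0, saved_state_timesteps=[1, 3])
print(outputs)  # [1, 3, 6, 10]
print(saved)    # [3, 10]
```

Saving intermediate states this way lets the training loop re-initialize burn-in or bootstrap targets from exact mid-sequence states without re-running the rnn.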
QACDIST
¶
Bases: Module
Overview
The QAC model with distributional Q-value.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(obs_shape, action_shape, action_space='regression', critic_head_type='categorical', actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)
¶
Overview
Init the QAC Distributional Model according to arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's space.
- action_shape (:obj:Union[int, SequenceType]): Action's space.
- action_space (:obj:str): Whether to use regression or reparameterization.
- critic_head_type (:obj:str): Only categorical is supported.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor-nn's Head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor's nn.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic-nn's Head.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic's nn.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
- v_min (:obj:int): Value of the smallest atom.
- v_max (:obj:int): Value of the largest atom.
- n_atom (:obj:int): Number of atoms in the support.
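The atoms define a fixed support spanning [v_min, v_max]; the scalar Q-value is the expectation of the predicted categorical distribution over this support, as in C51-style distributional heads. A minimal plain-Python sketch (hypothetical helper names, no torch):

```python
# Sketch: build the categorical support z_0..z_{n_atom-1} and reduce a
# probability distribution over atoms to a scalar Q-value.
def make_support(v_min, v_max, n_atom):
    step = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * step for i in range(n_atom)]

def expected_q(dist, support):
    # dist: probabilities over atoms, summing to 1
    return sum(p * z for p, z in zip(dist, support))

support = make_support(-10, 10, 51)          # the defaults above
dist = [1.0 / 51] * 51                       # uniform toy distribution
print(len(support), support[0])              # 51 -10.0
print(abs(round(expected_q(dist, support), 6)))  # 0.0 (symmetric support)
```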
forward(inputs, mode)
¶
Overview
Use observation and action tensor to predict output with QACDIST's forward computation graph.
Arguments:
Forward with 'compute_actor':
- inputs (:obj:torch.Tensor): The encoded embedding tensor, determined with given hidden_size, i.e. (B, N=hidden_size). Whether actor_head_hidden_size or critic_head_hidden_size is used depends on mode.
Forward with 'compute_critic', inputs (:obj:Dict) necessary keys:
- obs, action encoded tensors.
- mode (:obj:str): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Outputs of network forward.
Forward with 'compute_actor', necessary keys (either):
- action (:obj:torch.Tensor): Action tensor with same size as input x.
- logit (:obj:torch.Tensor): Logit tensor encoding mu and sigma, both with same size as input x.
Forward with 'compute_critic', necessary keys:
- q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
- distribution (:obj:torch.Tensor): Q value distribution tensor.
Actor Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to hidden_size
- action (:obj:torch.Tensor): :math:(B, N0)
- q_value (:obj:torch.FloatTensor): :math:(B, ), where B is batch size.
Critic Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape.
- action (:obj:torch.Tensor): :math:(B, N2), where B is batch size and N2 is action_shape.
- q_value (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape.
- distribution (:obj:torch.FloatTensor): :math:(B, 1, N3), where B is batch size and N3 is num_atom.
Actor Examples:
>>> # Regression mode
>>> model = QACDIST(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization mode
>>> model = QACDIST(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
>>> torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
>>> torch.Size([4, 64])
Critic Examples:
>>> # Categorical mode
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression',
...     critic_head_type='categorical', n_atom=51)
>>> q_value = model(inputs, mode='compute_critic')  # q value
>>> assert q_value['q_value'].shape == torch.Size([4, 1])
>>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
compute_actor(inputs)
¶
Overview
Use encoded embedding tensor to predict output in 'compute_actor' mode.
Arguments:
- inputs (:obj:torch.Tensor): The encoded embedding tensor, determined with given hidden_size, i.e. (B, N=hidden_size), where hidden_size = actor_head_hidden_size.
Returns:
- outputs (:obj:Dict): Outputs of forward pass encoder and head.
ReturnsKeys (either):
- action (:obj:torch.Tensor): Continuous action tensor with same size as action_shape.
- logit (:obj:torch.Tensor): Logit tensor encoding mu and sigma, both with same size as input x.
Shapes:
- inputs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to hidden_size
- action (:obj:torch.Tensor): :math:(B, N0)
- logit (:obj:list): 2 elements, mu and sigma, each is the shape of :math:(B, N0).
- q_value (:obj:torch.FloatTensor): :math:(B, ), B is batch size.
Examples:
>>> # Regression mode
>>> model = QACDIST(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization Mode
>>> model = QACDIST(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> actor_outputs['logit'][0].shape # mu
>>> torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape # sigma
>>> torch.Size([4, 64])
compute_critic(inputs)
¶
Overview
Use encoded observation and action tensors to predict Q value output in 'compute_critic' mode.
Arguments:
- obs, action encoded tensors.
Returns:
- outputs (:obj:Dict): Q-value output and distribution.
ReturnKeys:
- q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
- distribution (:obj:torch.Tensor): Q value distribution tensor.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1), where B is batch size and N1 is obs_shape
- action (:obj:torch.Tensor): :math:(B, N2), where B is batch size and N2 is action_shape.
- q_value (:obj:torch.FloatTensor): :math:(B, N2), where B is batch size and N2 is action_shape
- distribution (:obj:torch.FloatTensor): :math:(B, 1, N3), where B is batch size and N3 is num_atom
Examples:
>>> # Categorical mode
>>> inputs = {'obs': torch.randn(4,N), 'action': torch.randn(4,1)}
>>> model = QACDIST(obs_shape=(N, ), action_shape=1, action_space='regression',
...     critic_head_type='categorical', n_atom=51)
>>> q_value = model(inputs, mode='compute_critic') # q value
>>> assert q_value['q_value'].shape == torch.Size([4, 1])
>>> assert q_value['distribution'].shape == torch.Size([4, 1, 51])
DiscreteMAQAC
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to the discrete action Multi-Agent Q-value Actor-CritiC (MAQAC) model. The model is composed of actor and critic, both MLP networks. The actor network predicts the action probability distribution, and the critic network predicts the Q value of the state-action pair.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(agent_obs_shape, global_obs_shape, action_shape, twin_critic=False, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the DiscreteMAQAC Model according to arguments.
Arguments:
- agent_obs_shape (:obj:Union[int, SequenceType]): Agent's observation's space.
- global_obs_shape (:obj:Union[int, SequenceType]): Global observation's space.
- action_shape (:obj:Union[int, SequenceType]): Action's space.
- twin_critic (:obj:bool): Whether include twin critic.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor-nn's Head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor's nn.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic-nn's Head.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic's nn.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(inputs, mode)
¶
Overview
Use observation tensor to predict output, with compute_actor or compute_critic mode.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
- mode (:obj:`str`): The forward mode, all the modes are defined in the beginning of this class.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of DiscreteMAQAC forward computation graph, whose key-values vary in different forward modes.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> }
>>> }
>>> model = DiscreteMAQAC(agent_obs_shape, global_obs_shape, action_shape, twin_critic=True)
>>> logit = model(data, mode='compute_actor')['logit']
>>> value = model(data, mode='compute_critic')['q_value']
compute_actor(inputs)
¶
Overview
Use observation tensor to predict action logits.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of DiscreteMAQAC forward computation graph, whose key-values vary in different forward modes.
- logit (:obj:torch.Tensor): Action's output logit (real value range), whose shape is :math:(B, A, N2), where N2 corresponds to action_shape.
- action_mask (:obj:torch.Tensor): Action mask tensor with same size as action_shape.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> }
>>> }
>>> model = DiscreteMAQAC(agent_obs_shape, global_obs_shape, action_shape, twin_critic=True)
>>> logit = model.compute_actor(data)['logit']
compute_critic(inputs)
¶
Overview
Use observation tensor to predict Q value.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of DiscreteMAQAC forward computation graph, whose key-values vary in different values of twin_critic.
- q_value (:obj:list): If twin_critic=True, q_value is a list of 2 elements, each with shape :math:(B, A, N2), where B is batch size, A is agent num and N2 corresponds to action_shape. Otherwise, q_value is a torch.Tensor.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> }
>>> }
>>> model = DiscreteMAQAC(agent_obs_shape, global_obs_shape, action_shape, twin_critic=True)
>>> value = model.compute_critic(data)['q_value']
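With twin_critic=True the head returns two independent Q estimates; downstream algorithms typically take their elementwise minimum (the clipped double-Q trick popularized by TD3/SAC) to reduce overestimation. A minimal plain-Python sketch of that reduction (hypothetical helper name; real code uses torch.min over the two tensors):

```python
# Sketch: combine twin-critic outputs with an elementwise minimum,
# the usual clipped double-Q reduction over q_value = [q1, q2].
def clipped_double_q(q_value):
    q1, q2 = q_value
    return [min(a, b) for a, b in zip(q1, q2)]

q_value = [[1.0, 2.5, 0.3],   # critic 1's estimates
           [1.2, 2.0, 0.4]]   # critic 2's estimates
print(clipped_double_q(q_value))  # [1.0, 2.0, 0.3]
```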
ContinuousMAQAC
¶
Bases: Module
Overview
The neural network and computation graph of algorithms related to the continuous action Multi-Agent Q-value Actor-CritiC (MAQAC) model. The model is composed of actor and critic, both MLP networks. The actor network predicts the action probability distribution, and the critic network predicts the Q value of the state-action pair.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(agent_obs_shape, global_obs_shape, action_shape, action_space, twin_critic=False, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
¶
Overview
Initialize the ContinuousMAQAC Model according to arguments.
Arguments:
- agent_obs_shape (:obj:Union[int, SequenceType]): Agent's observation's space.
- global_obs_shape (:obj:Union[int, SequenceType]): Global observation's space.
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action's space, such as 4, (3, ).
- action_space (:obj:str): Whether to use regression or reparameterization.
- twin_critic (:obj:bool): Whether include twin critic.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor-nn's Head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor's nn.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic-nn's Head.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic's nn.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(inputs, mode)
¶
Overview
Use observation and action tensor to predict output in compute_actor or compute_critic mode.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
- action (:obj:torch.Tensor): The action tensor data, with shape :math:(B, A, N3), where B is batch size and A is agent num. N3 corresponds to action_shape.
- mode (:obj:str): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Outputs of network forward, whose key-values will be different for different mode, twin_critic, action_space.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> act_space = 'reparameterization' # regression
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> },
>>> 'action': torch.randn(B, agent_num, squeeze(action_shape))
>>> }
>>> model = ContinuousMAQAC(agent_obs_shape, global_obs_shape, action_shape, act_space, twin_critic=False)
>>> if act_space == 'regression':
>>> action = model(data['obs'], mode='compute_actor')['action']
>>> elif act_space == 'reparameterization':
>>> (mu, sigma) = model(data['obs'], mode='compute_actor')['logit']
>>> value = model(data, mode='compute_critic')['q_value']
compute_actor(inputs)
¶
Overview
Use observation tensor to predict action logits.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
Returns:
- outputs (:obj:Dict): Outputs of network forward, whose keys depend on action_space.
ReturnKeys (action_space == 'regression'):
- action (:obj:torch.Tensor): Action tensor with same size as action_shape.
ReturnKeys (action_space == 'reparameterization'):
- logit (:obj:list): 2 elements, each is the shape of :math:(B, A, N3), where B is batch size and A is agent num. N3 corresponds to action_shape.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> act_space = 'reparameterization' # 'regression'
>>> data = {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> }
>>> model = ContinuousMAQAC(agent_obs_shape, global_obs_shape, action_shape, act_space, twin_critic=False)
>>> if act_space == 'regression':
>>> action = model.compute_actor(data)['action']
>>> elif act_space == 'reparameterization':
>>> (mu, sigma) = model.compute_actor(data)['logit']
compute_critic(inputs)
¶
Overview
Use observation tensor and action tensor to predict Q value.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
- ``action`` (:obj:`torch.Tensor`): The action tensor data, with shape :math:`(B, A, N3)`, where B is batch size and A is agent num. N3 corresponds to ``action_shape``.
Returns:
- outputs (:obj:Dict): Outputs of network forward, whose keys depend on twin_critic.
ReturnKeys (twin_critic=True):
- q_value (:obj:list): 2 elements, each is the shape of :math:(B, A), where B is batch size and A is agent num.
ReturnKeys (twin_critic=False):
- q_value (:obj:torch.Tensor): :math:(B, A), where B is batch size and A is agent num.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> act_space = 'reparameterization' # 'regression'
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> },
>>> 'action': torch.randn(B, agent_num, squeeze(action_shape))
>>> }
>>> model = ContinuousMAQAC(agent_obs_shape, global_obs_shape, action_shape, act_space, twin_critic=False)
>>> value = model.compute_critic(data)['q_value']
VanillaVAE
¶
Bases: Module
Overview
Implementation of Vanilla variational autoencoder for action reconstruction.
Interfaces:
__init__, encode, decode, decode_with_obs, reparameterize, forward, loss_function.
encode(input)
¶
Overview
Encodes the input by passing through the encoder network and returns the latent codes.
Arguments:
- input (:obj:Dict): Dict containing keywords obs (:obj:torch.Tensor) and action (:obj:torch.Tensor), representing the observation and agent's action respectively.
Returns:
- outputs (:obj:Dict): Dict containing keywords mu (:obj:torch.Tensor), log_var (:obj:torch.Tensor) and obs_encoding (:obj:torch.Tensor) representing latent codes.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O), where B is batch size and O is observation dim.
- action (:obj:torch.Tensor): :math:(B, A), where B is batch size and A is action dim.
- mu (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent size.
- log_var (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent size.
- obs_encoding (:obj:torch.Tensor): :math:(B, H), where B is batch size and H is hidden dim.
decode(z, obs_encoding)
¶
Overview
Maps the given latent action and obs_encoding onto the original action space.
Arguments:
- z (:obj:torch.Tensor): the sampled latent action
- obs_encoding (:obj:torch.Tensor): observation encoding
Returns:
- outputs (:obj:Dict): Dict containing the reconstructed action and the predicted observation residual.
ReturnsKeys:
- reconstruction_action (:obj:torch.Tensor): The action reconstructed by the decoder.
- prediction_residual (:obj:torch.Tensor): The predicted observation residual.
Shapes:
- z (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent_size
- obs_encoding (:obj:torch.Tensor): :math:(B, H), where B is batch size and H is hidden dim
decode_with_obs(z, obs)
¶
Overview
Maps the given latent action and obs onto the original action space. Using the method self.encode_obs_head(obs) to get the obs_encoding.
Arguments:
- z (:obj:torch.Tensor): the sampled latent action
- obs (:obj:torch.Tensor): observation
Returns:
- outputs (:obj:Dict): Dict containing the reconstructed action and the predicted observation residual.
ReturnsKeys:
- reconstruction_action (:obj:torch.Tensor): The action reconstructed by the VAE.
- prediction_residual (:obj:torch.Tensor): The observation residual predicted by the VAE.
Shapes:
- z (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent_size
- obs (:obj:torch.Tensor): :math:(B, O), where B is batch size and O is obs_shape
reparameterize(mu, logvar)
¶
Overview
Reparameterization trick: sample from N(mu, var) using samples drawn from N(0, 1).
Arguments:
- mu (:obj:torch.Tensor): Mean of the latent Gaussian
- logvar (:obj:torch.Tensor): Log variance of the latent Gaussian
Shapes:
- mu (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent_size
- logvar (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent_size
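The trick can be sketched in plain Python (an illustrative scalar version with hypothetical names; the real module operates on torch.Tensor batches of shape (B, L)):

```python
import math
import random

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, with eps ~ N(0, 1).

    Sampling eps (rather than z directly) keeps mu and log_var inside a
    differentiable expression, which is the point of the trick.
    """
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# As log_var -> -inf, sigma -> 0 and the sample collapses to the mean.
z = reparameterize([0.5, -1.0], [-100.0, -100.0])
```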
forward(input, **kwargs)
¶
Overview
Encode the input, reparameterize mu and log_var, decode obs_encoding.
Arguments:
- input (:obj:Dict): Dict containing keywords obs (:obj:torch.Tensor) and action (:obj:torch.Tensor), representing the observation and agent's action respectively.
Returns:
- outputs (:obj:Dict): Dict containing keywords recons_action (:obj:torch.Tensor), prediction_residual (:obj:torch.Tensor), input (:obj:torch.Tensor), mu (:obj:torch.Tensor), log_var (:obj:torch.Tensor) and z (:obj:torch.Tensor).
Shapes:
- recons_action (:obj:torch.Tensor): :math:(B, A), where B is batch size and A is action dim.
- prediction_residual (:obj:torch.Tensor): :math:(B, O), where B is batch size and O is observation dim.
- mu (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent size.
- log_var (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent size.
- z (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent_size
loss_function(args, **kwargs)
¶
Overview
Computes the VAE loss function.
Arguments:
- args (:obj:Dict[str, Tensor]): Dict containing keywords recons_action, prediction_residual, original_action, mu, log_var and true_residual.
- kwargs (:obj:Dict): Dict containing keywords kld_weight and predict_weight.
Returns:
- outputs (:obj:Dict[str, Tensor]): Dict containing different loss results, including loss, reconstruction_loss, kld_loss, predict_loss.
Shapes:
- recons_action (:obj:torch.Tensor): :math:(B, A), where B is batch size and A is action dim.
- prediction_residual (:obj:torch.Tensor): :math:(B, O), where B is batch size and O is observation dim.
- original_action (:obj:torch.Tensor): :math:(B, A), where B is batch size and A is action dim.
- mu (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent size.
- log_var (:obj:torch.Tensor): :math:(B, L), where B is batch size and L is latent size.
- true_residual (:obj:torch.Tensor): :math:(B, O), where B is batch size and O is observation dim.
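A minimal sketch of the loss terms named above, assuming the standard VAE objective with a closed-form Gaussian KL term (scalar lists stand in for tensors; function and argument names are illustrative, not the module's exact API):

```python
import math

def vae_loss(recons_err_sq, residual_err_sq, mu, log_var,
             kld_weight=0.5, predict_weight=0.01):
    """Reconstruction MSE + weighted KL(N(mu, sigma^2) || N(0, 1))
    + weighted prediction-residual MSE."""
    reconstruction_loss = sum(recons_err_sq) / len(recons_err_sq)
    predict_loss = sum(residual_err_sq) / len(residual_err_sq)
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1).
    kld_loss = -0.5 * sum(1 + lv - m * m - math.exp(lv)
                          for m, lv in zip(mu, log_var)) / len(mu)
    loss = reconstruction_loss + kld_weight * kld_loss + predict_weight * predict_loss
    return {'loss': loss, 'reconstruction_loss': reconstruction_loss,
            'kld_loss': kld_loss, 'predict_loss': predict_loss}

losses = vae_loss([0.1], [0.2], [0.0], [0.0])
```

With mu = 0 and log_var = 0 the KL term vanishes, so only the two MSE terms contribute.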
DecisionTransformer
¶
Bases: Module
Overview
The implementation of decision transformer.
Interfaces:
__init__, forward, configure_optimizers
__init__(state_dim, act_dim, n_blocks, h_dim, context_len, n_heads, drop_p, max_timestep=4096, state_encoder=None, continuous=False)
¶
Overview
Initialize the DecisionTransformer Model according to input arguments.
Arguments:
- state_dim (:obj:Union[int, SequenceType]): Dimension of state, such as 128 or (4, 84, 84).
- act_dim (:obj:int): The dimension of actions, such as 6.
- n_blocks (:obj:int): The number of transformer blocks in the decision transformer, such as 3.
- h_dim (:obj:int): The dimension of the hidden layers, such as 128.
- context_len (:obj:int): The max context length of the attention, such as 6.
- n_heads (:obj:int): The number of heads in calculating attention, such as 8.
- drop_p (:obj:float): The drop rate of the drop-out layer, such as 0.1.
- max_timestep (:obj:int): The max length of the total sequence, defaults to be 4096.
- state_encoder (:obj:Optional[nn.Module]): The encoder to pre-process the given input. If it is set to None, the raw state will be pushed into the transformer.
- continuous (:obj:bool): Whether the action space is continuous, defaults to be False.
forward(timesteps, states, actions, returns_to_go, tar=None)
¶
Overview
Forward computation graph of the decision transformer, input a sequence tensor and return a tensor with the same shape.
Arguments:
- timesteps (:obj:torch.Tensor): The timestep for input sequence.
- states (:obj:torch.Tensor): The sequence of states.
- actions (:obj:torch.Tensor): The sequence of actions.
- returns_to_go (:obj:torch.Tensor): The sequence of return-to-go.
- tar (:obj:Optional[int]): Whether to predict action, regardless of index.
Returns:
- output (:obj:Tuple[torch.Tensor, torch.Tensor, torch.Tensor]): Output contains three tensors, they are correspondingly the predicted states, predicted actions and predicted return-to-go.
Examples:
>>> B, T = 4, 6
>>> state_dim = 3
>>> act_dim = 2
>>> DT_model = DecisionTransformer( state_dim=state_dim, act_dim=act_dim, n_blocks=3, h_dim=8, context_len=T, n_heads=2, drop_p=0.1, )
>>> timesteps = torch.randint(0, 100, [B, 3 * T - 1, 1], dtype=torch.long) # B x (3T - 1) x 1
>>> states = torch.randn([B, T, state_dim]) # B x T x state_dim
>>> actions = torch.randint(0, act_dim, [B, T, 1])
>>> action_target = torch.randint(0, act_dim, [B, T, 1])
>>> returns_to_go = torch.tensor([1, 0.8, 0.6, 0.4, 0.2, 0.]).repeat([B, 1]).unsqueeze(-1).float()
>>> traj_mask = torch.ones([B, T], dtype=torch.long) # B x T
>>> actions = actions.squeeze(-1)
>>> state_preds, action_preds, return_preds = DT_model.forward( timesteps=timesteps, states=states, actions=actions, returns_to_go=returns_to_go )
>>> assert state_preds.shape == torch.Size([B, T, state_dim])
>>> assert return_preds.shape == torch.Size([B, T, 1])
>>> assert action_preds.shape == torch.Size([B, T, act_dim])
configure_optimizers(weight_decay, learning_rate, betas=(0.9, 0.95))
¶
Overview
This function returns an optimizer given the input arguments. We are separating out all parameters of the model into two buckets: those that will experience weight decay for regularization and those that won't (biases, and layernorm/embedding weights).
Arguments:
- weight_decay (:obj:float): The weight decay of the optimizer.
- learning_rate (:obj:float): The learning rate of the optimizer.
- betas (:obj:Tuple[float, float]): The betas for Adam optimizer.
Outputs:
- optimizer (:obj:torch.optim.Optimizer): The desired optimizer.
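The two-bucket split described above can be sketched as follows (the keyword list and function name are illustrative, not the module's actual implementation):

```python
def split_decay_params(named_params, no_decay_keywords=('bias', 'ln', 'emb')):
    """Split parameter names into weight-decay / no-decay buckets, mirroring
    the common GPT-style optimizer setup: weights of linear layers decay,
    while biases and layernorm/embedding weights do not."""
    decay, no_decay = [], []
    for name in named_params:
        bucket = no_decay if any(k in name for k in no_decay_keywords) else decay
        bucket.append(name)
    return decay, no_decay

decay, no_decay = split_decay_params(
    ['blocks.0.attn.weight', 'blocks.0.attn.bias', 'ln_f.weight', 'tok_emb.weight'])
```

The two buckets would then be passed to the optimizer as separate parameter groups, one with `weight_decay` set and one with it zeroed.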
ProcedureCloningMCTS
¶
Bases: Module
Overview
The neural network of algorithms related to Procedure cloning (PC).
Interfaces:
__init__, forward.
__init__(obs_shape, action_dim, cnn_hidden_list=[128, 128, 256, 256, 256], cnn_activation=nn.ReLU(), cnn_kernel_size=[3, 3, 3, 3, 3], cnn_stride=[1, 1, 1, 1, 1], cnn_padding=[1, 1, 1, 1, 1], mlp_hidden_list=[256, 256], mlp_activation=nn.ReLU(), att_heads=8, att_hidden=128, n_att=4, n_feedforward=2, feedforward_hidden=256, drop_p=0.5, max_T=17)
¶
Overview
Initialize the MCTS procedure cloning model according to corresponding input arguments.
Arguments:
- obs_shape (:obj:SequenceType): Observation space shape, such as [4, 84, 84].
- action_dim (:obj:int): Action space shape, such as 6.
- cnn_hidden_list (:obj:SequenceType): The cnn channel dims for each block, such as [128, 128, 256, 256, 256].
- cnn_activation (:obj:nn.Module): The activation function for cnn blocks, such as nn.ReLU().
- cnn_kernel_size (:obj:SequenceType): The kernel size for each cnn block, such as [3, 3, 3, 3, 3].
- cnn_stride (:obj:SequenceType): The stride for each cnn block, such as [1, 1, 1, 1, 1].
- cnn_padding (:obj:SequenceType): The padding for each cnn block, such as [1, 1, 1, 1, 1].
- mlp_hidden_list (:obj:SequenceType): The last dim for this must match the last dim of cnn_hidden_list, such as [256, 256].
- mlp_activation (:obj:nn.Module): The activation function for mlp layers, such as nn.ReLU().
- att_heads (:obj:int): The number of attention heads in transformer, such as 8.
- att_hidden (:obj:int): The number of attention dimension in transformer, such as 128.
- n_att (:obj:int): The number of attention blocks in transformer, such as 4.
- n_feedforward (:obj:int): The number of feedforward layers in transformer, such as 2.
- drop_p (:obj:float): The drop out rate of attention, such as 0.5.
- max_T (:obj:int): The sequence length of procedure cloning, such as 17.
forward(states, goals, actions)
¶
Overview
ProcedureCloningMCTS forward computation graph, input states tensor and goals tensor, calculate the predicted states and actions.
Arguments:
- states (:obj:torch.Tensor): The observation of current time.
- goals (:obj:torch.Tensor): The target observation after a period.
- actions (:obj:torch.Tensor): The actions executed during the period.
Returns:
- outputs (:obj:Tuple[torch.Tensor, torch.Tensor]): Predicted states and actions.
Examples:
>>> inputs = { 'states': torch.randn(2, 3, 64, 64), 'goals': torch.randn(2, 3, 64, 64), 'actions': torch.randn(2, 15, 9) }
>>> model = ProcedureCloningMCTS(obs_shape=(3, 64, 64), action_dim=9)
>>> goal_preds, action_preds = model(inputs['states'], inputs['goals'], inputs['actions'])
>>> assert goal_preds.shape == (2, 256)
>>> assert action_preds.shape == (2, 16, 9)
ProcedureCloningBFS
¶
Bases: Module
Overview
The neural network introduced in procedure cloning (PC) to process 3-dim observations. Given an input, this model will perform several 3x3 convolutions and output a feature map with the same height and width of input. The channel number of output will be the action_shape.
Interfaces:
__init__, forward.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 256, 256])
¶
Overview
Init the BFSConvolution Encoder according to the provided arguments.
Arguments:
- obs_shape (:obj:SequenceType): Sequence of in_channel, plus one or more input size, such as [4, 84, 84].
- action_shape (:obj:int): Action space shape, such as 6.
- encoder_hidden_size_list (:obj:SequenceType): The cnn channel dims for each block, such as [128, 128, 256, 256].
forward(x)
¶
Overview
The computation graph. Given a 3-dim observation, this function will return a tensor with the same height and width. The channel number of output will be the action_shape.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output dict of model's forward computation graph, only contains a single key logit.
Examples:
>>> model = ProcedureCloningBFS([3, 16, 16], 4)
>>> inputs = torch.randn(16, 16, 3).unsqueeze(0)
>>> outputs = model(inputs)
>>> assert outputs['logit'].shape == torch.Size([1, 16, 16, 4])
BCQ
¶
Bases: Module
Overview
Model of BCQ (Batch-Constrained deep Q-learning). Off-Policy Deep Reinforcement Learning without Exploration. https://arxiv.org/abs/1812.02900
Interface:
forward, compute_actor, compute_critic, compute_vae, compute_eval
Property:
mode
__init__(obs_shape, action_shape, actor_head_hidden_size=[400, 300], critic_head_hidden_size=[400, 300], activation=nn.ReLU(), vae_hidden_dims=[750, 750], phi=0.05)
¶
Overview
Initialize neural network, i.e. agent Q network and actor.
Arguments:
- obs_shape (:obj:int): The dimension of the observation state.
- action_shape (:obj:int): The dimension of the action.
- actor_head_hidden_size (:obj:list): The list of hidden sizes of the actor head.
- critic_head_hidden_size (:obj:list): The list of hidden sizes of the critic head.
- activation (:obj:nn.Module): Activation function in network, defaults to nn.ReLU().
- vae_hidden_dims (:obj:list): The list of hidden sizes of the VAE.
- phi (:obj:float): The perturbation hyper-parameter bounding the actor's action correction, defaults to 0.05.
forward(inputs, mode)
¶
Overview
The unique execution (forward) method of BCQ method, and one can indicate different modes to implement different computation graph, including compute_actor and compute_critic in BCQ.
Mode compute_actor:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including action tensor.
Mode compute_critic:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including q_value tensor.
Mode compute_vae:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords recons_action (:obj:torch.Tensor), prediction_residual (:obj:torch.Tensor), input (:obj:torch.Tensor), mu (:obj:torch.Tensor), log_var (:obj:torch.Tensor) and z (:obj:torch.Tensor).
Mode compute_eval:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including action tensor.
Examples:
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model(inputs, mode='compute_actor')
>>> outputs = model(inputs, mode='compute_critic')
>>> outputs = model(inputs, mode='compute_vae')
>>> outputs = model(inputs, mode='compute_eval')
.. note::
For specific examples, one can refer to API doc of compute_actor and compute_critic respectively.
compute_critic(inputs)
¶
Overview
Use critic network to compute q value.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords q_value (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_critic(inputs)
compute_actor(inputs)
¶
Overview
Use actor network to compute action.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords action (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_actor(inputs)
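In BCQ the actor does not output actions directly: it perturbs VAE-reconstructed actions by a small bounded amount controlled by phi. A hedged scalar sketch of that clipping logic (the real actor computes the perturbation with a neural network and operates on tensors):

```python
def perturb_action(decoded_action, perturbation, phi=0.05):
    """Add a correction clipped to [-phi, phi] to an action reconstructed
    by the VAE, then clamp the result to the action bounds [-1, 1].
    Illustrative helper, not the library's API."""
    clipped = max(-phi, min(phi, perturbation))
    return max(-1.0, min(1.0, decoded_action + clipped))

a = perturb_action(0.5, 0.2)  # perturbation clipped from 0.2 down to phi
```

Keeping the perturbation small is what constrains the policy to stay near actions the VAE has seen in the batch.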
compute_vae(inputs)
¶
Overview
Use vae network to compute action.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords recons_action (:obj:torch.Tensor), prediction_residual (:obj:torch.Tensor), input (:obj:torch.Tensor), mu (:obj:torch.Tensor), log_var (:obj:torch.Tensor) and z (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_vae(inputs)
compute_eval(inputs)
¶
Overview
Use actor network to compute action.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords action (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_eval(inputs)
EDAC
¶
Bases: Module
Overview
The Q-value Actor-Critic network with the ensemble mechanism, which is used in EDAC.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(obs_shape, action_shape, ensemble_num=2, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None, **kwargs)
¶
Overview
Initialize the EDAC Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation's shape, such as 128, (156, ).
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action's shape, such as 4, (3, ), EasyDict({'action_type_shape': 3, 'action_args_shape': 4}).
- ensemble_num (:obj:int): Q-net number.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor head.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic head.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic head.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after each FC layer, if None then default set to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to after network layer (FC, Conv), see ding.torch_utils.network for more details.
forward(inputs, mode)
¶
Overview
The unique execution (forward) method of EDAC method, and one can indicate different modes to implement different computation graph, including compute_actor and compute_critic in EDAC.
Mode compute_actor:
Arguments:
- inputs (:obj:torch.Tensor): Observation data, defaults to tensor.
Returns:
- output (:obj:Dict): Output dict data, including different key-values among distinct action_space.
Mode compute_critic:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including q_value tensor.
.. note::
For specific examples, one can refer to API doc of compute_actor and compute_critic respectively.
compute_actor(obs)
¶
Overview
The forward computation graph of compute_actor mode, uses observation tensor to produce actor output,
such as action, logit and so on.
Arguments:
- obs (:obj:torch.Tensor): Observation tensor data, now supports a batch of 1-dim vector data, i.e. (B, obs_shape).
Returns:
- outputs (:obj:Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]): Actor output varying from action_space: reparameterization.
ReturnsKeys (either):
- logit (:obj:Dict[str, torch.Tensor]): Reparameterization logit, usually in SAC.
- mu (:obj:torch.Tensor): Mean of the parameterized Gaussian distribution.
- sigma (:obj:torch.Tensor): Standard deviation of the parameterized Gaussian distribution.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N0), B is batch size and N0 corresponds to obs_shape.
- action (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to action_shape.
- logit.mu (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to action_shape.
- logit.sigma (:obj:torch.Tensor): :math:(B, N1), B is batch size and N1 corresponds to action_shape.
- logit (:obj:torch.Tensor): :math:(B, N2), B is batch size and N2 corresponds to action_shape.action_type_shape.
- action_args (:obj:torch.Tensor): :math:(B, N3), B is batch size and N3 corresponds to action_shape.action_args_shape.
Examples:
>>> model = EDAC(64, 64,)
>>> obs = torch.randn(4, 64)
>>> actor_outputs = model(obs,'compute_actor')
>>> assert actor_outputs['logit'][0].shape == torch.Size([4, 64]) # mu
>>> actor_outputs['logit'][1].shape == torch.Size([4, 64]) # sigma
compute_critic(inputs)
¶
Overview
The forward computation graph of compute_critic mode, uses observation and action tensor to produce critic
output, such as q_value.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): Dict structure of input data, including obs and action tensor.
Returns:
- outputs (:obj:Dict[str, torch.Tensor]): Critic output, such as q_value.
ArgumentsKeys:
- obs: (:obj:torch.Tensor): Observation tensor data, now supports a batch of 1-dim vector data.
- action (:obj:Union[torch.Tensor, Dict]): Continuous action with same size as action_shape.
ReturnKeys:
- q_value (:obj:torch.Tensor): Q value tensor with same size as batch size.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N1) or :math:(Ensemble_num, B, N1), where B is batch size and N1 is obs_shape.
- action (:obj:torch.Tensor): :math:(B, N2) or :math:(Ensemble_num, B, N2), where B is batch size and N2 is action_shape.
- q_value (:obj:torch.Tensor): :math:(Ensemble_num, B), where B is batch size.
Examples:
>>> inputs = {'obs': torch.randn(4, 8), 'action': torch.randn(4, 1)}
>>> model = EDAC(obs_shape=(8, ),action_shape=1)
>>> model(inputs, mode='compute_critic')['q_value'] # q value
... tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<...>)
HPT
¶
Bases: Module
Overview
The HPT model for reinforcement learning, which consists of a Policy Stem and a Dueling Head. The Policy Stem utilizes cross-attention to process input data, and the Dueling Head computes Q-values for discrete action spaces.
Interfaces
__init__, forward
GitHub: https://github.com/liruiw/HPT/blob/main/hpt/models/policy_stem.py
__init__(state_dim, action_dim)
¶
Overview
Initialize the HPT model, including the Policy Stem and the Dueling Head.
Arguments:
- state_dim (:obj:int): The dimension of the input state.
- action_dim (:obj:int): The dimension of the discrete action space.
.. note:: The Policy Stem is initialized with cross-attention, and the Dueling Head is set to process the resulting tokens.
forward(x)
¶
Overview
Forward pass of the HPT model. Computes latent tokens from the input state and passes them through the Dueling Head.
Arguments:
- x (:obj:torch.Tensor): The input state tensor.
Returns:
- outputs (:obj:Dict): Dict containing keyword logit (:obj:torch.Tensor), the Q-value output of the Dueling Head.
QGPO
¶
Bases: Module
Overview
Model of QGPO algorithm.
Interfaces:
__init__, calculateQ, select_actions, sample, score_model_loss_fn, q_loss_fn, qt_loss_fn
__init__(cfg)
¶
Overview
Initialization of QGPO.
Arguments:
- cfg (:obj:EasyDict): The config dict.
calculateQ(s, a)
¶
Overview
Calculate the Q value.
Arguments:
- s (:obj:torch.Tensor): The input state.
- a (:obj:torch.Tensor): The input action.
select_actions(states, diffusion_steps=15, guidance_scale=1.0)
¶
Overview
Select actions for conditional sampling.
Arguments:
- states (:obj:list): The input states.
- diffusion_steps (:obj:int): The diffusion steps.
- guidance_scale (:obj:float): The scale of guidance.
sample(states, sample_per_state=16, diffusion_steps=15, guidance_scale=1.0)
¶
Overview
Sample actions for conditional sampling.
Arguments:
- states (:obj:list): The input states.
- sample_per_state (:obj:int): The number of samples per state.
- diffusion_steps (:obj:int): The diffusion steps.
- guidance_scale (:obj:float): The scale of guidance.
score_model_loss_fn(x, s, eps=0.001)
¶
Overview
The loss function for training score-based generative models.
Arguments:
- x (:obj:torch.Tensor): A mini-batch of training data.
- s (:obj:torch.Tensor): The input state.
- eps (:obj:float): A tolerance value for numerical stability.
q_loss_fn(a, s, r, s_, d, fake_a_, discount=0.99)
¶
Overview
The loss function for training Q function.
Arguments:
- a (:obj:torch.Tensor): The input action.
- s (:obj:torch.Tensor): The input state.
- r (:obj:torch.Tensor): The input reward.
- s_ (:obj:torch.Tensor): The input next state.
- d (:obj:torch.Tensor): The input done.
- fake_a_ (:obj:torch.Tensor): The input fake action.
- discount (:obj:float): The discount factor.
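Assuming a standard Bellman target (a common formulation; the actual QGPO loss may differ in its details), the role of r, d and discount can be sketched as:

```python
def q_target(r, q_next_max, d, discount=0.99):
    """Bellman target r + gamma * (1 - done) * max_a' Q(s', a').
    Scalar sketch of the regression target for the Q function;
    in practice q_next_max comes from Q evaluated on fake actions
    sampled for the next state."""
    return r + discount * (1.0 - d) * q_next_max

t = q_target(1.0, 2.0, 0.0)  # non-terminal transition: bootstrap continues
```

When d == 1 the bootstrap term is zeroed, so terminal transitions regress Q toward the reward alone.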
qt_loss_fn(s, fake_a)
¶
Overview
The loss function for training Guidance Qt.
Arguments:
- s (:obj:torch.Tensor): The input state.
- fake_a (:obj:torch.Tensor): The input fake action.
EBM
¶
Bases: Module
Overview
Energy based model.
Interface:
__init__, forward
__init__(obs_shape, action_shape, hidden_size=512, hidden_layer_num=4, **kwargs)
¶
Overview
Initialize the EBM.
Arguments:
- obs_shape (:obj:int): Observation shape.
- action_shape (:obj:int): Action shape.
- hidden_size (:obj:int): Hidden size.
- hidden_layer_num (:obj:int): Number of hidden layers.
forward(obs, action)
¶
Overview
Forward computation graph of EBM.
Arguments:
- obs (:obj:torch.Tensor): Observation of shape (B, N, O).
- action (:obj:torch.Tensor): Action of shape (B, N, A).
Returns:
- pred (:obj:torch.Tensor): Energy of shape (B, N).
Examples:
>>> obs = torch.randn(2, 3, 4)
>>> action = torch.randn(2, 3, 5)
>>> ebm = EBM(4, 5)
>>> pred = ebm(obs, action)
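A typical use of the energy output is derivative-free inference: sample N candidate actions per state, score them with the EBM, and keep the lowest-energy one (implicit-BC style). A minimal sketch with hypothetical names, not the library's API:

```python
def select_min_energy(actions, energies):
    """Return the candidate action whose energy is lowest.
    `actions` and `energies` are parallel lists for a single state;
    the real model scores a (B, N, A) batch in one forward pass."""
    idx = min(range(len(energies)), key=energies.__getitem__)
    return actions[idx]

best = select_min_energy(['a', 'b', 'c'], [0.5, -1.2, 0.3])
```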
AutoregressiveEBM
¶
Bases: Module
Overview
Autoregressive energy based model.
Interface:
__init__, forward
__init__(obs_shape, action_shape, hidden_size=512, hidden_layer_num=4)
¶
Overview
Initialize the AutoregressiveEBM.
Arguments:
- obs_shape (:obj:int): Observation shape.
- action_shape (:obj:int): Action shape.
- hidden_size (:obj:int): Hidden size.
- hidden_layer_num (:obj:int): Number of hidden layers.
forward(obs, action)
¶
Overview
Forward computation graph of AutoregressiveEBM.
Arguments:
- obs (:obj:torch.Tensor): Observation of shape (B, N, O).
- action (:obj:torch.Tensor): Action of shape (B, N, A).
Returns:
- pred (:obj:torch.Tensor): Energy of shape (B, N, A).
Examples:
>>> obs = torch.randn(2, 3, 4)
>>> action = torch.randn(2, 3, 5)
>>> arebm = AutoregressiveEBM(4, 5)
>>> pred = arebm(obs, action)
HAVAC
¶
Bases: Module
Overview
The HAVAC model of each agent for HAPPO.
Interfaces:
__init__, forward
__init__(agent_obs_shape, global_obs_shape, action_shape, agent_num, use_lstm=False, lstm_type='gru', encoder_hidden_size_list=[128, 128, 64], actor_head_hidden_size=64, actor_head_layer_num=2, critic_head_hidden_size=64, critic_head_layer_num=1, action_space='discrete', activation=nn.ReLU(), norm_type=None, sigma_type='independent', bound_type=None, res_link=False)
¶
Overview
Init the VAC Model for HAPPO according to arguments.
Arguments:
- agent_obs_shape (:obj:Union[int, SequenceType]): Observation's space for single agent.
- global_obs_shape (:obj:Union[int, SequenceType]): Observation's space for the global agent.
- action_shape (:obj:Union[int, SequenceType]): Action's space.
- agent_num (:obj:int): Number of agents.
- use_lstm (:obj:bool): Whether to use an RNN module in the model, defaults to False.
- lstm_type (:obj:str): The RNN module type, lstm or gru, defaults to gru.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to actor-nn's Head.
- actor_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for actor's nn.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to critic-nn's Head.
- critic_head_layer_num (:obj:int): The num of layers used in the network to compute Q value output for critic's nn.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in MLP after the layer_fn, if None then default set to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
- res_link (:obj:bool): Whether to use the residual link, defaults to False.
IModelWrapper
¶
Bases: ABC
Overview
The basic interface class of model wrappers. Model wrapper is a wrapper class of torch.nn.Module model, which is used to add some extra operations for the wrapped model, such as hidden state maintain for RNN-base model, argmax action selection for discrete action space, etc.
Interfaces:
__init__, __getattr__, info, reset, forward.
__init__(model)
¶
Overview
Initialize model and other necessary member variables in the model wrapper.
__getattr__(key)
¶
Overview
Get original attributes of torch.nn.Module model, such as variables and methods defined in model.
Arguments:
- key (:obj:str): The string key to query.
Returns:
- ret (:obj:Any): The queried attribute.
info(attr_name)
¶
Overview
Get some string information of the indicated attr_name, which is used for debug wrappers.
This method will recursively search for the indicated attr_name.
Arguments:
- attr_name (:obj:str): The string key to query information.
Returns:
- info_string (:obj:str): The information string of the indicated attr_name.
reset(data_id=None, **kwargs)
¶
Overview
Basic interface, reset some stateful variables in the model wrapper, such as the hidden state of RNN.
Here we do nothing and just implement this interface method.
Other derived model wrappers can override this method to add some extra operations.
Arguments:
- data_id (:obj:List[int]): The data id list to reset. If None, reset all data. In practice, model wrappers often need to maintain some stateful variables for each data trajectory, so we leave this data_id argument to reset the stateful variables of the indicated data.
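The data_id contract can be sketched as follows (an illustrative helper, not the actual wrapper code, with per-trajectory hidden states stored in a list):

```python
def reset_hidden_state(state, data_id=None, init=None):
    """Reset every entry when data_id is None, otherwise only the listed
    trajectory indices; other trajectories keep their state untouched."""
    ids = range(len(state)) if data_id is None else data_id
    for i in ids:
        state[i] = init
    return state

state = reset_hidden_state(['h0', 'h1', 'h2'], data_id=[1])
```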
forward(*args, **kwargs)
¶
Overview
Basic interface, call the wrapped model's forward method. Other derived model wrappers can override this method to add some extra operations.
independent_normal_dist(logits)
¶
Overview
Convert different types logit to independent normal distribution.
Arguments:
- logits (:obj:Union[List, Dict]): The logits to be converted.
Returns:
- dist (:obj:torch.distributions.Distribution): The converted normal distribution.
Examples:
>>> logits = [torch.randn(4, 5), torch.ones(4, 5)]
>>> dist = independent_normal_dist(logits)
>>> assert isinstance(dist, torch.distributions.Independent)
>>> assert isinstance(dist.base_dist, torch.distributions.Normal)
>>> assert dist.base_dist.loc.shape == torch.Size([4, 5])
>>> assert dist.base_dist.scale.shape == torch.Size([4, 5])
Raises:
- TypeError: If the type of logits is not list or dict.
create_model(cfg)
¶
Overview
Create a neural network model according to the given EasyDict-type cfg.
Arguments:
- cfg (:obj:EasyDict): User's model config. The key import_names is used to import modules, and the key type is used to indicate the model.
Returns:
- (:obj:torch.nn.Module): The created neural network model.
Examples:
>>> cfg = EasyDict({
>>> 'import_names': ['ding.model.template.q_learning'],
>>> 'type': 'dqn',
>>> 'obs_shape': 4,
>>> 'action_shape': 2,
>>> })
>>> model = create_model(cfg)
.. tip::
This method will not modify the given cfg; it deepcopies the cfg and then modifies the copy.
model_wrap(model, wrapper_name=None, **kwargs)
¶
Overview
Wrap the model with the specified wrapper and return the wrapped model.
Arguments:
- model (:obj:Any): The model to be wrapped.
- wrapper_name (:obj:str): The name of the wrapper to be used.
.. note:: The arguments of the wrapper should be passed in as kwargs.
register_wrapper(name, wrapper_type)
¶
Overview
Register a new wrapper to wrapper_name_map. When a user implements a new wrapper, they must call this function to complete the registration. Then the wrapper can be called by model_wrap.
Arguments:
- name (:obj:str): The name of the new wrapper to be registered.
- wrapper_type (:obj:type): The wrapper class needs to be added in wrapper_name_map. This argument should be the subclass of IModelWrapper.
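The register-then-lookup pattern behind register_wrapper and model_wrap can be sketched as follows (simplified; the real wrapper_name_map, base class and built-in wrappers live in ding.model.wrapper):

```python
# Global registry mapping wrapper names to wrapper classes.
wrapper_name_map = {}

class IModelWrapper:
    """Minimal stand-in for the abstract wrapper base class."""
    def __init__(self, model):
        self._model = model

def register_wrapper(name, wrapper_type):
    """Add a wrapper class to the registry; it must subclass IModelWrapper."""
    assert issubclass(wrapper_type, IModelWrapper)
    wrapper_name_map[name] = wrapper_type

def model_wrap(model, wrapper_name=None, **kwargs):
    """Look up the registered wrapper by name and wrap the model with it."""
    return wrapper_name_map[wrapper_name](model, **kwargs)

class MyWrapper(IModelWrapper):
    pass

register_wrapper('my_wrapper', MyWrapper)
wrapped = model_wrap(object(), wrapper_name='my_wrapper')
```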
Full Source Code
../ding/model/__init__.py