ding.model.template.q_learning
DQN
Bases: Module
Overview
The neural network structure and computation graph of the Deep Q-Network (DQN) algorithm, the classic value-based RL algorithm for discrete action spaces. DQN is composed of two parts: an encoder and a head. The encoder extracts features from various observations, and the head computes the Q-value of each action dimension.
Interfaces:
__init__, forward.
.. note::
Current DQN supports two types of encoder: FCEncoder and ConvEncoder, two types of head: DiscreteHead and DuelingHead. You can customize your own encoder or head by inheriting this class.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, dropout=None, init_bias=None, noise=False)
Overview
Initialize the DQN (encoder + head) model according to the corresponding input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- dueling (:obj:Optional[bool]): Whether to use DuelingHead (default) or DiscreteHead.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network, defaults to None, then it will be set to the last element of encoder_hidden_size_list.
- head_layer_num (:obj:int): The number of layers used in the head network to compute Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
- dropout (:obj:Optional[float]): The dropout rate of the dropout layer; if None, the dropout layer is disabled.
- init_bias (:obj:Optional[float]): The initial value of the last layer bias in the head network.
- noise (:obj:bool): Whether to use NoiseLinearLayer as layer_fn to boost exploration in the Q network's MLP. Defaults to False.
forward(x)
Overview
DQN forward computation graph: input an observation tensor, predict its Q-value.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of DQN's forward, including q_value.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- logit (:obj:torch.Tensor): :math:(B, M), where B is batch size and M is action_shape
Examples:
>>> model = DQN(32, 6) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
.. note::
For consistency and compatibility, we name all the outputs of the network which are related to action selections as logit.
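The encoder + head decomposition described above can be sketched in plain PyTorch. This is an illustrative toy, not the actual ding implementation; the class name MiniDQN and the layer sizes are made up:

```python
import torch
import torch.nn as nn

class MiniDQN(nn.Module):
    """Illustrative encoder + head decomposition, mirroring the DQN overview."""

    def __init__(self, obs_shape: int, action_shape: int, hidden: int = 64):
        super().__init__()
        # Encoder: extracts features from the raw observation.
        self.encoder = nn.Sequential(nn.Linear(obs_shape, hidden), nn.ReLU())
        # Head: maps features to one Q-value per discrete action.
        self.head = nn.Linear(hidden, action_shape)

    def forward(self, x: torch.Tensor) -> dict:
        # The action-selection output is named 'logit', as noted above.
        return {'logit': self.head(self.encoder(x))}

model = MiniDQN(32, 6)
out = model(torch.randn(4, 32))
assert out['logit'].shape == torch.Size([4, 6])
```
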
BDQ
Bases: Module
__init__(obs_shape, num_branches=0, action_bins_per_branch=2, layer_num=3, a_layer_num=None, v_layer_num=None, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, norm_type=None, activation=nn.ReLU())
Overview
Initialize the BDQ (encoder + head) model according to the input arguments. Reference paper: Action Branching Architectures for Deep Reinforcement Learning, https://arxiv.org/pdf/1711.08946
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- num_branches (:obj:int): The number of branches, which is equivalent to the action dimension, such as 6 in mujoco's halfcheetah environment.
- action_bins_per_branch (:obj:int): The number of actions in each dimension.
- layer_num (:obj:int): The number of layers used in the network to compute Advantage and Value output.
- a_layer_num (:obj:int): The number of layers used in the network to compute Advantage output.
- v_layer_num (:obj:int): The number of layers used in the network to compute Value output.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network.
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, defaults to nn.ReLU().
forward(x)
Overview
BDQ forward computation graph: input an observation tensor, predict its Q-value.
Arguments:
- x (:obj:torch.Tensor): Observation inputs
Returns:
- outputs (:obj:Dict): BDQ forward outputs, such as q_value.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each action dimension.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape
- logit (:obj:torch.FloatTensor): :math:(B, D, A), where B is batch size, D is num_branches, and A is action_bins_per_branch.
Examples:
>>> model = BDQ(8, 5, 2) # arguments: 'obs_shape', 'num_branches' and 'action_bins_per_branch'.
>>> inputs = torch.randn(4, 8)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 5, 2])
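The branched output above lends itself to independent per-branch action selection: each action dimension picks its own bin. A sketch under assumed shapes (the random tensor stands in for a real BDQ output):

```python
import torch

# Hypothetical BDQ-style output: batch of 4, 5 branches, 2 bins per branch.
logit = torch.randn(4, 5, 2)

# Each branch selects the bin with the highest Q-value independently,
# giving one sub-action per action dimension.
actions = logit.argmax(dim=-1)  # shape (4, 5): one bin index per branch
assert actions.shape == torch.Size([4, 5])
```
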
C51DQN
Bases: Module
Overview
The neural network structure and computation graph of C51DQN, which combines distributional RL and DQN. You can refer to https://arxiv.org/pdf/1707.06887.pdf for more details. The C51DQN is composed of an encoder and a head. The encoder extracts features from the observation, and the head computes the distribution of the Q-value.
Interfaces:
__init__, forward
.. note::
Current C51DQN supports two types of encoder: FCEncoder and ConvEncoder.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)
Overview
Initialize the C51DQN model according to the corresponding input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.
- head_hidden_size (:obj:Optional[int]): The hidden_size of head network, defaults to None, then it will be set to the last element of encoder_hidden_size_list.
- head_layer_num (:obj:int): The number of layers used in the head network to compute Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details. You can choose one of ['BN', 'IN', 'SyncBN', 'LN'].
- v_min (:obj:Optional[float]): The minimum value of the support of the distribution, which is related to the value (discounted sum of reward) scale of the specific environment. Defaults to -10.
- v_max (:obj:Optional[float]): The maximum value of the support of the distribution, which is related to the value (discounted sum of reward) scale of the specific environment. Defaults to 10.
- n_atom (:obj:Optional[int]): The number of atoms in the prediction distribution, 51 is the default value in the paper, you can also try other values such as 301.
forward(x)
Overview
C51DQN forward computation graph: input an observation tensor, predict its Q-value and the corresponding distribution.
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of C51DQN's forward, including q_value and distribution.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
- distribution (:obj:torch.Tensor): Q-value discretized distribution, i.e., the probability of each uniformly spaced atom Q-value, e.g., dividing [-10, 10] into 51 uniformly spaced atoms.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape.
- logit (:obj:torch.Tensor): :math:(B, M), where M is action_shape.
- distribution (:obj:torch.Tensor): :math:(B, M, P), where P is n_atom.
Examples:
>>> model = C51DQN(128, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 128)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> # action_shape is 64 in this example
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
.. note::
For consistency and compatibility, we name all the outputs of the network which are related to action selections as logit.
.. note:: For convenience, we recommend an odd number of atoms, so that one atom lies exactly at the midpoint of the value support (e.g., 0 for [-10, 10]).
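The relation between distribution and the scalar Q-value can be sketched as the probability-weighted sum over the atom support. This is a minimal sketch with assumed shapes and a random softmax distribution, not the ding code:

```python
import torch

B, M, n_atom, v_min, v_max = 4, 6, 51, -10.0, 10.0
# Uniformly spaced atom support over [v_min, v_max]; with 51 atoms one
# atom sits exactly at 0, which is why an odd count is convenient.
support = torch.linspace(v_min, v_max, n_atom)
# A valid probability distribution per (sample, action): softmax over atoms.
distribution = torch.softmax(torch.randn(B, M, n_atom), dim=-1)
# Expected Q-value: probability-weighted sum over the support.
q_value = (distribution * support).sum(-1)
assert q_value.shape == torch.Size([B, M])
```
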
QRDQN
Bases: Module
Overview
The neural network structure and computation graph of QRDQN, which combines distributional RL and DQN. You can refer to Distributional Reinforcement Learning with Quantile Regression https://arxiv.org/pdf/1710.10044.pdf for more details.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, num_quantiles=32, activation=nn.ReLU(), norm_type=None)
Overview
Initialize the QRDQN Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- num_quantiles (:obj:int): Number of quantiles in the prediction distribution.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
Overview
Use the observation tensor to predict QRDQN's output (quantile Q-values).
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of QRDQN's forward, including logit, q, and tau.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
- q (:obj:torch.Tensor): Quantile Q-value tensor of size (B, M, num_quantiles), where M is action_shape.
- tau (:obj:torch.Tensor): Quantile fraction tensor of size (B, num_quantiles, 1).
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape.
- logit (:obj:torch.FloatTensor): :math:(B, M), where M is action_shape.
- q (:obj:torch.Tensor): :math:(B, M, num_quantiles).
- tau (:obj:torch.Tensor): :math:(B, num_quantiles, 1).
Examples:
>>> model = QRDQN(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles : int = 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
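The scalar logit reported above is the mean over the quantile dimension of q; a sketch of that reduction under assumed shapes (the random tensor stands in for a real QRDQN output):

```python
import torch

B, M, num_quantiles = 4, 6, 32
# Hypothetical quantile predictions: one value estimate per quantile,
# per action, per sample.
q = torch.randn(B, M, num_quantiles)
# The scalar Q-value used for action selection is the quantile mean.
logit = q.mean(dim=-1)
assert logit.shape == torch.Size([B, M])
```
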
IQN
Bases: Module
Overview
The neural network structure and computation graph of IQN, which combines distributional RL and DQN. You can refer to paper Implicit Quantile Networks for Distributional Reinforcement Learning https://arxiv.org/pdf/1806.06923.pdf for more details.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None)
Overview
Initialize the IQN Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- num_quantiles (:obj:int): Number of quantiles in the prediction distribution.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]):
The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
Overview
Use the observation tensor to predict IQN's output (quantile Q-values).
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of IQN's forward, including logit, q, and quantiles.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
- q (:obj:torch.Tensor): Quantile Q-value tensor of size (num_quantiles, B, M), where M is action_shape.
- quantiles (:obj:torch.Tensor): Quantile fraction tensor of size (quantile_embedding_size, 1).
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape.
- logit (:obj:torch.FloatTensor): :math:(B, M), where M is action_shape.
- q (:obj:torch.Tensor): :math:(num_quantiles, B, M).
- quantiles (:obj:torch.Tensor): :math:(P, 1), where P is quantile_embedding_size.
Examples:
>>> model = IQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> # default quantile_embedding_size: int = 128
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])
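IQN builds its quantile embedding from cosine features of sampled fractions tau, following the paper. A sketch under assumed sizes, not the exact ding implementation (the projection phi here is a made-up stand-in):

```python
import math
import torch
import torch.nn as nn

num_quantiles, quantile_embedding_size, hidden = 32, 128, 64
# Sample quantile fractions tau ~ U(0, 1), one set per forward pass.
tau = torch.rand(num_quantiles, 1)
# Cosine features: cos(pi * i * tau) for i = 0..quantile_embedding_size-1.
i = torch.arange(quantile_embedding_size, dtype=torch.float32)
cos_features = torch.cos(math.pi * i * tau)  # (num_quantiles, embedding_size)
# Project into the state-embedding space; in the full model this embedding
# is combined multiplicatively with the state features.
phi = nn.Sequential(nn.Linear(quantile_embedding_size, hidden), nn.ReLU())
tau_embedding = phi(cos_features)
assert tau_embedding.shape == torch.Size([num_quantiles, hidden])
```
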
FQF
Bases: Module
Overview
The neural network structure and computation graph of FQF, which combines distributional RL and DQN. You can refer to paper Fully Parameterized Quantile Function for Distributional Reinforcement Learning https://arxiv.org/pdf/1911.02140.pdf for more details.
Interfaces:
__init__, forward
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, num_quantiles=32, quantile_embedding_size=128, activation=nn.ReLU(), norm_type=None)
Overview
Initialize the FQF Model according to input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- num_quantiles (:obj:int): Number of quantiles in the prediction distribution.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]):
The type of normalization to use, see ding.torch_utils.fc_block for more details.
forward(x)
Overview
Use the observation tensor to predict FQF's output (quantile Q-values and learned quantile fractions).
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): Dict containing keywords logit (:obj:torch.Tensor), q (:obj:torch.Tensor), quantiles (:obj:torch.Tensor), quantiles_hats (:obj:torch.Tensor), q_tau_i (:obj:torch.Tensor), entropies (:obj:torch.Tensor).
Shapes:
- x: :math:(B, N), where B is batch size and N is obs_shape.
- logit: :math:(B, M), where M is action_shape.
- q: :math:(B, num_quantiles, M).
- quantiles: :math:(B, num_quantiles + 1).
- quantiles_hats: :math:(B, num_quantiles).
- q_tau_i: :math:(B, num_quantiles - 1, M).
- entropies: :math:(B, 1).
Examples:
>>> model = FQF(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([4, 32, 64])
>>> assert outputs['quantiles'].shape == torch.Size([4, 33])
>>> assert outputs['quantiles_hats'].shape == torch.Size([4, 32])
>>> assert outputs['q_tau_i'].shape == torch.Size([4, 31, 64])
>>> assert outputs['entropies'].shape == torch.Size([4, 1])
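The (B, num_quantiles + 1) shape of quantiles above comes from FQF's learned fraction proposal: a softmax followed by a cumulative sum yields monotone fractions in [0, 1], and the midpoints between adjacent boundaries give quantiles_hats. A sketch under assumed sizes (the proposal layer here is a hypothetical stand-in, not the ding module):

```python
import torch
import torch.nn as nn

B, hidden, num_quantiles = 4, 64, 32
# Hypothetical fraction-proposal layer on top of the state embedding.
proposal = nn.Linear(hidden, num_quantiles)
state_embedding = torch.randn(B, hidden)
probs = torch.softmax(proposal(state_embedding), dim=-1)
# Cumulative sum of non-negative probabilities yields monotone fractions;
# prepending tau_0 = 0 gives the num_quantiles + 1 boundaries ('quantiles').
quantiles = torch.cat([torch.zeros(B, 1), probs.cumsum(dim=-1)], dim=-1)
# Midpoints between adjacent boundaries ('quantiles_hats').
quantiles_hats = (quantiles[:, 1:] + quantiles[:, :-1]) / 2
assert quantiles.shape == torch.Size([B, num_quantiles + 1])
assert quantiles_hats.shape == torch.Size([B, num_quantiles])
```
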
RainbowDQN
Bases: Module
Overview
The neural network structure and computation graph of RainbowDQN, which combines distributional RL and DQN. You can refer to paper Rainbow: Combining Improvements in Deep Reinforcement Learning https://arxiv.org/pdf/1710.02298.pdf for more details.
Interfaces:
__init__, forward
.. note:: RainbowDQN contains dueling architecture by default.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], head_hidden_size=None, head_layer_num=1, activation=nn.ReLU(), norm_type=None, v_min=-10, v_max=10, n_atom=51)
Overview
Initialize the RainbowDQN model according to the input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Action space shape.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder
- head_hidden_size (:obj:Optional[int]): The hidden_size to pass to Head.
- head_layer_num (:obj:int): The num of layers used in the network to compute Q value output
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
- v_min (:obj:Optional[float]): The minimum value of the support of the distribution. Defaults to -10.
- v_max (:obj:Optional[float]): The maximum value of the support of the distribution. Defaults to 10.
- n_atom (:obj:Optional[int]): Number of atoms in the prediction distribution.
forward(x)
Overview
Use the observation tensor to predict RainbowDQN's output (Q-value and its distribution).
Arguments:
- x (:obj:torch.Tensor): The input observation tensor data.
Returns:
- outputs (:obj:Dict): The output of RainbowDQN's forward, including logit and distribution.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output of each possible action dimension.
- distribution (:obj:torch.Tensor): Distribution tensor of size (B, M, n_atom), where M is action_shape.
Shapes:
- x (:obj:torch.Tensor): :math:(B, N), where B is batch size and N is obs_shape.
- logit (:obj:torch.FloatTensor): :math:(B, M), where M is action_shape.
- distribution (:obj:torch.FloatTensor): :math:(B, M, P), where P is n_atom.
Examples:
>>> model = RainbowDQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
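Since RainbowDQN contains the dueling architecture by default, its distributional head combines a value stream and an advantage stream before the per-atom softmax. A sketch of that aggregation under assumed shapes, not the actual RainbowHead code:

```python
import torch

B, M, n_atom = 4, 6, 51
# Dueling aggregation over distributional logits: a value stream V (shared
# across actions) plus an advantage stream A, with the advantage mean
# subtracted for identifiability.
value = torch.randn(B, 1, n_atom)
advantage = torch.randn(B, M, n_atom)
q_logits = value + advantage - advantage.mean(dim=1, keepdim=True)
# Per-atom softmax turns the logits into the categorical distribution.
distribution = torch.softmax(q_logits, dim=-1)
assert distribution.shape == torch.Size([B, M, n_atom])
```
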
DRQN
Bases: Module
Overview
The DRQN (Deep Recurrent Q-Network) is a neural network model combining DQN with RNN to handle sequential
data and partially observable environments. It consists of three main components: encoder, rnn,
and head.
- Encoder: Extracts features from various observation inputs.
- RNN: Processes sequential observations and other data.
- Head: Computes Q-values for each action dimension.
Interfaces:
__init__, forward.
.. note::
The current implementation supports:
- Two encoder types: FCEncoder and ConvEncoder.
- Two head types: DiscreteHead and DuelingHead.
- Three RNN types: normal (LSTM with LayerNorm), pytorch (PyTorch's native LSTM), and gru.
You can extend the model by customizing your own encoder, RNN, or head by inheriting this class.
__init__(obs_shape, action_shape, encoder_hidden_size_list=[128, 128, 64], dueling=True, head_hidden_size=None, head_layer_num=1, lstm_type='normal', activation=nn.ReLU(), norm_type=None, res_link=False)
Overview
Initialize the DRQN model with specified parameters.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Shape of the observation space, e.g., 8 or [4, 84, 84].
- action_shape (:obj:Union[int, SequenceType]): Shape of the action space, e.g., 6 or [2, 3, 3].
- encoder_hidden_size_list (:obj:SequenceType): List of hidden sizes for the encoder. The last element must match head_hidden_size.
- dueling (:obj:Optional[bool]): Use DuelingHead if True, otherwise use DiscreteHead.
- head_hidden_size (:obj:Optional[int]): Hidden size for the head network. Defaults to the last element of encoder_hidden_size_list if None.
- head_layer_num (:obj:int): Number of layers in the head network to compute Q-value outputs.
- lstm_type (:obj:Optional[str]): Type of RNN module. Supported types are normal, pytorch, and gru.
- activation (:obj:Optional[nn.Module]): Activation function used in the network. Defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): Normalization type for the networks. Supported types are: ['BN', 'IN', 'SyncBN', 'LN']. See ding.torch_utils.fc_block for more details.
- res_link (:obj:bool): Enables residual connections between single-frame data and sequential data. Defaults to False.
forward(inputs, inference=False, saved_state_timesteps=None)
Overview
Defines the forward pass of the DRQN model. Takes observation and previous RNN states as inputs and predicts Q-values.
Arguments:
- inputs (:obj:Dict): Input data dictionary containing observation and previous RNN state.
- inference (:obj:bool): If True, unrolls one timestep (used during evaluation). If False, unrolls the entire sequence (used during training).
- saved_state_timesteps (:obj:Optional[list]): When inference is False, specifies the timesteps whose hidden states are saved and returned.
ArgumentsKeys:
- obs (:obj:torch.Tensor): Raw observation tensor.
- prev_state (:obj:list): Previous RNN state tensor, structure depends on lstm_type.
Returns:
- outputs (:obj:Dict): The output of DRQN's forward, including logit (q_value) and next state.
ReturnsKeys:
- logit (:obj:torch.Tensor): Discrete Q-value output for each action dimension.
- next_state (:obj:list): Next RNN state tensor.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N) where B is batch size and N is obs_shape.
- logit (:obj:torch.Tensor): :math:(B, M) where B is batch size and M is action_shape.
Examples:
>>> # Initialize input keys
>>> prev_state = [[torch.randn(1, 1, 64) for __ in range(2)] for _ in range(4)] # B=4
>>> obs = torch.randn(4, 64)
>>> model = DRQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> outputs = model({'obs': obs, 'prev_state': prev_state}, inference=True)
>>> # Validate output keys and shapes
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == (4, 64)
>>> assert len(outputs['next_state']) == 4
>>> assert all([len(t) == 2 for t in outputs['next_state']])
>>> assert all([t[0].shape == (1, 1, 64) for t in outputs['next_state']])
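The inference flag switches between unrolling one timestep and unrolling the whole sequence. The two modes can be sketched with a plain PyTorch LSTM, ignoring the encoder and head (illustrative only, not the ding implementation):

```python
import torch
import torch.nn as nn

T, B, N, hidden = 5, 4, 64, 64
rnn = nn.LSTM(N, hidden)

# Training mode (inference=False): unroll the entire (T, B, N) sequence
# in one call; the initial hidden state defaults to zeros.
seq_obs = torch.randn(T, B, N)
seq_out, next_state = rnn(seq_obs)
assert seq_out.shape == torch.Size([T, B, hidden])

# Inference mode (inference=True): unroll a single timestep, carrying the
# recurrent state forward between calls.
step_obs = torch.randn(1, B, N)
step_out, next_state = rnn(step_obs, next_state)
assert step_out.shape == torch.Size([1, B, hidden])
```
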
GTrXLDQN
Bases: Module
Overview
The neural network structure and computation graph of Gated Transformer-XL DQN algorithm, which is the enhanced version of DRQN, using Transformer-XL to improve long-term sequential modelling ability. The GTrXL-DQN is composed of three parts: encoder, head and core. The encoder is used to extract the feature from various observation, the core is used to process the sequential observation and other data, and the head is used to compute the Q value of each action dimension.
Interfaces:
__init__, forward, reset_memory, get_memory.
__init__(obs_shape, action_shape, head_layer_num=1, att_head_dim=16, hidden_size=16, att_head_num=2, att_mlp_num=2, att_layer_num=3, memory_len=64, activation=nn.ReLU(), head_norm_type=None, dropout=0.0, gru_gating=True, gru_bias=2.0, dueling=True, encoder_hidden_size_list=[128, 128, 256], encoder_norm_type=None)
Overview
Initialize the GTrXLDQN model according to the corresponding input arguments.
.. tip::
You can refer to GTrXl class in ding.torch_utils.network.gtrxl for more details about the input arguments.
Arguments:
- obs_shape (:obj:Union[int, SequenceType]): Used by Encoder. Observation space shape.
- action_shape (:obj:Union[int, SequenceType]): Used by Head. Action space shape.
- head_layer_num (:obj:int): Used by Head. The number of layers used in the head network to compute Q value output.
- att_head_dim (:obj:int): Used by Core. The dimension of each attention head in GTrXL.
- hidden_size (:obj:int): Used by Core. The hidden size of GTrXL.
- att_head_num (:obj:int): Used by Core. The number of attention heads in GTrXL.
- att_mlp_num (:obj:int): Used by Core. The number of MLP layers in the GTrXL attention block.
- att_layer_num (:obj:int): Used by Core. The number of GTrXL transformer layers.
- memory_len (:obj:int): Used by Core. The length of the GTrXL memory.
- activation (:obj:Optional[nn.Module]): The type of activation function in networks; if None, defaults to nn.ReLU().
- head_norm_type (:obj:Optional[str]): Used by Head. The type of normalization, see ding.torch_utils.fc_block for more details.
- dropout (:obj:float): The dropout rate of GTrXL.
- gru_gating (:obj:bool): Whether to use the gating connections (GRU gates) of GTrXL.
- gru_bias (:obj:float): The initial bias of the GRU gates.
- dueling (:obj:bool): Whether to use DuelingHead or DiscreteHead.
- encoder_hidden_size_list (:obj:SequenceType): Collection of hidden_size to pass to Encoder.
- encoder_norm_type (:obj:Optional[str]): Used by Encoder. The type of normalization, see ding.torch_utils.fc_block for more details.
forward(x)
Overview
Pass the input tensor through the GTrXL core and the head sequentially.
Arguments:
- x (:obj:torch.Tensor): input tensor of shape (seq_len, bs, obs_shape).
Returns:
- out (:obj:Dict): run GTrXL with DiscreteHead setups and return the result prediction dictionary.
ReturnKeys:
- logit (:obj:torch.Tensor): discrete Q-value output of each action dimension, shape is (B, action_space).
- memory (:obj:torch.Tensor): memory tensor of size (bs x layer_num+1 x memory_len x embedding_dim).
- transformer_out (:obj:torch.Tensor): output tensor of transformer with same size as input x.
Examples:
>>> # Init input's Keys:
>>> obs_dim, seq_len, bs, action_dim = 128, 64, 32, 4
>>> obs = torch.rand(seq_len, bs, obs_dim)
>>> model = GTrXLDQN(obs_dim, action_dim)
>>> outputs = model(obs)
>>> assert isinstance(outputs, dict)
reset_memory(batch_size=None, state=None)
Overview
Clear or reset the memory of GTrXL.
Arguments:
- batch_size (:obj:Optional[int]): The number of samples in a training batch.
- state (:obj:Optional[torch.Tensor]): The input memory data, whose shape is (layer_num, memory_len, bs, embedding_dim).
get_memory()
Overview
Return the memory of GTrXL.
Returns:
- memory (:obj:Optional[torch.Tensor]): Output memory, or None if the memory has not been initialized; its shape is (layer_num, memory_len, bs, embedding_dim).
parallel_wrapper(forward_fn)
Overview
Process timestep T and batch_size B at the same time, in other words, treat different timestep data as different trajectories in a batch.
Arguments:
- forward_fn (:obj:Callable): Normal nn.Module 's forward function.
Returns:
- wrapper (:obj:Callable): Wrapped function.
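The wrapper can be sketched as a reshape that folds time into the batch dimension, applies the per-step module once, and unfolds the result. This is an illustrative sketch, not the exact ding code; parallel_wrapper_sketch is a made-up name:

```python
import torch
import torch.nn as nn

def parallel_wrapper_sketch(forward_fn):
    # Fold (T, B, ...) into (T*B, ...), apply the module, then unfold back,
    # so every timestep is processed as an independent batch element.
    def wrapper(x: torch.Tensor) -> torch.Tensor:
        T, B = x.shape[:2]
        out = forward_fn(x.reshape(T * B, *x.shape[2:]))
        return out.reshape(T, B, *out.shape[1:])
    return wrapper

T, B, N, M = 5, 4, 64, 6
head = nn.Linear(N, M)  # a per-step module applied to all timesteps at once
out = parallel_wrapper_sketch(head)(torch.randn(T, B, N))
assert out.shape == torch.Size([T, B, M])
```
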
Full Source Code
../ding/model/template/q_learning.py