ding.model.template.maqac
DiscreteMAQAC
Bases: Module
Overview
The neural network and computation graph of the discrete-action Multi-Agent Q-value Actor-Critic (MAQAC) model. The model is composed of an actor and a critic, both MLP networks. The actor network predicts the action probability distribution, and the critic network predicts the Q value of each state-action pair.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(agent_obs_shape, global_obs_shape, action_shape, twin_critic=False, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
Overview
Initialize the DiscreteMAQAC Model according to arguments.
Arguments:
- agent_obs_shape (:obj:Union[int, SequenceType]): Agent's observation space.
- global_obs_shape (:obj:Union[int, SequenceType]): Global observation space.
- action_shape (:obj:Union[int, SequenceType]): Action space.
- twin_critic (:obj:bool): Whether to include a twin critic.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to the actor head.
- actor_head_layer_num (:obj:int): The number of layers used in the actor head to compute the action logit output.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to the critic head.
- critic_head_layer_num (:obj:int): The number of layers used in the critic head to compute the Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after each layer; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.
forward(inputs, mode)
Overview
Use observation tensors to predict output in compute_actor or compute_critic mode.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
- mode (:obj:str): The forward mode; all modes are defined at the beginning of this class.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of DiscreteMAQAC forward computation graph, whose key-values vary in different forward modes.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> }
>>> }
>>> model = DiscreteMAQAC(agent_obs_shape, global_obs_shape, action_shape, twin_critic=True)
>>> logit = model(data, mode='compute_actor')['logit']
>>> value = model(data, mode='compute_critic')['q_value']
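The action_mask output marks which discrete actions are valid for each agent. A downstream policy typically pushes the logits of invalid actions to -inf before the softmax, so they receive zero probability. A minimal pure-Python sketch of that pattern (the helper name masked_softmax is hypothetical, not part of this module):

```python
import math

def masked_softmax(logits, mask):
    # Push logits of invalid actions (mask == 0) to -inf.
    masked = [l if m == 1 else float('-inf') for l, m in zip(logits, mask)]
    # Numerically stable softmax over the remaining actions.
    mx = max(masked)
    exps = [math.exp(l - mx) if l != float('-inf') else 0.0 for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

# The second action is masked out and gets probability 0.
probs = masked_softmax([1.0, 2.0, 0.5], [1, 0, 1])
```

In practice the same operation is done on batched tensors, e.g. with torch.where over shape :math:(B, A, N2).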
compute_actor(inputs)
Overview
Use observation tensor to predict action logits.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of DiscreteMAQAC's compute_actor computation graph, with keys:
- logit (:obj:torch.Tensor): Action's output logit (real value range), whose shape is :math:(B, A, N2), where N2 corresponds to action_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor, passed through from the input, with shape :math:(B, A, N2), where N2 corresponds to action_shape.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> }
>>> }
>>> model = DiscreteMAQAC(agent_obs_shape, global_obs_shape, action_shape, twin_critic=True)
>>> logit = model.compute_actor(data)['logit']
compute_critic(inputs)
Overview
Use observation tensor to predict Q value.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
Returns:
- output (:obj:Dict[str, torch.Tensor]): The output dict of DiscreteMAQAC's compute_critic computation graph, whose key-values vary with twin_critic.
- q_value (:obj:list): If twin_critic=True, q_value is a list of two tensors, each with shape :math:(B, A, N2), where B is batch size, A is agent num, and N2 corresponds to action_shape. Otherwise, q_value is a single torch.Tensor with shape :math:(B, A, N2).
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> }
>>> }
>>> model = DiscreteMAQAC(agent_obs_shape, global_obs_shape, action_shape, twin_critic=True)
>>> value = model.compute_critic(data)['q_value']
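With twin_critic=True, the two independent Q estimates are usually combined by an element-wise minimum when building the training target (the clipped double-Q trick used by TD3/SAC-style algorithms) to curb overestimation. A pure-Python sketch of that combination step, with each estimate as a flat list of per-action Q values rather than a tensor:

```python
def clipped_double_q(q_value):
    # q_value is the two-element list returned when twin_critic=True.
    q1, q2 = q_value
    # Element-wise minimum of the two critic outputs.
    return [min(a, b) for a, b in zip(q1, q2)]

target = clipped_double_q([[1.0, 3.0, 2.0], [2.0, 1.5, 2.5]])
# → [1.0, 1.5, 2.0]
```

On real tensors the equivalent is torch.min(q_value[0], q_value[1]).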
ContinuousMAQAC
Bases: Module
Overview
The neural network and computation graph of the continuous-action Multi-Agent Q-value Actor-Critic (MAQAC) model. The model is composed of an actor and a critic, both MLP networks. The actor network predicts the action distribution, and the critic network predicts the Q value of the state-action pair.
Interfaces:
__init__, forward, compute_actor, compute_critic
__init__(agent_obs_shape, global_obs_shape, action_shape, action_space, twin_critic=False, actor_head_hidden_size=64, actor_head_layer_num=1, critic_head_hidden_size=64, critic_head_layer_num=1, activation=nn.ReLU(), norm_type=None)
Overview
Initialize the ContinuousMAQAC model according to arguments.
Arguments:
- agent_obs_shape (:obj:Union[int, SequenceType]): Agent's observation space.
- global_obs_shape (:obj:Union[int, SequenceType]): Global observation space.
- action_shape (:obj:Union[int, SequenceType, EasyDict]): Action space, such as 4, (3, ).
- action_space (:obj:str): The action space type, either 'regression' or 'reparameterization'.
- twin_critic (:obj:bool): Whether to include a twin critic.
- actor_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to the actor head.
- actor_head_layer_num (:obj:int): The number of layers used in the actor head to compute the action output.
- critic_head_hidden_size (:obj:Optional[int]): The hidden_size to pass to the critic head.
- critic_head_layer_num (:obj:int): The number of layers used in the critic head to compute the Q value output.
- activation (:obj:Optional[nn.Module]): The type of activation function to use in the MLP after each layer; if None, defaults to nn.ReLU().
- norm_type (:obj:Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.
forward(inputs, mode)
Overview
Use observation and action tensor to predict output in compute_actor or compute_critic mode.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
- action (:obj:torch.Tensor): The action tensor data, with shape :math:(B, A, N3), where B is batch size and A is agent num. N3 corresponds to action_shape.
- mode (:obj:str): Name of the forward mode.
Returns:
- outputs (:obj:Dict): Outputs of network forward, whose key-values vary with mode, twin_critic, and action_space.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> act_space = 'reparameterization'  # or 'regression'
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> },
>>> 'action': torch.randn(B, agent_num, squeeze(action_shape))
>>> }
>>> model = ContinuousMAQAC(agent_obs_shape, global_obs_shape, action_shape, act_space, twin_critic=False)
>>> if act_space == 'regression':
>>>     action = model(data['obs'], mode='compute_actor')['action']
>>> elif act_space == 'reparameterization':
>>>     (mu, sigma) = model(data['obs'], mode='compute_actor')['logit']
>>> value = model(data, mode='compute_critic')['q_value']
compute_actor(inputs)
Overview
Use observation tensor to predict the action (regression) or the action distribution parameters (reparameterization), depending on action_space.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
Returns:
- outputs (:obj:Dict): The output dict of ContinuousMAQAC's compute_actor computation graph, whose key-values vary with action_space.
ReturnKeys (action_space == 'regression'):
- action (:obj:torch.Tensor): The continuous action tensor, with shape :math:(B, A, N3), where N3 corresponds to action_shape.
ReturnKeys (action_space == 'reparameterization'):
- logit (:obj:list): A list of two tensors, mu and sigma, each with shape :math:(B, A, N3), where B is batch size, A is agent num, and N3 corresponds to action_shape.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> act_space = 'reparameterization' # 'regression'
>>> data = {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> }
>>> model = ContinuousMAQAC(agent_obs_shape, global_obs_shape, action_shape, act_space, twin_critic=False)
>>> if act_space == 'regression':
>>>     action = model.compute_actor(data)['action']
>>> elif act_space == 'reparameterization':
>>>     (mu, sigma) = model.compute_actor(data)['logit']
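In 'reparameterization' mode the caller turns (mu, sigma) into an action by sampling noise eps ~ N(0, 1), forming mu + sigma * eps so gradients flow through mu and sigma, and typically squashing with tanh to bound the action. A scalar pure-Python sketch of that pattern (not this module's actual sampling code; the helper name is hypothetical):

```python
import math
import random

def reparameterized_action(mu, sigma, eps=None):
    # eps is standard normal noise, sampled outside the computation graph.
    if eps is None:
        eps = random.gauss(0.0, 1.0)
    pre_tanh = mu + sigma * eps   # differentiable w.r.t. mu and sigma
    return math.tanh(pre_tanh)    # squash the action into (-1, 1)

# With eps fixed at 0.5 the result is tanh(0.5) ≈ 0.462.
a = reparameterized_action(0.0, 1.0, eps=0.5)
```

On tensors the same is done with torch.distributions.Normal(mu, sigma).rsample() followed by torch.tanh.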
compute_critic(inputs)
Overview
Use observation tensor and action tensor to predict Q value.
Arguments:
- inputs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- obs (:obj:Dict[str, torch.Tensor]): The input dict tensor data, has keys:
- agent_state (:obj:torch.Tensor): The agent's observation tensor data, with shape :math:(B, A, N0), where B is batch size and A is agent num. N0 corresponds to agent_obs_shape.
- global_state (:obj:torch.Tensor): The global observation tensor data, with shape :math:(B, A, N1), where B is batch size and A is agent num. N1 corresponds to global_obs_shape.
- action_mask (:obj:torch.Tensor): The action mask tensor data, with shape :math:(B, A, N2), where B is batch size and A is agent num. N2 corresponds to action_shape.
- action (:obj:torch.Tensor): The action tensor data, with shape :math:(B, A, N3), where B is batch size and A is agent num. N3 corresponds to action_shape.
Returns:
- outputs (:obj:Dict): The output dict of ContinuousMAQAC's compute_critic computation graph, whose key-values vary with twin_critic.
ReturnKeys (twin_critic=True):
- q_value (:obj:list): A list of two tensors, each with shape :math:(B, A), where B is batch size and A is agent num.
ReturnKeys (twin_critic=False):
- q_value (:obj:torch.Tensor): Q value tensor with shape :math:(B, A), where B is batch size and A is agent num.
Examples:
>>> B = 32
>>> agent_obs_shape = 216
>>> global_obs_shape = 264
>>> agent_num = 8
>>> action_shape = 14
>>> act_space = 'reparameterization'  # or 'regression'
>>> data = {
>>> 'obs': {
>>> 'agent_state': torch.randn(B, agent_num, agent_obs_shape),
>>> 'global_state': torch.randn(B, agent_num, global_obs_shape),
>>> 'action_mask': torch.randint(0, 2, size=(B, agent_num, action_shape))
>>> },
>>> 'action': torch.randn(B, agent_num, squeeze(action_shape))
>>> }
>>> model = ContinuousMAQAC(agent_obs_shape, global_obs_shape, action_shape, act_space, twin_critic=False)
>>> value = model.compute_critic(data)['q_value']
Full Source Code
../ding/model/template/maqac.py