ding.model.template.havac¶
RNNLayer
¶
Bases: Module
forward(x, prev_state, inference=False)
¶
Forward pass of the RNN layer. If inference is True, the sequence length of the input is set to 1. If res_link is True, a residual link is added from the input to the output.
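The behavior described above can be sketched as follows. This is a minimal, hypothetical stand-in (a vanilla tanh recurrence instead of the actual LSTM/GRU cell; the parameters `W`, `U`, `b` are illustrative, not names from the source), showing only the inference and res_link branches:

```python
import numpy as np

def rnn_layer_forward(x, prev_state, W, U, b, inference=False, res_link=False):
    """Sketch of the RNN layer forward pass.

    x: (T, B, N) input sequence; prev_state: (B, N) previous hidden state.
    """
    if inference:
        x = x[:1]  # inference: sequence length of the input is set to 1
    h = prev_state
    outputs = []
    for t in range(x.shape[0]):
        # vanilla tanh recurrence standing in for the real LSTM/GRU cell
        h = np.tanh(x[t] @ W + h @ U + b)
        outputs.append(h)
    out = np.stack(outputs)  # (T', B, N)
    if res_link:
        out = out + x  # residual link from input to output
    return out, h
```

Note the residual link requires the input and hidden dimensions to match, which is why the sketch uses a single size N for both.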
HAVAC
¶
Bases: Module
Overview
The HAVAC model of each agent for HAPPO.
Interfaces:
__init__, forward
__init__(agent_obs_shape, global_obs_shape, action_shape, agent_num, use_lstm=False, lstm_type='gru', encoder_hidden_size_list=[128, 128, 64], actor_head_hidden_size=64, actor_head_layer_num=2, critic_head_hidden_size=64, critic_head_layer_num=1, action_space='discrete', activation=nn.ReLU(), norm_type=None, sigma_type='independent', bound_type=None, res_link=False)
¶
Overview
Initialize the VAC model for HAPPO according to the arguments.
Arguments:
- agent_obs_shape (:obj:`Union[int, SequenceType]`): Observation space of a single agent.
- global_obs_shape (:obj:`Union[int, SequenceType]`): Observation space of the global state.
- action_shape (:obj:`Union[int, SequenceType]`): Action space.
- agent_num (:obj:`int`): Number of agents.
- lstm_type (:obj:`str`): Which recurrent cell to use, 'lstm' or 'gru'; defaults to 'gru'.
- encoder_hidden_size_list (:obj:`SequenceType`): Collection of hidden_size values to pass to the Encoder.
- actor_head_hidden_size (:obj:`Optional[int]`): The hidden_size to pass to the actor head.
- actor_head_layer_num (:obj:`int`): The number of layers used in the actor head network.
- critic_head_hidden_size (:obj:`Optional[int]`): The hidden_size to pass to the critic head.
- critic_head_layer_num (:obj:`int`): The number of layers used in the critic head network.
- activation (:obj:`Optional[nn.Module]`): The type of activation function used in the MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:`Optional[str]`): The type of normalization to use; see ding.torch_utils.fc_block for more details.
- res_link (:obj:`bool`): Whether to use the residual link; defaults to False.
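For intuition, the multi-agent structure implied by agent_num can be sketched as one independent actor-critic network per agent, selected by index at call time. This is a hypothetical skeleton (class and attribute names here are illustrative, not the actual DI-engine implementation):

```python
class AgentNetSketch:
    """Stand-in for a single agent's actor-critic network (HAVACAgent)."""

    def __init__(self, agent_obs_shape, global_obs_shape, action_shape):
        self.agent_obs_shape = agent_obs_shape
        self.global_obs_shape = global_obs_shape
        self.action_shape = action_shape

    def forward(self, inputs, mode):
        # A real network would run encoder + actor/critic heads here.
        return {'mode': mode}


class HAVACSketch:
    """HAPPO trains heterogeneous agents, so each agent keeps its own
    parameters; the wrapper holds one network per agent index."""

    def __init__(self, agent_obs_shape, global_obs_shape, action_shape, agent_num):
        self.agent_models = [
            AgentNetSketch(agent_obs_shape, global_obs_shape, action_shape)
            for _ in range(agent_num)
        ]

    def forward(self, agent_idx, inputs, mode):
        # dispatch to the chosen agent's own parameters
        return self.agent_models[agent_idx].forward(inputs, mode)
```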
HAVACAgent
¶
Bases: Module
Overview
The HAVAC model of each agent for HAPPO.
Interfaces:
__init__, forward, compute_actor, compute_critic, compute_actor_critic
__init__(agent_obs_shape, global_obs_shape, action_shape, use_lstm=False, lstm_type='gru', encoder_hidden_size_list=[128, 128, 64], actor_head_hidden_size=64, actor_head_layer_num=2, critic_head_hidden_size=64, critic_head_layer_num=1, action_space='discrete', activation=nn.ReLU(), norm_type=None, sigma_type='happo', bound_type=None, res_link=False)
¶
Overview
Initialize the VAC model for HAPPO according to the arguments.
Arguments:
- agent_obs_shape (:obj:`Union[int, SequenceType]`): Observation space of a single agent.
- global_obs_shape (:obj:`Union[int, SequenceType]`): Observation space of the global state.
- action_shape (:obj:`Union[int, SequenceType]`): Action space.
- lstm_type (:obj:`str`): Which recurrent cell to use, 'lstm' or 'gru'; defaults to 'gru'.
- encoder_hidden_size_list (:obj:`SequenceType`): Collection of hidden_size values to pass to the Encoder.
- actor_head_hidden_size (:obj:`Optional[int]`): The hidden_size to pass to the actor head.
- actor_head_layer_num (:obj:`int`): The number of layers used in the actor head network.
- critic_head_hidden_size (:obj:`Optional[int]`): The hidden_size to pass to the critic head.
- critic_head_layer_num (:obj:`int`): The number of layers used in the critic head network.
- activation (:obj:`Optional[nn.Module]`): The type of activation function used in the MLP after each layer_fn; if None, defaults to nn.ReLU().
- norm_type (:obj:`Optional[str]`): The type of normalization to use; see ding.torch_utils.fc_block for more details.
- res_link (:obj:`bool`): Whether to use the residual link; defaults to False.
forward(inputs, mode)
¶
Overview
Use the encoded embedding tensor to predict the output. Parameter updates follow VAC's MLP forward setup.
Arguments:
Forward with 'compute_actor' or 'compute_critic':
- inputs (:obj:`torch.Tensor`):
The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size).
Whether hidden_size is actor_head_hidden_size or critic_head_hidden_size depends on mode.
Returns:
- outputs (:obj:Dict):
Run with encoder and head.
Forward with ``'compute_actor'``, Necessary Keys:
- logit (:obj:`torch.Tensor`): Logit encoding tensor, with same size as input ``x``.
Forward with ``'compute_critic'``, Necessary Keys:
- value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
Shapes:
- inputs (:obj:`torch.Tensor`): :math:`(B, N)`, where B is batch size and N corresponds to hidden_size.
- logit (:obj:`torch.FloatTensor`): :math:`(B, N)`, where B is batch size and N is action_shape.
- value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size.
Actor Examples
>>> model = VAC(64, 128)
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['logit'].shape == torch.Size([4, 128])
Critic Examples
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> critic_outputs = model(inputs, 'compute_critic')
>>> critic_outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
Actor-Critic Examples
>>> model = VAC(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs, 'compute_actor_critic')
>>> outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
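The mode argument selects which computation graph runs inside forward. A minimal sketch of this dispatch pattern, with placeholder compute_* bodies standing in for the real encoder-plus-head computations (the placeholder logic is illustrative, not the source implementation):

```python
class ModeDispatchSketch:
    """Sketch of the VAC-style forward(inputs, mode) dispatch."""

    mode = ['compute_actor', 'compute_critic', 'compute_actor_critic']

    def forward(self, inputs, mode):
        # reject unknown modes, then route to the matching compute_* method
        assert mode in self.mode, 'not support forward mode: {}'.format(mode)
        return getattr(self, mode)(inputs)

    # placeholder heads; the real model runs encoder + actor/critic heads
    def compute_actor(self, inputs):
        return {'logit': inputs}

    def compute_critic(self, inputs):
        return {'value': sum(inputs) / len(inputs)}

    def compute_actor_critic(self, inputs):
        return {**self.compute_actor(inputs), **self.compute_critic(inputs)}
```

The dict returned by `compute_actor_critic` carries both the actor's and the critic's necessary keys, matching the Returns described above.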
compute_actor(inputs, inference=False)
¶
Overview
Execute parameter updates with 'compute_actor' mode; use the encoded embedding tensor to predict the output.
Arguments:
- inputs (:obj:`Dict`):
Input data dict with keys ['obs' (with keys ['agent_state', 'global_state', 'action_mask']),
'actor_prev_state'].
Returns:
- outputs (:obj:Dict):
Run with encoder RNN(optional) and head.
ReturnsKeys:
- logit (:obj:`torch.Tensor`): Logit encoding tensor.
- actor_next_state: Next hidden state of the actor RNN.
- hidden_state
Shapes:
- logit (:obj:`torch.FloatTensor`): :math:`(B, N)`, where B is batch size and N is action_shape.
- actor_next_state: (B, )
- hidden_state
Examples:
>>> model = HAVAC(
agent_obs_shape=obs_dim,
global_obs_shape=global_obs_dim,
action_shape=action_dim,
use_lstm = True,
)
>>> inputs = {
'obs': {
'agent_state': torch.randn(T, bs, obs_dim),
'global_state': torch.randn(T, bs, global_obs_dim),
'action_mask': torch.randint(0, 2, size=(T, bs, action_dim))
},
'actor_prev_state': [None for _ in range(bs)],
}
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['logit'].shape == (T, bs, action_dim)
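The action_mask carried in the inputs dict is typically used to forbid unavailable actions by pushing their logits toward negative infinity before the policy samples. A sketch of that masking step (illustrative helper, not the exact source code; the cutoff value is an assumption):

```python
import numpy as np

def apply_action_mask(logit, action_mask, neg_inf=-1e9):
    """Set logits of unavailable actions (mask == 0) to a large negative
    value so they receive ~zero probability after softmax."""
    logit = np.asarray(logit, dtype=float)
    mask = np.asarray(action_mask)
    return np.where(mask == 1, logit, neg_inf)
```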
compute_critic(inputs, inference=False)
¶
Overview
Execute parameter updates with 'compute_critic' mode; use the encoded embedding tensor to predict the output.
Arguments:
- inputs (:obj:Dict):
input data dict with keys ['obs'(with keys ['agent_state', 'global_state', 'action_mask']),
'critic_prev_state'(when you are using rnn)]
Returns:
- outputs (:obj:Dict):
Run with encoder [rnn] and head.
Necessary Keys:
- value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
- logits
Shapes:
- value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size.
- logits
Examples:
>>> model = HAVAC(
agent_obs_shape=obs_dim,
global_obs_shape=global_obs_dim,
action_shape=action_dim,
use_lstm = True,
)
>>> inputs = {
'obs': {
'agent_state': torch.randn(T, bs, obs_dim),
'global_state': torch.randn(T, bs, global_obs_dim),
'action_mask': torch.randint(0, 2, size=(T, bs, action_dim))
},
'critic_prev_state': [None for _ in range(bs)],
}
>>> critic_outputs = model(inputs,'compute_critic')
>>> assert critic_outputs['value'].shape == (T, bs)
compute_actor_critic(inputs, inference=False)
¶
Overview
Execute parameter updates with 'compute_actor_critic' mode; use the encoded embedding tensor to predict the output.
Arguments:
- inputs (:obj:`Dict`): Input data dict with keys ['obs' (with keys ['agent_state', 'global_state', 'action_mask']), 'actor_prev_state', 'critic_prev_state' (when you are using an RNN)].
Returns:
- outputs (:obj:`Dict`): Run with encoder and head.
ReturnsKeys:
- logit (:obj:`torch.Tensor`): Logit encoding tensor, with same size as input x.
- value (:obj:`torch.Tensor`): Q value tensor with same size as batch size.
Shapes:
- logit (:obj:`torch.FloatTensor`): :math:`(B, N)`, where B is batch size and N is action_shape.
- value (:obj:`torch.FloatTensor`): :math:`(B, )`, where B is batch size.
Examples:
>>> model = VAC(64,64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs,'compute_actor_critic')
>>> outputs['value']
tensor([0.0252, 0.0235, 0.0201, 0.0072], grad_fn=<SqueezeBackward1>)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
.. note::
The compute_actor_critic interface aims to save computation when the actor and critic share an encoder, returning the combined dictionary.
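The saving described in the note comes from running the shared encoder once and feeding the same embedding to both heads, rather than encoding separately in compute_actor and compute_critic. A sketch, with a call counter standing in for the (hypothetical) shared encoder:

```python
class SharedEncoderSketch:
    """Illustrates why compute_actor_critic saves work with a shared encoder."""

    def __init__(self):
        self.encoder_calls = 0

    def encode(self, obs):
        self.encoder_calls += 1
        return [o * 2.0 for o in obs]  # stand-in for the real encoder

    def compute_actor_critic(self, obs):
        x = self.encode(obs)      # encode once ...
        logit = x                 # ... actor head reads the embedding
        value = sum(x) / len(x)   # ... critic head reads the same embedding
        return {'logit': logit, 'value': value}
```

Calling compute_actor and compute_critic separately would invoke the encoder twice per observation; the combined interface halves that cost.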
Full Source Code
../ding/model/template/havac.py