ding.model.template.bcq
BCQ
Bases: Module
Overview
Model of BCQ (Batch-Constrained deep Q-learning). Off-Policy Deep Reinforcement Learning without Exploration. https://arxiv.org/abs/1812.02900
Interface:
forward, compute_actor, compute_critic, compute_vae, compute_eval
Property:
mode
__init__(obs_shape, action_shape, actor_head_hidden_size=[400, 300], critic_head_hidden_size=[400, 300], activation=nn.ReLU(), vae_hidden_dims=[750, 750], phi=0.05)
Overview
Initialize the neural networks of BCQ, i.e. the critic (Q network), the actor and the VAE.
Arguments:
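- obs_shape (:obj:int): The dimension of the observation.
- action_shape (:obj:int): The dimension of the action.
- actor_head_hidden_size (:obj:list): The list of hidden sizes of the actor, defaults to [400, 300].
- critic_head_hidden_size (:obj:list): The list of hidden sizes of the critic, defaults to [400, 300].
- activation (:obj:nn.Module): Activation function in the network, defaults to nn.ReLU().
- vae_hidden_dims (:obj:list): The list of hidden sizes of the VAE, defaults to [750, 750].
- phi (:obj:float): Defaults to 0.05. In the BCQ paper, Φ bounds the perturbation applied to generated actions; this argument presumably plays that role.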
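To make the hidden-size arguments concrete, the following is a minimal sketch of how a list such as [400, 300] could translate into linear-layer shapes. The helper mlp_layer_dims is hypothetical and not part of ding. The sketch also assumes that the critic consumes the concatenated observation and action and emits a single scalar; this page does not confirm either detail.

```python
from typing import List, Tuple


def mlp_layer_dims(input_dim: int, hidden_sizes: List[int], output_dim: int) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer of a plain MLP.

    Hypothetical helper for illustration only; not part of ding.
    """
    dims = [input_dim] + list(hidden_sizes) + [output_dim]
    return list(zip(dims[:-1], dims[1:]))


obs_shape, action_shape = 32, 6
# Assumption: the critic sees obs and action concatenated and outputs one Q value.
critic_dims = mlp_layer_dims(obs_shape + action_shape, [400, 300], 1)
print(critic_dims)  # [(38, 400), (400, 300), (300, 1)]
```

Each entry in the hidden-size list adds one hidden layer, so the default [400, 300] would give two hidden layers under these assumptions.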
forward(inputs, mode)
Overview
The unique execution (forward) method of BCQ. The mode argument selects which computation graph to run: compute_actor, compute_critic, compute_vae or compute_eval.
Mode compute_actor:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including action tensor.
Mode compute_critic:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including q_value tensor.
Mode compute_vae:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords recons_action (:obj:torch.Tensor), prediction_residual (:obj:torch.Tensor), input (:obj:torch.Tensor), mu (:obj:torch.Tensor), log_var (:obj:torch.Tensor) and z (:obj:torch.Tensor).
Mode compute_eval:
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- output (:obj:Dict): Output dict data, including action tensor.
Examples:
>>> import torch
>>> from ding.model.template.bcq import BCQ
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model(inputs, mode='compute_actor')
>>> outputs = model(inputs, mode='compute_critic')
>>> outputs = model(inputs, mode='compute_vae')
>>> outputs = model(inputs, mode='compute_eval')
.. note::
For mode-specific examples, refer to the API docs of compute_actor, compute_critic, compute_vae and compute_eval.
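The mode-based dispatch described for forward can be illustrated with a small, framework-free sketch. ModeDispatcher and its placeholder computations are hypothetical and are not ding's BCQ; the sketch only shows one common way to route a mode string to a method of the same name, and it assumes that an unknown mode should be rejected.

```python
class ModeDispatcher:
    """Toy stand-in for a multi-mode model; hypothetical, not ding's BCQ."""

    # Names of the methods that forward() is allowed to call.
    mode = ['compute_actor', 'compute_critic']

    def forward(self, inputs: dict, mode: str) -> dict:
        # Reject unknown modes early instead of failing inside getattr.
        assert mode in self.mode, f"unsupported mode {mode!r}, expected one of {self.mode}"
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs: dict) -> dict:
        return {'action': inputs['action']}  # placeholder computation

    def compute_critic(self, inputs: dict) -> dict:
        return {'q_value': 0.0}  # placeholder computation


model = ModeDispatcher()
out = model.forward({'obs': [0.1], 'action': [0.2]}, mode='compute_critic')
print(out)  # {'q_value': 0.0}
```

With this pattern, calling the model with a mode is equivalent to calling the method of that name directly, which matches how the examples on this page use both forms.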
compute_critic(inputs)
Overview
Use the critic network to compute the Q value.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords q_value (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> import torch
>>> from ding.model.template.bcq import BCQ
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_critic(inputs)
compute_actor(inputs)
Overview
Use the actor network to compute the action.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords action (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> import torch
>>> from ding.model.template.bcq import BCQ
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_actor(inputs)
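The actor takes both obs and action as input because, in the BCQ paper, the actor is a perturbation model that adjusts a candidate action by at most Φ. The plain-Python sketch below illustrates that idea. The function perturb_action, the max_action parameter and the [-1, 1] action range are assumptions made for illustration, and this page does not confirm that compute_actor is implemented exactly this way.

```python
import math
from typing import List


def perturb_action(action: List[float], raw_output: List[float], phi: float = 0.05,
                   max_action: float = 1.0) -> List[float]:
    """Shift each action component by a bounded amount, then clip to the action range.

    Hypothetical helper; raw_output stands in for the unbounded network output.
    """
    perturbed = []
    for a, x in zip(action, raw_output):
        # tanh keeps the adjustment inside [-phi * max_action, phi * max_action].
        delta = phi * max_action * math.tanh(x)
        perturbed.append(max(-max_action, min(max_action, a + delta)))
    return perturbed


candidate = [0.5, -0.99, 0.0]
raw = [10.0, -10.0, 0.0]
result = perturb_action(candidate, raw)
print(result)
```

Under these assumptions, a small phi such as the default 0.05 keeps the output close to the candidate action, which is what constrains the policy to actions resembling those in the batch.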
compute_vae(inputs)
Overview
Use the VAE network to reconstruct the action.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords recons_action (:obj:torch.Tensor), prediction_residual (:obj:torch.Tensor), input (:obj:torch.Tensor), mu (:obj:torch.Tensor), log_var (:obj:torch.Tensor) and z (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> import torch
>>> from ding.model.template.bcq import BCQ
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_vae(inputs)
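The returned keys mu, log_var and z are the usual quantities of a variational autoencoder. As a minimal sketch of how they typically relate, the standard reparameterization trick is shown below in plain Python. The function reparameterize is hypothetical, and whether ding's VAE samples z in exactly this way is an assumption not stated on this page.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension.

    Hypothetical helper illustrating the standard VAE reparameterization trick.
    """
    z = []
    for m, lv in zip(mu, log_var):
        std = math.exp(0.5 * lv)  # log_var = log(sigma^2), so sigma = exp(log_var / 2)
        eps = rng.gauss(0.0, 1.0)
        z.append(m + std * eps)
    return z


rng = random.Random(0)  # fixed seed so the sketch is reproducible
z = reparameterize([1.0, -2.0], [0.0, 0.0], rng)
print(z)
```

Writing the sample as a deterministic function of mu, log_var and external noise is what allows gradients to flow back to the encoder outputs in a tensor framework.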
compute_eval(inputs)
Overview
Use the actor network to compute the action for evaluation.
Arguments:
- inputs (:obj:Dict): Input dict data, including obs and action tensor.
Returns:
- outputs (:obj:Dict): Dict containing keywords action (:obj:torch.Tensor).
Shapes:
- inputs (:obj:Dict): :math:(B, N, D), where B is batch size, N is sample number, D is input dimension.
- outputs (:obj:Dict): :math:(B, N).
Examples:
>>> import torch
>>> from ding.model.template.bcq import BCQ
>>> inputs = {'obs': torch.randn(4, 32), 'action': torch.randn(4, 6)}
>>> model = BCQ(32, 6)
>>> outputs = model.compute_eval(inputs)
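For context, the BCQ paper selects an action at evaluation time by generating several candidate actions, perturbing them, and keeping the one with the highest Q estimate. The sketch below shows only the final selection step in plain Python. The names select_best_action and toy_q and the candidate values are hypothetical, and this page does not state whether compute_eval follows this exact procedure.

```python
from typing import Callable, List, Sequence

Action = List[float]


def select_best_action(candidates: Sequence[Action], q_value: Callable[[Action], float]) -> Action:
    """Return the candidate with the highest Q estimate (hypothetical helper)."""
    return max(candidates, key=q_value)


# Stand-ins for generative-model samples that were already perturbed by the actor.
candidates = [[0.1, 0.2], [0.8, -0.3], [-0.5, 0.5]]


def toy_q(action: Action) -> float:
    # Toy critic for illustration: prefers actions close to [1, 0].
    return -((action[0] - 1.0) ** 2 + action[1] ** 2)


best = select_best_action(candidates, toy_q)
print(best)  # [0.8, -0.3]
```

Because every candidate comes from a model fitted to the batch, the argmax is restricted to actions similar to those in the dataset rather than ranging over the whole action space.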
Full Source Code
../ding/model/template/bcq.py