ding.model.template.ebm¶
Vanilla DFO and EBM are adapted from https://github.com/kevinzakka/ibc. MCMC is adapted from https://github.com/google-research/ibc.
StochasticOptimizer¶
Bases: ABC
Overview
Base class for stochastic optimizers.
Interface:
__init__, _sample, _get_best_action_sample, set_action_bounds, sample, infer
set_action_bounds(action_bounds)¶
Overview
Set action bounds calculated from the dataset statistics.
Arguments:
- action_bounds (:obj:np.ndarray): Array of shape (2, A), where action_bounds[0] is lower bound and action_bounds[1] is upper bound.
Returns:
- action_bounds (:obj:torch.Tensor): Action bounds.
Shapes:
- action_bounds (:obj:np.ndarray): :math:(2, A).
- action_bounds (:obj:torch.Tensor): :math:(2, A).
Examples:
>>> opt = StochasticOptimizer()
>>> opt.set_action_bounds(np.stack([np.zeros(5), np.ones(5)], axis=0))
sample(obs, ebm) abstractmethod¶
Overview
Create tiled observations and sample counter-negatives for InfoNCE loss.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- tiled_obs (:obj:torch.Tensor): Tiled observations.
- action (:obj:torch.Tensor): Actions.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- tiled_obs (:obj:torch.Tensor): :math:(B, N, O).
- action (:obj:torch.Tensor): :math:(B, N, A).
Note: In the case of derivative-free optimization, this function will simply call _sample.
infer(obs, ebm) abstractmethod¶
Overview
Optimize for the best action conditioned on the current observation.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- best_action_samples (:obj:torch.Tensor): Best actions.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- best_action_samples (:obj:torch.Tensor): :math:(B, A).
DFO¶
Bases: StochasticOptimizer
Overview
Derivative-Free Optimizer in the paper Implicit Behavioral Cloning (https://arxiv.org/abs/2109.00137).
Interface:
__init__, sample, infer
__init__(noise_scale=0.33, noise_shrink=0.5, iters=3, train_samples=8, inference_samples=16384, device='cpu')¶
Overview
Initialize the Derivative-Free Optimizer.
Arguments:
- noise_scale (:obj:float): Initial noise scale.
- noise_shrink (:obj:float): Noise scale shrink rate.
- iters (:obj:int): Number of iterations.
- train_samples (:obj:int): Number of samples for training.
- inference_samples (:obj:int): Number of samples for inference.
- device (:obj:str): Device.
sample(obs, ebm)¶
Overview
Draw action samples from a uniform random distribution and tile the observations to the same shape as the action samples.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- tiled_obs (:obj:torch.Tensor): Tiled observation.
- action_samples (:obj:torch.Tensor): Action samples.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- tiled_obs (:obj:torch.Tensor): :math:(B, N, O).
- action_samples (:obj:torch.Tensor): :math:(B, N, A).
Examples:
>>> obs = torch.randn(2, 4)
>>> ebm = EBM(4, 5)
>>> opt = DFO()
>>> opt.set_action_bounds(np.stack([np.zeros(5), np.ones(5)], axis=0))
>>> tiled_obs, action_samples = opt.sample(obs, ebm)
infer(obs, ebm)¶
Overview
Optimize for the best action conditioned on the current observation.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- best_action_samples (:obj:torch.Tensor): Best actions.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- best_action_samples (:obj:torch.Tensor): :math:(B, A).
Examples:
>>> obs = torch.randn(2, 4)
>>> ebm = EBM(4, 5)
>>> opt = DFO()
>>> opt.set_action_bounds(np.stack([np.zeros(5), np.ones(5)], axis=0))
>>> best_action_samples = opt.infer(obs, ebm)
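The DFO inference loop described above (uniform initialization, softmax resampling over negative energies, perturbation with shrinking Gaussian noise, then picking the lowest-energy sample) can be sketched in plain NumPy on a toy 1-D energy function. `dfo_infer` and the quadratic energy below are illustrative stand-ins, not the DI-engine API:

```python
import numpy as np

def dfo_infer(energy_fn, low, high, iters=3, samples=1024,
              noise_scale=0.33, noise_shrink=0.5, seed=0):
    # Sketch of derivative-free optimization: resample low-energy candidates
    # under shrinking Gaussian noise, then return the best final sample.
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=samples)            # uniform initial samples
    for _ in range(iters):
        e = energy_fn(x)                                # lower energy = better
        z = -e - np.max(-e)                             # stabilized softmax logits
        p = np.exp(z) / np.exp(z).sum()
        idx = rng.choice(samples, size=samples, p=p)    # resample by probability
        x = np.clip(x[idx] + noise_scale * rng.normal(size=samples), low, high)
        noise_scale *= noise_shrink                     # anneal the noise
    return x[np.argmin(energy_fn(x))]                   # best action sample

best = dfo_infer(lambda a: (a - 0.3) ** 2, 0.0, 1.0)    # minimum is at a = 0.3
```

Because the loop only evaluates energies and never differentiates them, it works with any black-box energy function, which is the point of the derivative-free variant.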
AutoRegressiveDFO¶
Bases: DFO
Overview
AutoRegressive Derivative-Free Optimizer in the paper Implicit Behavioral Cloning (https://arxiv.org/abs/2109.00137).
Interface:
__init__, infer
__init__(noise_scale=0.33, noise_shrink=0.5, iters=3, train_samples=8, inference_samples=4096, device='cpu')¶
Overview
Initialize the AutoRegressive Derivative-Free Optimizer.
Arguments:
- noise_scale (:obj:float): Initial noise scale.
- noise_shrink (:obj:float): Noise scale shrink rate.
- iters (:obj:int): Number of iterations.
- train_samples (:obj:int): Number of samples for training.
- inference_samples (:obj:int): Number of samples for inference.
- device (:obj:str): Device.
infer(obs, ebm)¶
Overview
Optimize for the best action conditioned on the current observation.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- best_action_samples (:obj:torch.Tensor): Best actions.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- best_action_samples (:obj:torch.Tensor): :math:(B, A).
Examples:
>>> obs = torch.randn(2, 4)
>>> ebm = EBM(4, 5)
>>> opt = AutoRegressiveDFO()
>>> opt.set_action_bounds(np.stack([np.zeros(5), np.ones(5)], axis=0))
>>> best_action_samples = opt.infer(obs, ebm)
MCMC¶
Bases: StochasticOptimizer
Overview
MCMC method as a stochastic optimizer in the paper Implicit Behavioral Cloning (https://arxiv.org/abs/2109.00137).
Interface:
__init__, sample, infer, grad_penalty
BaseScheduler¶
Bases: ABC
Overview
Base class for learning rate schedulers.
Interface:
get_rate
get_rate(index) abstractmethod¶
Overview
Abstract method for getting the learning rate.
ExponentialScheduler¶
Overview
Exponential learning rate schedule for Langevin sampler.
Interface:
__init__, get_rate
PolynomialScheduler¶
Overview
Polynomial learning rate schedule for Langevin sampler.
Interface:
__init__, get_rate
__init__(init, final, power, num_steps)¶
Overview
Initialize the PolynomialScheduler.
Arguments:
- init (:obj:float): Initial learning rate.
- final (:obj:float): Final learning rate.
- power (:obj:float): Power of polynomial.
- num_steps (:obj:int): Number of steps.
get_rate(index)¶
Overview
Get the learning rate for the given iteration index.
Arguments:
- index (:obj:int): Current iteration.
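As a sketch, one common form of polynomial decay matching these arguments interpolates from init down to final over num_steps iterations; the exact normalization in the actual implementation may differ:

```python
def polynomial_rate(index, init=0.5, final=1e-05, power=2.0, num_steps=100):
    # Decay from `init` (at index 0) to `final` (at index num_steps - 1).
    frac = 1.0 - index / (num_steps - 1)
    return (init - final) * frac ** power + final

rates = [polynomial_rate(i) for i in (0, 50, 99)]  # monotonically decreasing
```

With power=2.0 the rate drops quickly at first and flattens near final, which suits Langevin sampling: large early steps for exploration, small late steps for refinement.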
__init__(iters=100, use_langevin_negative_samples=True, train_samples=8, inference_samples=512, stepsize_scheduler=dict(init=0.5, final=1e-05, power=2.0), optimize_again=True, again_stepsize_scheduler=dict(init=1e-05, final=1e-05, power=2.0), device='cpu', noise_scale=0.5, grad_clip=None, delta_action_clip=0.5, add_grad_penalty=True, grad_norm_type='inf', grad_margin=1.0, grad_loss_weight=1.0, **kwargs)¶
Overview
Initialize the MCMC optimizer.
Arguments:
- iters (:obj:int): Number of iterations.
- use_langevin_negative_samples (:obj:bool): Whether to use Langevin sampler.
- train_samples (:obj:int): Number of samples for training.
- inference_samples (:obj:int): Number of samples for inference.
- stepsize_scheduler (:obj:dict): Step size scheduler for Langevin sampler.
- optimize_again (:obj:bool): Whether to run a second optimization.
- again_stepsize_scheduler (:obj:dict): Step size scheduler for the second optimization.
- device (:obj:str): Device.
- noise_scale (:obj:float): Initial noise scale.
- grad_clip (:obj:float): Gradient clipping value; None disables clipping.
- delta_action_clip (:obj:float): Clipping value for per-step action updates.
- add_grad_penalty (:obj:bool): Whether to add gradient penalty.
- grad_norm_type (:obj:str): Gradient norm type.
- grad_margin (:obj:float): Gradient margin.
- grad_loss_weight (:obj:float): Gradient loss weight.
grad_penalty(obs, action, ebm)¶
Overview
Calculate the gradient penalty.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- action (:obj:torch.Tensor): Actions.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- loss (:obj:torch.Tensor): Gradient penalty.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, N+1, O).
- action (:obj:torch.Tensor): :math:(B, N+1, A).
- ebm (:obj:torch.nn.Module): :math:(B, N+1, O).
- loss (:obj:torch.Tensor): :math:(B, ).
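The margin-based form suggested by the grad_margin and grad_loss_weight arguments can be illustrated on precomputed per-sample gradient norms. This hinge-squared sketch is an assumption about the penalty's shape, not the exact DI-engine computation (which obtains the norms via autograd on the EBM):

```python
import numpy as np

def grad_margin_penalty(grad_norms, grad_margin=1.0, grad_loss_weight=1.0):
    # Hinge-squared penalty: only gradient norms exceeding the margin are
    # penalized, averaged over the N+1 samples of each batch element.
    excess = np.maximum(0.0, grad_norms - grad_margin)
    return grad_loss_weight * np.mean(excess ** 2, axis=-1)

norms = np.array([[0.5, 1.5, 2.0]])   # shape (B=1, N+1=3): per-sample grad norms
loss = grad_margin_penalty(norms)     # shape (B,)
```

Penalizing large energy gradients regularizes the Langevin sampler: it keeps the energy landscape smooth enough that gradient steps with the scheduled step sizes remain stable.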
sample(obs, ebm)¶
Overview
Create tiled observations and sample counter-negatives for InfoNCE loss.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- tiled_obs (:obj:torch.Tensor): Tiled observations.
- action_samples (:obj:torch.Tensor): Action samples.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- tiled_obs (:obj:torch.Tensor): :math:(B, N, O).
- action_samples (:obj:torch.Tensor): :math:(B, N, A).
Examples:
>>> obs = torch.randn(2, 4)
>>> ebm = EBM(4, 5)
>>> opt = MCMC()
>>> opt.set_action_bounds(np.stack([np.zeros(5), np.ones(5)], axis=0))
>>> tiled_obs, action_samples = opt.sample(obs, ebm)
infer(obs, ebm)¶
Overview
Optimize for the best action conditioned on the current observation.
Arguments:
- obs (:obj:torch.Tensor): Observations.
- ebm (:obj:torch.nn.Module): Energy based model.
Returns:
- best_action_samples (:obj:torch.Tensor): Best actions.
Shapes:
- obs (:obj:torch.Tensor): :math:(B, O).
- ebm (:obj:torch.nn.Module): :math:(B, N, O).
- best_action_samples (:obj:torch.Tensor): :math:(B, A).
Examples:
>>> obs = torch.randn(2, 4)
>>> ebm = EBM(4, 5)
>>> opt = MCMC()
>>> opt.set_action_bounds(np.stack([np.zeros(5), np.ones(5)], axis=0))
>>> best_action_samples = opt.infer(obs, ebm)
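The Langevin procedure behind MCMC inference (gradient steps on the energy plus decaying Gaussian noise, step sizes from the polynomial scheduler, final selection by lowest energy) can be sketched in NumPy with an analytic gradient standing in for autograd. `langevin_infer` and the toy quadratic energy are illustrative assumptions, not the DI-engine API:

```python
import numpy as np

def langevin_infer(energy_fn, grad_fn, low, high, iters=100, samples=256,
                   init=0.5, final=1e-05, power=2.0, seed=0):
    # Langevin MCMC sketch: x <- x - (lr/2) * grad E(x) + sqrt(lr) * noise,
    # with a polynomially decaying step size and clipping to action bounds.
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=samples)
    for k in range(iters):
        frac = 1.0 - k / (iters - 1)
        lr = (init - final) * frac ** power + final    # polynomial step size
        x = x - 0.5 * lr * grad_fn(x) + np.sqrt(lr) * rng.normal(size=samples)
        x = np.clip(x, low, high)                      # respect action bounds
    return x[np.argmin(energy_fn(x))]                  # best action sample

best = langevin_infer(lambda a: (a - 0.3) ** 2,        # toy energy
                      lambda a: 2.0 * (a - 0.3),       # its analytic gradient
                      0.0, 1.0)
```

Unlike DFO, each sample here follows the energy gradient, so far fewer inference samples (512 vs. 16384 by the defaults above) are needed to land near a low-energy action.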
EBM¶
Bases: Module
Overview
Energy based model.
Interface:
__init__, forward
__init__(obs_shape, action_shape, hidden_size=512, hidden_layer_num=4, **kwargs)¶
Overview
Initialize the EBM.
Arguments:
- obs_shape (:obj:int): Observation shape.
- action_shape (:obj:int): Action shape.
- hidden_size (:obj:int): Hidden size.
- hidden_layer_num (:obj:int): Number of hidden layers.
forward(obs, action)¶
Overview
Forward computation graph of EBM.
Arguments:
- obs (:obj:torch.Tensor): Observation of shape (B, N, O).
- action (:obj:torch.Tensor): Action of shape (B, N, A).
Returns:
- pred (:obj:torch.Tensor): Energy of shape (B, N).
Examples:
>>> obs = torch.randn(2, 3, 4)
>>> action = torch.randn(2, 3, 5)
>>> ebm = EBM(4, 5)
>>> pred = ebm(obs, action)
AutoregressiveEBM¶
Bases: Module
Overview
Autoregressive energy based model.
Interface:
__init__, forward
__init__(obs_shape, action_shape, hidden_size=512, hidden_layer_num=4)¶
Overview
Initialize the AutoregressiveEBM.
Arguments:
- obs_shape (:obj:int): Observation shape.
- action_shape (:obj:int): Action shape.
- hidden_size (:obj:int): Hidden size.
- hidden_layer_num (:obj:int): Number of hidden layers.
forward(obs, action)¶
Overview
Forward computation graph of AutoregressiveEBM.
Arguments:
- obs (:obj:torch.Tensor): Observation of shape (B, N, O).
- action (:obj:torch.Tensor): Action of shape (B, N, A).
Returns:
- pred (:obj:torch.Tensor): Energy of shape (B, N, A).
Examples:
>>> obs = torch.randn(2, 3, 4)
>>> action = torch.randn(2, 3, 5)
>>> arebm = AutoregressiveEBM(4, 5)
>>> pred = arebm(obs, action)
create_stochastic_optimizer(device, stochastic_optimizer_config)¶
Overview
Create a stochastic optimizer from a config dict.
Arguments:
- device (:obj:str): Device.
- stochastic_optimizer_config (:obj:dict): Stochastic optimizer config.
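A plausible shape for this factory is dispatch on a type key in the config dict. The registry and the two class stubs below are hypothetical stand-ins for the optimizers documented above, not the DI-engine implementation:

```python
class DFO:
    def __init__(self, noise_scale=0.33, device='cpu', **kwargs):
        self.noise_scale, self.device = noise_scale, device

class MCMC:
    def __init__(self, iters=100, device='cpu', **kwargs):
        self.iters, self.device = iters, device

_OPTIMIZERS = {'dfo': DFO, 'mcmc': MCMC}          # type key -> optimizer class

def create_stochastic_optimizer(device, stochastic_optimizer_config):
    cfg = dict(stochastic_optimizer_config)        # copy so pop() is side-effect free
    cls = _OPTIMIZERS[cfg.pop('type')]             # dispatch on the 'type' key
    return cls(device=device, **cfg)               # remaining keys become kwargs

opt = create_stochastic_optimizer('cpu', {'type': 'mcmc', 'iters': 50})
```

Passing the remaining config entries straight through as keyword arguments keeps the factory agnostic to each optimizer's constructor signature.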
no_ebm_grad()¶
Overview
Wrapper that disables energy based model gradients.
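One way such a wrapper can work is as a decorator factory that freezes the EBM's parameters around the wrapped call. The duck-typed FakeEBM below stands in for a torch.nn.Module with requires_grad_, so this is a sketch of the pattern rather than the actual implementation:

```python
import functools

def no_ebm_grad():
    # Decorator factory: disable EBM gradients for the duration of the call,
    # then restore them. Assumes the wrapped method's last argument is `ebm`.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args):
            ebm = args[-1]
            ebm.requires_grad_(False)      # freeze EBM parameters
            try:
                return fn(self, *args)
            finally:
                ebm.requires_grad_(True)   # always restore gradients
        return wrapper
    return decorator

class FakeEBM:
    grad_enabled = True
    def requires_grad_(self, flag):
        self.grad_enabled = flag

class Opt:
    @no_ebm_grad()
    def infer(self, obs, ebm):
        return ebm.grad_enabled            # observe the state inside the call

ebm = FakeEBM()
inside = Opt().infer(None, ebm)
```

Freezing the EBM during stochastic-optimizer steps matters because Langevin sampling differentiates the energy with respect to actions only; the model's own parameters must not accumulate gradients from those inner-loop passes.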
Full Source Code
../ding/model/template/ebm.py