ding.policy.common_utils
set_noise_mode(module, noise_enabled)
Overview
Recursively set the 'enable_noise' attribute for all NoiseLinearLayer modules within the given module. This function is typically used in algorithms such as NoisyNet and Rainbow. During training, 'enable_noise' should be set to True to enable noise for exploration. During inference or evaluation, it should be set to False to disable noise for deterministic behavior.
Arguments:
- module (:obj:torch.nn.Module): The module whose NoiseLinearLayer submodules will have their 'enable_noise' attribute set recursively.
- noise_enabled (:obj:bool): Whether to enable noise: True during training for exploration, False during inference or evaluation for deterministic behavior.
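The recursive walk described above can be sketched in plain Python. The classes below are illustrative stand-ins, not DI-engine's actual NoiseLinearLayer or module tree; the real function operates on torch.nn.Module hierarchies.

```python
# A minimal sketch of what set_noise_mode conceptually does: walk a module
# tree and flip the 'enable_noise' flag on every noisy layer it finds.
# NoiseLinearLayer and Sequential here are toy stand-ins for illustration.

class NoiseLinearLayer:
    def __init__(self):
        self.enable_noise = True  # noise on by default in this sketch

class Sequential:
    def __init__(self, *children):
        self.children = children

def set_noise_mode(module, noise_enabled):
    """Recursively set 'enable_noise' on every NoiseLinearLayer."""
    if isinstance(module, NoiseLinearLayer):
        module.enable_noise = noise_enabled
    # Leaf modules have no 'children' attribute, so default to an empty tuple.
    for child in getattr(module, "children", ()):
        set_noise_mode(child, noise_enabled)

net = Sequential(NoiseLinearLayer(), Sequential(NoiseLinearLayer()))
set_noise_mode(net, False)  # e.g. disable noise before evaluation
```

In DI-engine the same toggle would typically be applied right before switching between collect and eval modes.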
default_preprocess_learn(data, use_priority_IS_weight=False, use_priority=False, use_nstep=False, ignore_done=False)
Overview
Default data pre-processing in a policy's _forward_learn method: stack the list of batch samples into tensors, optionally ignore done flags, reshape rewards for nstep TD error, and attach priority IS weights.
Arguments:
- data (:obj:List[Any]): The list of training batch samples; each sample is a dict of PyTorch Tensors.
- use_priority_IS_weight (:obj:bool): Whether to use priority IS weight correction. If True, the weight of each sample is set to its priority IS weight.
- use_priority (:obj:bool): Whether to use priority. If True, this function sets the priority IS weight.
- use_nstep (:obj:bool): Whether to use nstep TD error. If True, this function reshapes the reward accordingly.
- ignore_done (:obj:bool): Whether to ignore done. If True, this function sets done to 0.
Returns:
- data (:obj:Dict[str, torch.Tensor]): The preprocessed dict data whose values can be directly used for the following model forward and loss computation.
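The core stacking and flag-handling steps can be sketched without any framework dependency. This is a simplified illustration under assumed field names ('done', 'weight'), not DI-engine's actual implementation, which operates on torch.Tensor values and also handles nstep reward reshaping.

```python
# A minimal, framework-free sketch of the preprocessing described above:
# stack a list of per-sample dicts into one batch dict, optionally zero
# out 'done', and fall back to a uniform weight when priority IS weight
# correction is disabled. Field names are illustrative assumptions.

def preprocess_learn(data, ignore_done=False, use_priority_IS_weight=False):
    # Stacking: list[dict] -> dict of per-field lists (tensors in real code).
    batch = {k: [sample[k] for sample in data] for k in data[0]}
    if ignore_done:
        # Treat every transition as non-terminal, e.g. for infinite-horizon envs.
        batch['done'] = [0 for _ in batch['done']]
    if not use_priority_IS_weight:
        # Uniform loss weight when priority IS correction is disabled.
        batch['weight'] = [1.0 for _ in data]
    return batch

samples = [
    {'obs': [0.1, 0.2], 'reward': 1.0, 'done': 1},
    {'obs': [0.3, 0.4], 'reward': 0.0, 'done': 0},
]
batch = preprocess_learn(samples, ignore_done=True)
```

The returned dict can then feed the model forward and loss computation directly, one tensor per field.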
single_env_forward_wrapper(forward_fn)
Overview
Wrap policy to support gym-style interaction between policy and single environment.
Arguments:
- forward_fn (:obj:Callable): The original forward function of policy.
Returns:
- wrapped_forward_fn (:obj:Callable): The wrapped forward function of policy.
Examples:
>>> env = gym.make('CartPole-v0')
>>> policy = DQNPolicy(...)
>>> forward_fn = single_env_forward_wrapper(policy.eval_mode.forward)
>>> obs = env.reset()
>>> action = forward_fn(obs)
>>> next_obs, rew, done, info = env.step(action)
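Conceptually, the wrapper bridges the mismatch between a single gym observation and the batched, env-id-keyed format the policy expects. The sketch below illustrates that shape conversion with a toy policy; the dict layout is an assumption based on the example above, not DI-engine's exact internals (which also handle tensor conversion and squeezing).

```python
# A rough sketch of what a gym-style wrapper has to do: wrap the lone
# observation as a one-env batch keyed by env id 0, call the policy's
# forward function, then unpack the single action from the output.

def single_env_forward_wrapper(forward_fn):
    def _forward(obs):
        # Policy forward expects {env_id: obs} for a batch of environments.
        output = forward_fn({0: obs})
        # Unpack the action for the single environment (env id 0).
        return output[0]['action']
    return _forward

# A toy policy forward for illustration: maps every obs to a constant action.
def toy_forward(obs_dict):
    return {env_id: {'action': 1} for env_id in obs_dict}

forward_fn = single_env_forward_wrapper(toy_forward)
action = forward_fn([0.0, 0.0, 0.0, 0.0])  # a single CartPole-like obs
```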
single_env_forward_wrapper_ttorch(forward_fn, cuda=True)
Overview
Wrap policy to support gym-style interaction between policy and single environment for treetensor (ttorch) data.
Arguments:
- forward_fn (:obj:Callable): The original forward function of policy.
- cuda (:obj:bool): Whether to use cuda in policy, if True, this function will move the input data to cuda.
Returns:
- wrapped_forward_fn (:obj:Callable): The wrapped forward function of policy.
Examples:
>>> env = gym.make('CartPole-v0')
>>> policy = PPOFPolicy(...)
>>> forward_fn = single_env_forward_wrapper_ttorch(policy.eval)
>>> obs = env.reset()
>>> action = forward_fn(obs)
>>> next_obs, rew, done, info = env.step(action)
Full Source Code
../ding/policy/common_utils.py