ding.rl_utils.value_rescale

value_transform(x, eps=0.01)

Overview

Reduces the scale of the action-value function by applying :math:`h(x) = sign(x)(\sqrt{|x|+1} - 1) + \epsilon x`. The square-root term compresses large magnitudes while preserving the sign of the input.

Arguments:
- x: (:obj:`torch.Tensor`) The input tensor to be normalized.
- eps: (:obj:`float`) The coefficient of the additive regularization term, which ensures that the inverse function is Lipschitz continuous.

Returns:
- (:obj:`torch.Tensor`) Normalized tensor.

.. note:: Observe and Look Further: Achieving Consistent Performance on Atari (https://arxiv.org/abs/1805.11593).
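As a quick sanity check on the formula, the following is a minimal scalar sketch that uses only the Python standard library instead of torch tensors. The names `sign` and `h` are illustrative and are not part of `ding`; the sketch only shows how strongly large magnitudes are compressed.

```python
import math


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0, mirroring torch.sign for a scalar."""
    return float((x > 0) - (x < 0))


def h(x: float, eps: float = 0.01) -> float:
    """Scalar version of h(x) = sign(x)(sqrt(|x| + 1) - 1) + eps * x."""
    return sign(x) * (math.sqrt(abs(x) + 1) - 1) + eps * x


# The function is odd (h(-x) == -h(x)) and grows slowly for large inputs.
for value in (0.0, 3.0, -3.0, 99.0, 9999.0):
    print(f"h({value}) = {h(value):.4f}")
```

With the default `eps = 0.01`, an input of 99 maps to 9.99 and an input of 9999 maps to 198.99, so a hundredfold increase in the raw value becomes roughly a twentyfold increase after the transform.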

value_inv_transform(x, eps=0.01)

Overview

The inverse form of value rescale. :math:`h^{-1}(x) = sign(x)\left(\left(\frac{\sqrt{1+4\epsilon(|x|+1+\epsilon)}-1}{2\epsilon}\right)^2-1\right)`.

Arguments:
- x: (:obj:`torch.Tensor`) The input tensor to be unnormalized.
- eps: (:obj:`float`) The coefficient of the additive regularization term, which ensures that the inverse function is Lipschitz continuous. It must match the value used in the forward transform.

Returns:
- (:obj:`torch.Tensor`) Unnormalized tensor.
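The closed-form inverse can be checked with a round trip. The sketch below again uses plain floats and the standard library, with illustrative names (`sign`, `h`, `h_inv`) that are not part of `ding`. It also shows the transformed one-step target h(r + γ h⁻¹(q)) described in the cited paper; the reward, discount, and next-state value are made-up numbers, and how DI-engine policies use these helpers is not covered here.

```python
import math


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0, mirroring torch.sign for a scalar."""
    return float((x > 0) - (x < 0))


def h(x: float, eps: float = 0.01) -> float:
    """Scalar forward transform."""
    return sign(x) * (math.sqrt(abs(x) + 1) - 1) + eps * x


def h_inv(y: float, eps: float = 0.01) -> float:
    """Scalar inverse transform, following the closed-form expression above."""
    root = math.sqrt(1 + 4 * eps * (abs(y) + 1 + eps))
    return sign(y) * (((root - 1) / (2 * eps)) ** 2 - 1)


# Round trip: h_inv(h(x)) recovers x up to floating-point error.
for x in (-1000.0, -2.5, 0.0, 0.3, 1000.0):
    assert math.isclose(h_inv(h(x)), x, rel_tol=1e-9, abs_tol=1e-9)

# Transformed one-step target with made-up values:
# undo the rescale on the next-state value, apply the Bellman backup
# in the original scale, then rescale the result.
reward, gamma, q_next = 5.0, 0.99, 2.0
raw_target = reward + gamma * h_inv(q_next)
target = h(raw_target)
print(f"raw target = {raw_target:.4f}, rescaled target = {target:.4f}")
```

The round trip only holds when both functions receive the same `eps`.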

symlog(x)

Overview

A function to normalize the targets by compressing their magnitude symmetrically around zero. :math:`symlog(x) = sign(x)\ln(|x|+1)`.

Arguments:
- x: (:obj:`torch.Tensor`) The input tensor to be normalized.

Returns:
- (:obj:`torch.Tensor`) Normalized tensor.

.. note:: Mastering Diverse Domains through World Models (https://arxiv.org/abs/2301.04104)
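To make the shape of the function concrete, here is a minimal scalar sketch using only the standard library. The names `sign` and `symlog_scalar` are illustrative and are not part of `ding`.

```python
import math


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0, mirroring torch.sign for a scalar."""
    return float((x > 0) - (x < 0))


def symlog_scalar(x: float) -> float:
    """Scalar version of symlog(x) = sign(x) * ln(|x| + 1)."""
    return sign(x) * math.log(abs(x) + 1)


# Values near zero are almost unchanged; large values grow logarithmically.
for value in (0.0, 0.01, -0.01, 1e3, -1e6):
    print(f"symlog({value}) = {symlog_scalar(value):.4f}")
```

Because ln(|x| + 1) is approximately |x| for small |x|, the function behaves like the identity near zero and only compresses large values.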

inv_symlog(x)

Overview

The inverse form of symlog, called symexp in the cited paper. :math:`symexp(x) = sign(x)(\exp(|x|)-1)`.

Arguments:
- x: (:obj:`torch.Tensor`) The input tensor to be unnormalized.

Returns:
- (:obj:`torch.Tensor`) Unnormalized tensor.
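A round trip confirms that the two formulas are inverses of each other. This is a scalar sketch with the standard library only; `sign`, `symlog_scalar`, and `symexp_scalar` are illustrative names and are not part of `ding`.

```python
import math


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0, mirroring torch.sign for a scalar."""
    return float((x > 0) - (x < 0))


def symlog_scalar(x: float) -> float:
    """Scalar version of symlog(x) = sign(x) * ln(|x| + 1)."""
    return sign(x) * math.log(abs(x) + 1)


def symexp_scalar(y: float) -> float:
    """Scalar version of symexp(y) = sign(y) * (exp(|y|) - 1)."""
    return sign(y) * (math.exp(abs(y)) - 1)


# Round trip: symexp(symlog(x)) recovers x up to floating-point error.
for x in (-1e6, -3.0, 0.0, 0.5, 1e6):
    recovered = symexp_scalar(symlog_scalar(x))
    assert math.isclose(recovered, x, rel_tol=1e-9, abs_tol=1e-9)
    print(f"x = {x}, symlog = {symlog_scalar(x):.4f}, recovered = {recovered:.4f}")
```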

Full Source Code

../ding/rl_utils/value_rescale.py

    import torch


    def value_transform(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
        r"""
        Overview:
            A function to reduce the scale of the action-value function.
            :math: `h(x) = sign(x)(\sqrt{(abs(x)+1)} - 1) + \epsilon * x` .
        Arguments:
            - x: (:obj:`torch.Tensor`) The input tensor to be normalized.
            - eps: (:obj:`float`) The coefficient of the additive regularization term
              to ensure inverse function is Lipschitz continuous
        Returns:
            - (:obj:`torch.Tensor`) Normalized tensor.

        .. note::
            Observe and Look Further: Achieving Consistent Performance on Atari (https://arxiv.org/abs/1805.11593).
        """
        return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + eps * x


    def value_inv_transform(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
        r"""
        Overview:
            The inverse form of value rescale.
            :math: `h^{-1}(x) = sign(x)({(\frac{\sqrt{1+4\epsilon(|x|+1+\epsilon)}-1}{2\epsilon})}^2-1)` .
        Arguments:
            - x: (:obj:`torch.Tensor`) The input tensor to be unnormalized.
            - eps: (:obj:`float`) The coefficient of the additive regularization term
              to ensure inverse function is Lipschitz continuous
        Returns:
            - (:obj:`torch.Tensor`) Unnormalized tensor.
        """
        return torch.sign(x) * (((torch.sqrt(1 + 4 * eps * (torch.abs(x) + 1 + eps)) - 1) / (2 * eps)) ** 2 - 1)


    def symlog(x: torch.Tensor) -> torch.Tensor:
        r"""
        Overview:
            A function to normalize the targets.
            :math: `symlog(x) = sign(x)(\ln{|x|+1})` .
        Arguments:
            - x: (:obj:`torch.Tensor`) The input tensor to be normalized.
        Returns:
            - (:obj:`torch.Tensor`) Normalized tensor.

        .. note::
            Mastering Diverse Domains through World Models (https://arxiv.org/abs/2301.04104)
        """
        return torch.sign(x) * (torch.log(torch.abs(x) + 1))


    def inv_symlog(x: torch.Tensor) -> torch.Tensor:
        r"""
        Overview:
            The inverse form of symlog.
            :math: `symexp(x) = sign(x)(\exp{|x|}-1)` .
        Arguments:
            - x: (:obj:`torch.Tensor`) The input tensor to be unnormalized.
        Returns:
            - (:obj:`torch.Tensor`) Unnormalized tensor.
        """
        return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)