ding.rl_utils.beta_function

Referenced paper: Implicit Quantile Networks for Distributional Reinforcement Learning

cpw(x, eta=0.71)

Overview

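The implementation of the CPW (cumulative probability weighting) function. With eta = 0.71, the value that most closely matches human subjects, the function is locally concave for small inputs and becomes locally convex for larger inputs.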

Arguments:
- x (:obj:`Union[torch.Tensor, float]`): The input value.
- eta (:obj:`float`): The hyperparameter of CPW function.

Returns:
- output (:obj:`Union[torch.Tensor, float]`): The output value.
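The shape of the CPW distortion is easier to see with a few numbers. The following is a minimal sketch that re-declares the formula locally on plain floats (so it runs without torch or DI-engine installed); the local `cpw` copy and the variable names are illustrative assumptions, not part of the library.

```python
def cpw(x: float, eta: float = 0.71) -> float:
    # Local copy of the documented CPW formula, for illustration only.
    return (x ** eta) / ((x ** eta + (1 - x) ** eta) ** (1 / eta))


# The endpoints 0 and 1 are mapped to themselves.
low_end = cpw(0.0)
high_end = cpw(1.0)

# With the default eta, a small input is pushed up and a large input is pulled down.
small = cpw(0.05)
large = cpw(0.9)

# With eta = 1 the denominator becomes 1, so the formula reduces to the identity.
identity = cpw(0.3, eta=1.0)

print(low_end, high_end, small, large, identity)
```

In this sketch `small` comes out above 0.05 and `large` below 0.9, which is consistent with a curve that is concave near 0 and convex near 1.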

CVaR(x, eta=0.71)

Overview

The implementation of the CVaR (conditional value-at-risk) function, which is risk-averse.

Arguments:
- x (:obj:`Union[torch.Tensor, float]`): The input value.
- eta (:obj:`float`): The hyperparameter of CVaR function. Must satisfy eta <= 1.0, otherwise an assertion fails.

Returns:
- output (:obj:`Union[torch.Tensor, float]`): The output value.
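Because CVaR simply scales its input by eta, inputs drawn from [0, 1] end up in [0, eta]. The sketch below illustrates this on uniformly sampled values; the lowercase `cvar` is a local stand-in for the documented function, and the sample size and seed are arbitrary choices made for the example.

```python
import random


def cvar(x: float, eta: float = 0.71) -> float:
    # Local stand-in for the documented CVaR formula (lowercase name marks it as a copy).
    assert eta <= 1.0
    return x * eta


random.seed(0)  # arbitrary seed, only to make the run repeatable
taus = [random.random() for _ in range(1000)]  # uniform samples in [0, 1)
distorted = [cvar(t, eta=0.25) for t in taus]

# Every distorted value stays inside [0, eta].
print(min(distorted), max(distorted))
```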

Pow(x, eta=0.0)

Overview

The implementation of the Pow function, which is risk-averse when eta < 0 and risk-seeking when eta > 0. With eta = 0 the input is returned unchanged.

Arguments:
- x (:obj:`Union[torch.Tensor, float]`): The input value.
- eta (:obj:`float`): The hyperparameter of Pow function.

Returns:
- output (:obj:`Union[torch.Tensor, float]`): The output value.
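The effect of the sign of eta can be checked at a single point. This sketch uses a hypothetical local function, `pow_distortion`, and assumes the exponent is 1 / (1 + |eta|) on both branches, which is how the power formula reads in the referenced paper; treat that as an assumption of the example rather than a statement about the library's behaviour.

```python
def pow_distortion(x: float, eta: float = 0.0) -> float:
    # Hypothetical stand-in for Pow; assumes the exponent 1 / (1 + |eta|) on both branches.
    if eta >= 0:
        return x ** (1 / (1 + eta))
    # Here eta < 0, so 1 - eta equals 1 + |eta|.
    return 1 - (1 - x) ** (1 / (1 - eta))


neutral = pow_distortion(0.5, eta=0.0)   # exponent is 1, input unchanged
averse = pow_distortion(0.5, eta=-2.0)   # mapped below 0.5
seeking = pow_distortion(0.5, eta=2.0)   # mapped above 0.5

print(neutral, averse, seeking)
```

Under this assumption the two branches mirror each other: the results for eta = -2 and eta = 2 at x = 0.5 sum to 1.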

Full Source Code

../ding/rl_utils/beta_function.py

"""
Referenced paper <Implicit Quantile Networks for Distributional Reinforcement Learning>
"""
import torch
from typing import Union

beta_function_map = {}

beta_function_map['uniform'] = lambda x: x

# For beta functions, concavity corresponds to risk-averse and convexity to risk-seeking policies


# For CPW, eta = 0.71 most closely matches human subjects
# this function is locally concave for small values of τ and becomes locally convex for larger values of τ
def cpw(x: Union[torch.Tensor, float], eta: float = 0.71) -> Union[torch.Tensor, float]:
    """
    Overview:
        The implementation of CPW function.
    Arguments:
        - x (:obj:`Union[torch.Tensor, float]`): The input value.
        - eta (:obj:`float`): The hyperparameter of CPW function.
    Returns:
        - output (:obj:`Union[torch.Tensor, float]`): The output value.
    """
    return (x ** eta) / ((x ** eta + (1 - x) ** eta) ** (1 / eta))


beta_function_map['CPW'] = cpw


# CVaR is risk-averse
def CVaR(x: Union[torch.Tensor, float], eta: float = 0.71) -> Union[torch.Tensor, float]:
    """
    Overview:
        The implementation of CVaR function, which is a risk-averse function.
    Arguments:
        - x (:obj:`Union[torch.Tensor, float]`): The input value.
        - eta (:obj:`float`): The hyperparameter of CVaR function.
    Returns:
        - output (:obj:`Union[torch.Tensor, float]`): The output value.
    """
    assert eta <= 1.0
    return x * eta


beta_function_map['CVaR'] = CVaR


# risk-averse (eta < 0) or risk-seeking (eta > 0)
def Pow(x: Union[torch.Tensor, float], eta: float = 0.0) -> Union[torch.Tensor, float]:
    """
    Overview:
        The implementation of Pow function, which is risk-averse when eta < 0 and risk-seeking when eta > 0.
    Arguments:
        - x (:obj:`Union[torch.Tensor, float]`): The input value.
        - eta (:obj:`float`): The hyperparameter of Pow function.
    Returns:
        - output (:obj:`Union[torch.Tensor, float]`): The output value.
    """
    if eta >= 0:
        return x ** (1 / (1 + eta))
    else:
        # eta < 0 here, so the exponent 1 / (1 - eta) equals 1 / (1 + |eta|).
        # The parentheses matter: `1 / 1 - eta` would evaluate to `1 - eta`.
        return 1 - (1 - x) ** (1 / (1 - eta))


beta_function_map['Pow'] = Pow
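The module registers each function in `beta_function_map` under a string key, which suggests selecting a distortion by name, for instance from a config entry. The sketch below shows that lookup pattern with a hypothetical local dictionary, `local_map`, and a hypothetical helper, `distort`, so that it runs without DI-engine installed; only the keys 'uniform' and 'CVaR' and their formulas are taken from the source above.

```python
from typing import Callable, Dict, List

# Hypothetical local stand-in for beta_function_map (two of its entries only).
local_map: Dict[str, Callable[..., float]] = {
    'uniform': lambda x: x,
    'CVaR': lambda x, eta=0.71: x * eta,
}


def distort(taus: List[float], name: str, **kwargs) -> List[float]:
    # Look up the distortion by its string key and apply it element-wise.
    fn = local_map[name]
    return [fn(t, **kwargs) for t in taus]


taus = [0.0, 0.25, 0.5, 0.75, 1.0]
uniform_taus = distort(taus, 'uniform')
cvar_taus = distort(taus, 'CVaR', eta=0.5)

print(uniform_taus)
print(cvar_taus)
```

Note that the 'uniform' entry accepts only `x`, so a caller that forwards `eta` unconditionally would need to special-case it.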