`ding.torch_utils.diffusion_SDE.dpm_solver_pytorch`

`NoiseScheduleVP`

`__init__(schedule='discrete', betas=None, alphas_cumprod=None, continuous_beta_0=0.1, continuous_beta_1=20.0)`

Create a wrapper class for the forward SDE (VP type).

Update: We support discrete-time diffusion models by implementing a piecewise linear interpolation for log_alpha_t. We recommend using schedule='discrete' for discrete-time diffusion models, especially for high-resolution images.

The forward SDE ensures that the conditional distribution is q_{t|0}(x_t | x_0) = N(alpha_t * x_0, sigma_t^2 * I); for the VP type, sigma_t^2 = 1 - alpha_t^2. We further define lambda_t = log(alpha_t) - log(sigma_t), which is the half-logSNR (described in the DPM-Solver paper). Therefore, we implement functions for computing alpha_t, sigma_t and lambda_t. For t in [0, T], we have:

- `log_alpha_t = self.marginal_log_mean_coeff(t)`
- `sigma_t = self.marginal_std(t)`
- `lambda_t = self.marginal_lambda(t)`

Moreover, as lambda(t) is an invertible function, we also support its inverse function:

`t = self.inverse_lambda(lambda_t)`
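
For the continuous-time linear schedule, all four quantities have closed forms. The standalone Python sketch below mirrors the method names listed on this page but is not this module's code. It assumes beta(t) = beta_0 + t * (beta_1 - beta_0) with beta_0 = 0.1 and beta_1 = 20 (the constructor defaults), which gives log(alpha_t) = -(beta_1 - beta_0) * t^2 / 4 - beta_0 * t / 2. The inverse uses sigma_t^2 = 1 - alpha_t^2, so log(alpha_t) = -0.5 * log(1 + exp(-2 * lambda_t)), and then solves the resulting quadratic in t.

```python
import math

BETA_0, BETA_1 = 0.1, 20.0  # assumed linear-schedule endpoints (the constructor defaults)


def marginal_log_mean_coeff(t: float) -> float:
    """log(alpha_t) for the linear VPSDE: -(beta_1 - beta_0) t^2 / 4 - beta_0 t / 2."""
    return -0.25 * t ** 2 * (BETA_1 - BETA_0) - 0.5 * t * BETA_0


def marginal_alpha(t: float) -> float:
    return math.exp(marginal_log_mean_coeff(t))


def marginal_std(t: float) -> float:
    # VP type: sigma_t^2 = 1 - alpha_t^2 (expm1 keeps precision for small t > 0)
    return math.sqrt(-math.expm1(2.0 * marginal_log_mean_coeff(t)))


def marginal_lambda(t: float) -> float:
    # Half-logSNR: lambda_t = log(alpha_t) - log(sigma_t)
    log_mean = marginal_log_mean_coeff(t)
    log_std = 0.5 * math.log(-math.expm1(2.0 * log_mean))
    return log_mean - log_std


def inverse_lambda(lamb: float) -> float:
    # From sigma^2 = 1 - alpha^2: log(alpha) = -0.5 * log(1 + exp(-2 * lambda)).
    log_alpha = -0.5 * math.log1p(math.exp(-2.0 * lamb))
    # Positive root of (beta_1 - beta_0) t^2 / 4 + beta_0 t / 2 + log_alpha = 0,
    # written in a rationalized form to avoid cancellation.
    delta = BETA_1 - BETA_0
    tmp = -4.0 * delta * log_alpha
    return tmp / (math.sqrt(BETA_0 ** 2 + tmp) + BETA_0) / delta


t_back = inverse_lambda(marginal_lambda(0.5))  # ≈ 0.5
```

Because lambda_t is strictly decreasing in t, the round trip through `inverse_lambda` recovers t up to floating-point error.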

===============================================================

We support both discrete-time DPMs (trained on n = 0, 1, ..., N-1) and continuous-time DPMs (trained on t in [t_0, T]).

For discrete-time DPMs:

For discrete-time DPMs trained on n = 0, 1, ..., N-1, we convert the discrete steps to continuous time steps by t_n = (n + 1) / N. For example, for N = 1000, we have t_0 = 1e-3 and T = t_{N-1} = 1. We solve the corresponding diffusion ODE from time T = 1 to time t_0 = 1e-3.
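
The step-to-time conversion is plain arithmetic. In this sketch the helper names `discrete_to_continuous_times` and `continuous_to_discrete_step` are hypothetical; the second one simply inverts t_n = (n + 1) / N.

```python
def discrete_to_continuous_times(N: int) -> list:
    """Map discrete steps n = 0, ..., N-1 to continuous times t_n = (n + 1) / N."""
    return [(n + 1) / N for n in range(N)]


def continuous_to_discrete_step(t: float, N: int) -> float:
    """Invert t_n = (n + 1) / N: n = t * N - 1 (an integer only on the grid)."""
    return t * N - 1.0


ts = discrete_to_continuous_times(1000)
print(ts[0], ts[-1])  # 0.001 1.0
```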

Args:

- betas: A torch.Tensor. The beta array for the discrete-time DPM (see the original DDPM paper for details).
- alphas_cumprod: A torch.Tensor. The cumprod alphas for the discrete-time DPM (see the original DDPM paper for details).

Note that alphas_cumprod = cumprod(1 - betas) always holds. Therefore, only one of betas and alphas_cumprod needs to be set.

Important: pay special attention to the alphas_cumprod argument. alphas_cumprod is the \hat{alpha_n} array in the notation of DDPM. Specifically, DDPMs assume that q_{t_n|0}(x_{t_n} | x_0) = N(\sqrt{\hat{alpha_n}} * x_0, (1 - \hat{alpha_n}) * I). Therefore, the notation \hat{alpha_n} is different from the notation alpha_t in DPM-Solver. In fact, alpha_{t_n} = \sqrt{\hat{alpha_n}} and log(alpha_{t_n}) = 0.5 * log(\hat{alpha_n}).

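
A small pure-Python sketch of the two equivalent ways to obtain log(alpha_{t_n}), together with the piecewise linear interpolation of log_alpha_t mentioned in the Update note. The helper names are hypothetical, the DDPM linear beta schedule (1e-4 to 0.02 over 1000 steps) is only an example input, and this version rejects t outside [t_0, T] instead of extrapolating; how the library treats out-of-range t is not covered here.

```python
import bisect
import math


def log_alpha_from_betas(betas):
    """log(alpha_{t_n}) = 0.5 * sum_{k<=n} log(1 - beta_k) = 0.5 * log(alphas_cumprod[n])."""
    out, acc = [], 0.0
    for beta in betas:
        acc += math.log(1.0 - beta)
        out.append(0.5 * acc)
    return out


def log_alpha_from_alphas_cumprod(alphas_cumprod):
    """log(alpha_{t_n}) = 0.5 * log(hat{alpha}_n)."""
    return [0.5 * math.log(a) for a in alphas_cumprod]


def interp_log_alpha(t, t_array, log_alpha_array):
    """Piecewise linear interpolation of log(alpha_t) between grid points t_n = (n + 1) / N."""
    if not t_array[0] <= t <= t_array[-1]:
        raise ValueError("t must lie in [t_0, T]")
    i = bisect.bisect_left(t_array, t)  # first grid index with t_array[i] >= t
    if i == 0:
        return log_alpha_array[0]
    t_lo, t_hi = t_array[i - 1], t_array[i]
    w = (t - t_lo) / (t_hi - t_lo)
    return (1.0 - w) * log_alpha_array[i - 1] + w * log_alpha_array[i]


# Example input: the DDPM linear beta schedule (1e-4 to 0.02 over N = 1000 steps).
N = 1000
betas = [1e-4 + (0.02 - 1e-4) * n / (N - 1) for n in range(N)]
alphas_cumprod, prod = [], 1.0
for beta in betas:
    prod *= 1.0 - beta  # alphas_cumprod = cumprod(1 - betas)
    alphas_cumprod.append(prod)

t_array = [(n + 1) / N for n in range(N)]
log_alpha_array = log_alpha_from_betas(betas)
```

Both routes give the same log(alpha_{t_n}) up to rounding, which is why only one of betas and alphas_cumprod needs to be passed.
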
For continuous-time DPMs:

We support two types of VPSDEs: linear (DDPM) and cosine (improved-DDPM). The hyperparameters for the noise schedule are the default settings in DDPM and improved-DDPM:

Args:

- beta_min: A float. The smallest beta for the linear schedule (`continuous_beta_0` in the constructor).
- beta_max: A float. The largest beta for the linear schedule (`continuous_beta_1` in the constructor).
- cosine_s: A float. A hyperparameter of the cosine schedule.
- cosine_beta_max: A float. A hyperparameter of the cosine schedule.
- T: A float. The ending time of the forward process.
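
For the cosine (improved-DDPM) schedule, a minimal sketch of log(alpha_t), assuming the improved-DDPM definition alpha_bar(t) = f(t) / f(0) with f(t) = cos^2((t + s) / (1 + s) * pi / 2), time normalized to [0, 1], the paper's default s = 0.008, and alpha_t = sqrt(alpha_bar(t)) as in the DDPM note above. cosine_beta_max is not modeled, and the function name is hypothetical. As t approaches 1 the cosine approaches 0 and log(alpha_t) drops steeply, which is one reason the ending time T matters for this schedule.

```python
import math

COSINE_S = 0.008  # assumed improved-DDPM default offset s


def cosine_log_alpha(t: float, s: float = COSINE_S) -> float:
    """log(alpha_t) with alpha_t = sqrt(alpha_bar(t)) under the cosine schedule.

    log(alpha_t) = log cos((t + s) / (1 + s) * pi / 2) - log cos(s / (1 + s) * pi / 2)
    """
    log_f = math.log(math.cos((t + s) / (1.0 + s) * math.pi / 2.0))
    log_f0 = math.log(math.cos(s / (1.0 + s) * math.pi / 2.0))
    return log_f - log_f0


values = [cosine_log_alpha(t) for t in (0.0, 0.5, 0.9)]  # starts at 0 and decreases
```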

===============================================================

Parameters:

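| Name | Type | Description | Default |
|---|---|---|---|
| `schedule` | `str` | The noise schedule of the forward SDE: 'discrete' for discrete-time DPMs, 'linear' or 'cosine' for continuous-time DPMs. | `'discrete'` |
| `betas` | `torch.Tensor` | The beta array for a discrete-time DPM. | `None` |
| `alphas_cumprod` | `torch.Tensor` | The cumprod alphas (\hat{alpha_n}) for a discrete-time DPM. | `None` |
| `continuous_beta_0` | `float` | The smallest beta of the linear continuous-time schedule. | `0.1` |
| `continuous_beta_1` | `float` | The largest beta of the linear continuous-time schedule. | `20.0` |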

Returns: A wrapper object of the forward SDE (VP type).

===============================================================

Example:

For discrete-time DPMs, given betas (the beta array for n = 0, 1, ..., N - 1):

`ns = NoiseScheduleVP('discrete', betas=betas)`

For discrete-time DPMs, given alphas_cumprod (the \hat{alpha_n} array for n = 0, 1, ..., N - 1):

`ns = NoiseScheduleVP('discrete', alphas_cumprod=alphas_cumprod)`

For continuous-time DPMs (VPSDE), linear schedule:

`ns = NoiseScheduleVP('linear', continuous_beta_0=0.1, continuous_beta_1=20.)`

`marginal_log_mean_coeff(t)`

Compute log(alpha_t) of a given continuous-time label t in [0, T].

`marginal_alpha(t)`

Compute alpha_t of a given continuous-time label t in [0, T].

`marginal_std(t)`

Compute sigma_t of a given continuous-time label t in [0, T].

`marginal_lambda(t)`

Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].

`inverse_lambda(lamb)`

Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.
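
For schedule='discrete', log_alpha_t is piecewise linear in t, so `inverse_lambda` can be sketched in two steps: recover log(alpha_t) from lambda_t through sigma_t^2 = 1 - alpha_t^2, then interpolate t against log(alpha_t) on the same grid (the inverse of a strictly monotone piecewise linear map is piecewise linear on the same knots). This is an illustrative standalone sketch with hypothetical names, not this module's implementation; the grid comes from the DDPM linear beta schedule purely as an example.

```python
import bisect
import math


def lambda_from_log_alpha(log_alpha):
    # lambda = log(alpha) - log(sigma), with sigma^2 = 1 - alpha^2 (VP type)
    return log_alpha - 0.5 * math.log(-math.expm1(2.0 * log_alpha))


def inverse_lambda_discrete(lamb, t_array, log_alpha_array):
    """Recover t from lambda_t when log(alpha_t) is piecewise linear in t on the grid."""
    # Invert the VP relation: log(alpha) = -0.5 * log(1 + exp(-2 * lambda)).
    log_alpha = -0.5 * math.log1p(math.exp(-2.0 * lamb))
    # log(alpha) decreases in t, so reverse both arrays to get an increasing axis for bisect.
    xs, ys = log_alpha_array[::-1], t_array[::-1]
    if not xs[0] <= log_alpha <= xs[-1]:
        raise ValueError("lambda lies outside the range covered by the grid")
    i = bisect.bisect_left(xs, log_alpha)
    if i == 0:
        return ys[0]
    w = (log_alpha - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1.0 - w) * ys[i - 1] + w * ys[i]


# Example grid from the DDPM linear beta schedule (1e-4 to 0.02, N = 1000).
N = 1000
t_array = [(n + 1) / N for n in range(N)]
log_alpha_array, acc = [], 0.0
for n in range(N):
    acc += math.log(1.0 - (1e-4 + (0.02 - 1e-4) * n / (N - 1)))
    log_alpha_array.append(0.5 * acc)

t_500 = inverse_lambda_discrete(lambda_from_log_alpha(log_alpha_array[500]), t_array, log_alpha_array)
```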

`DPM_Solver`

`__init__(model_fn, noise_schedule, predict_x0=False, thresholding=False, max_val=1.0)`

Construct a DPM-Solver.

We support both the noise prediction model ("predicting epsilon") and the data prediction model ("predicting x0"). If predict_x0 is False, we use the solver for the noise prediction model (DPM-Solver). If predict_x0 is True, we use the solver for the data prediction model (DPM-Solver++). In that case, we further support the "dynamic thresholding" of [1] when thresholding is True. Dynamic thresholding can greatly improve the sample quality of pixel-space DPMs with large guidance scales.
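
The two model types are linked by x0 = (x_t - sigma_t * eps) / alpha_t. As a sanity check of that relationship, the scalar sketch below writes the first-order DPM-Solver update (noise prediction, equivalent to DDIM) and the first-order DPM-Solver++ update (data prediction) as given in the respective papers, with h = lambda_t - lambda_s, and shows that without thresholding they produce the same step. Function names and sample values are hypothetical; the higher-order updates are not shown.

```python
import math


def alpha_sigma(lamb):
    """VP identities in terms of the half-logSNR: alpha^2 = sigmoid(2*lambda), sigma^2 = sigmoid(-2*lambda)."""
    alpha = math.sqrt(1.0 / (1.0 + math.exp(-2.0 * lamb)))
    sigma = math.sqrt(1.0 / (1.0 + math.exp(2.0 * lamb)))
    return alpha, sigma


def dpm_solver_1(x_s, eps, lamb_s, lamb_t):
    """First-order step with a noise prediction eps (DPM-Solver-1, i.e. DDIM)."""
    alpha_s, _ = alpha_sigma(lamb_s)
    alpha_t, sigma_t = alpha_sigma(lamb_t)
    h = lamb_t - lamb_s
    return (alpha_t / alpha_s) * x_s - sigma_t * math.expm1(h) * eps


def dpm_solver_pp_1(x_s, x0, lamb_s, lamb_t):
    """First-order step with a data prediction x0 (DPM-Solver++ form)."""
    _, sigma_s = alpha_sigma(lamb_s)
    alpha_t, sigma_t = alpha_sigma(lamb_t)
    h = lamb_t - lamb_s
    return (sigma_t / sigma_s) * x_s - alpha_t * math.expm1(-h) * x0


x_s, eps, lamb_s, lamb_t = 0.7, -0.3, -1.0, 0.5
alpha_s, sigma_s = alpha_sigma(lamb_s)
x0 = (x_s - sigma_s * eps) / alpha_s  # data prediction implied by eps
step_eps = dpm_solver_1(x_s, eps, lamb_s, lamb_t)
step_x0 = dpm_solver_pp_1(x_s, x0, lamb_s, lamb_t)  # equal to step_eps up to rounding
```

The first-order steps coincide; the practical differences between the two solvers appear at higher orders and when thresholding modifies x0.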

Parameters:

| Name | Description | Default |
|---|---|---|
| `model_fn` | A noise prediction model function that accepts the continuous-time input (t in [epsilon, T]): `def model_fn(x, t_continuous): return noise` | required |
| `noise_schedule` | A noise schedule object, such as `NoiseScheduleVP`. | required |
| `predict_x0` | A `bool`. If True, use the data prediction model; otherwise, use the noise prediction model. | `False` |
| `thresholding` | A `bool`. Only used when `predict_x0` is True. Whether to use the "dynamic thresholding" of [1]. | `False` |
| `max_val` | A `float`. Only used when both `predict_x0` and `thresholding` are True. The max value for thresholding. | `1.0` |

[1] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022b.

`noise_prediction_fn(x, t)`

Return the noise prediction model.

`data_prediction_fn(x, t)`

Return the data prediction model (with thresholding).
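
A pure-Python sketch of the data prediction for a single sample stored as a flat list: x0 = (x_t - sigma_t * eps) / alpha_t, optionally followed by dynamic thresholding as described in [1] (take the p-th percentile s of |x0|, raise it to at least max_val, clip x0 to [-s, s] and divide by s). The percentile p = 0.995 and all helper names are assumptions for illustration; a real implementation would work on batched tensors.

```python
import math


def quantile(values, p):
    """Linear-interpolation quantile of a list (the 'linear' convention)."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def data_prediction(x, eps, alpha_t, sigma_t, thresholding=False, max_val=1.0, p=0.995):
    """Convert a noise prediction into a data prediction for one sample."""
    x0 = [(xi - sigma_t * ei) / alpha_t for xi, ei in zip(x, eps)]
    if thresholding:
        # Dynamic thresholding: s = max(p-quantile of |x0|, max_val); clip to [-s, s], rescale by s.
        s = max(quantile([abs(v) for v in x0], p), max_val)
        x0 = [min(max(v, -s), s) / s for v in x0]
    return x0


clipped = data_prediction([0.0, 2.0, -4.0, 8.0], [0.0] * 4, 1.0, 0.0, thresholding=True)
```

When the percentile is already below max_val, s equals max_val and values inside [-max_val, max_val] pass through unchanged; only large-magnitude predictions get squashed.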

`model_fn(x, t)`

Convert the model to the noise prediction model (if predict_x0 is False) or the data prediction model (if predict_x0 is True).

`get_time_steps(skip_type, t_T, t_0, N, device)`

Compute the intermediate time steps for sampling.

Parameters:

| Name | Description | Default |
|---|---|---|
| `skip_type` | A `str`. The spacing type of the time steps. Three types are supported: 'logSNR' (uniform logSNR for the time steps); 'time_uniform' (uniform time for the time steps, recommended for high-resolution data); 'time_quadratic' (quadratic time for the time steps, used in DDIM for low-resolution data). | required |
| `t_T` | A `float`. The starting time of the sampling (usually T). | required |
| `t_0` | A `float`. The ending time of the sampling (usually epsilon). | required |
| `N` | An `int`. The number of intervals between consecutive time steps. | required |
| `device` | A torch device. | required |

Returns: A PyTorch tensor of the time steps, with shape (N + 1,).
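
A list-based sketch of the three spacings. It returns plain floats instead of a tensor and takes the schedule's marginal_lambda and inverse_lambda as callables instead of a device (a hypothetical signature, `time_steps_sketch`), using the closed-form linear schedule (beta_0 = 0.1, beta_1 = 20, assumed) for the 'logSNR' case. The 'time_quadratic' branch is read here as uniform spacing in sqrt(t) followed by squaring.

```python
import math

BETA_0, BETA_1 = 0.1, 20.0  # assumed linear VP schedule, only used for the logSNR case


def linear_lambda(t):
    log_alpha = -0.25 * t ** 2 * (BETA_1 - BETA_0) - 0.5 * t * BETA_0
    return log_alpha - 0.5 * math.log(-math.expm1(2.0 * log_alpha))


def linear_inverse_lambda(lamb):
    tmp = 2.0 * (BETA_1 - BETA_0) * math.log1p(math.exp(-2.0 * lamb))
    return tmp / (math.sqrt(BETA_0 ** 2 + tmp) + BETA_0) / (BETA_1 - BETA_0)


def linspace(start, stop, num_intervals):
    """num_intervals + 1 evenly spaced points from start to stop (inclusive)."""
    return [start + (stop - start) * i / num_intervals for i in range(num_intervals + 1)]


def time_steps_sketch(skip_type, t_T, t_0, N, marginal_lambda, inverse_lambda):
    if skip_type == "logSNR":
        # Uniform in the half-logSNR lambda, mapped back to time through the inverse.
        lambdas = linspace(marginal_lambda(t_T), marginal_lambda(t_0), N)
        return [inverse_lambda(lam) for lam in lambdas]
    if skip_type == "time_uniform":
        return linspace(t_T, t_0, N)
    if skip_type == "time_quadratic":
        # Uniform in sqrt(t), then squared: the steps get denser as t approaches t_0.
        return [s ** 2 for s in linspace(math.sqrt(t_T), math.sqrt(t_0), N)]
    raise ValueError(f"unsupported skip_type: {skip_type!r}")


steps = time_steps_sketch("logSNR", 1.0, 1e-3, 10, linear_lambda, linear_inverse_lambda)
```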

`get_orders_for_singlestep_solver(steps, order)`

Get the order of each step for sampling by the singlestep DPM-Solver.

We combine DPM-Solver-1, 2 and 3 so that all function evaluations are used; we call this "DPM-Solver-fast". Given a fixed number of function evaluations `steps`, the sampling procedure of DPM-Solver-fast is:

- If order == 1: we take `steps` steps of DPM-Solver-1 (i.e. DDIM).
- If order == 2:
    - Denote K = (steps // 2). We take K or (K + 1) intermediate time steps for sampling.
    - If steps % 2 == 0, we use K steps of DPM-Solver-2.
    - If steps % 2 == 1, we use K steps of DPM-Solver-2 and 1 step of DPM-Solver-1.
- If order == 3:
    - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.
    - If steps % 3 == 0, we use (K - 2) steps of DPM-Solver-3, 1 step of DPM-Solver-2 and 1 step of DPM-Solver-1.
    - If steps % 3 == 1, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-1.
    - If steps % 3 == 2, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-2.

============================================

Args:

- order: An int. The max order for the solver (1, 2 or 3).
- steps: An int. The total number of function evaluations (NFE).

Returns:

- orders: A list of the solver order of each step.
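
The order rules above transcribe directly into a standalone function. The name mirrors the documented method, but this is a sketch written from the description, not the method's source; a quick consistency check is that the orders always sum to `steps`.

```python
def get_orders_for_singlestep_solver(steps: int, order: int) -> list:
    """Split `steps` function evaluations into singlestep DPM-Solver orders (DPM-Solver-fast)."""
    if order == 3:
        K = steps // 3 + 1
        if steps % 3 == 0:
            orders = [3] * (K - 2) + [2, 1]
        elif steps % 3 == 1:
            orders = [3] * (K - 1) + [1]
        else:
            orders = [3] * (K - 1) + [2]
    elif order == 2:
        K = steps // 2
        # Even NFE: only 2nd-order steps; odd NFE: one extra 1st-order step at the end.
        orders = [2] * K if steps % 2 == 0 else [2] * K + [1]
    elif order == 1:
        orders = [1] * steps
    else:
        raise ValueError("order must be 1, 2 or 3")
    return orders


print(get_orders_for_singlestep_solver(10, 3))  # [3, 3, 3, 1]
print(get_orders_for_singlestep_solver(9, 3))   # [3, 3, 2, 1]
```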

`denoise_fn(x, s)`

Denoise at the final step, which is equivalent to solving the ODE from lambda_s to infinity by first-order discretization.
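
Why the final denoising step reduces to the data prediction: the first-order DPM-Solver update from lambda_s to lambda_t is x_t = (alpha_t / alpha_s) * x_s - sigma_t * (exp(h) - 1) * eps with h = lambda_t - lambda_s. Since sigma_t * exp(h) = alpha_t * sigma_s / alpha_s, letting lambda_t go to infinity (alpha_t -> 1, sigma_t -> 0) leaves (x_s - sigma_s * eps) / alpha_s. The scalar sketch below checks this numerically; the names and sample values are illustrative only.

```python
import math


def alpha_sigma(lamb):
    """VP identities: alpha^2 = 1 / (1 + exp(-2*lambda)), sigma^2 = 1 / (1 + exp(2*lambda))."""
    alpha = math.sqrt(1.0 / (1.0 + math.exp(-2.0 * lamb)))
    sigma = math.sqrt(1.0 / (1.0 + math.exp(2.0 * lamb)))
    return alpha, sigma


def dpm_solver_1(x_s, eps, lamb_s, lamb_t):
    """First-order step from lambda_s to lambda_t with a noise prediction eps."""
    alpha_s, _ = alpha_sigma(lamb_s)
    alpha_t, sigma_t = alpha_sigma(lamb_t)
    return (alpha_t / alpha_s) * x_s - sigma_t * math.expm1(lamb_t - lamb_s) * eps


def denoise(x_s, eps, lamb_s):
    """Limit lambda_t -> infinity of the first-order step: the data prediction x0."""
    alpha_s, sigma_s = alpha_sigma(lamb_s)
    return (x_s - sigma_s * eps) / alpha_s


x_s, eps, lamb_s = 0.4, 0.9, -1.0
gap_near = abs(dpm_solver_1(x_s, eps, lamb_s, 5.0) - denoise(x_s, eps, lamb_s))
gap_far = abs(dpm_solver_1(x_s, eps, lamb_s, 20.0) - denoise(x_s, eps, lamb_s))  # shrinks as lambda_t grows
```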