From 5716fbc4ed33779d765020c4408587ed5d50ad06 Mon Sep 17 00:00:00 2001
From: Steve Bronder
Date: Wed, 11 Feb 2026 15:01:39 -0500
Subject: [PATCH] update options for laplace

---
 src/functions-reference/embedded_laplace.qmd | 58 ++++++++++++--------
 src/reference-manual/laplace.qmd             |  4 +-
 2 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/src/functions-reference/embedded_laplace.qmd b/src/functions-reference/embedded_laplace.qmd
index 63346dbfb..6adb73751 100644
--- a/src/functions-reference/embedded_laplace.qmd
+++ b/src/functions-reference/embedded_laplace.qmd
@@ -4,23 +4,20 @@ pagetitle: Embedded Laplace Approximation
 
 # Embedded Laplace Approximation
 
-The embedded Laplace approximation can be used to approximate certain
-marginal and conditional distributions that arise in latent Gaussian models.
-A latent Gaussian model observes the following hierarchical structure:
+The embedded Laplace approximation can be used to approximate certain marginal and conditional distributions that arise in latent Gaussian models.
+It replaces explicit sampling of the high-dimensional Gaussian latent variables with a local Gaussian approximation at their conditional posterior mode, producing an approximation to the marginal likelihood.
+This lets a sampler explore the lower-dimensional marginal posterior over the non-latent parameters instead of jointly sampling all latent effects.
+The embedded Laplace approximation in Stan is best suited to latent Gaussian models where full joint sampling is expensive and the conditional posterior of the latent variables is reasonably close to Gaussian.
+
+For observed data $y$, latent Gaussian variables $\theta$, and hyperparameters $\phi$, a latent Gaussian model has the following hierarchical structure:
 \begin{eqnarray}
 \phi &\sim& p(\phi), \\
 \theta &\sim& \text{MultiNormal}(0, K(\phi)), \\
 y &\sim& p(y \mid \theta, \phi).
 \end{eqnarray}
-In this formulation, $y$ represents the
-observed data, and $p(y \mid \theta, \phi)$ is the likelihood function that
-specifies how observations are generated conditional on the latent Gaussian
-variables $\theta$ and hyperparameters $\phi$.
-$K(\phi)$ denotes the prior covariance matrix for the latent Gaussian variables
-$\theta$ and is parameterized by $\phi$.
-The prior $p(\theta \mid \phi)$ is restricted to be a multivariate normal
-centered at 0. That said, we can always pick a likelihood that offsets $\theta$,
-which is equivalently to specifying a prior mean.
+In this formulation, $p(y \mid \theta, \phi)$ is the likelihood function that
+specifies how observations are generated conditional on the latent Gaussian variables $\theta$ and the hyperparameters $\phi$.
+$K(\phi)$ denotes the prior covariance matrix for the latent Gaussian variables $\theta$ and is parameterized by $\phi$.
+The prior $p(\theta \mid \phi)$ is restricted to a multivariate normal centered at 0; choosing a likelihood that offsets $\theta$ is equivalent to specifying a nonzero prior mean.
 
 To sample from the joint posterior $p(\phi, \theta \mid y)$, we can either
 use a standard method, such as Markov chain Monte Carlo, or we can follow
@@ -83,12 +80,14 @@ The signature of the function is:
 
 Which returns an approximation to the log marginal likelihood $p(y \mid \phi)$.
 {{< since 2.37 >}}
 
-This function takes in the following arguments.
+The embedded Laplace functions accept two functors whose user-defined arguments are passed to `laplace_marginal` as tuples; a minimal sketch of a complete call follows the list below.
 
-1. `likelihood_function` - user-specified log likelihood whose first argument is the vector of latent Gaussian variables `theta`
-2. `likelihood_arguments` - A tuple of the log likelihood arguments whose internal members will be passed to the covariance function
-3. `covariance_function` - Prior covariance function
-4. `covariance_arguments` A tuple of the arguments whose internal members will be passed to the the covariance function
+1. `likelihood_function` - user-specified log likelihood whose first argument is the vector of latent Gaussian variables `theta`; the remaining arguments are user defined.
+   - `real likelihood_function(vector theta, likelihood_argument_1, likelihood_argument_2, ...)`
+2. `likelihood_arguments` - A tuple of the log likelihood arguments whose internal members will be passed to the likelihood function.
+3. `covariance_function` - Prior covariance function.
+   - `matrix covariance_function(covariance_argument_1, covariance_argument_2, ...)`
+4. `covariance_arguments` - A tuple of the arguments whose internal members will be passed to the covariance function.
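+
+For illustration, here is a minimal sketch of a Poisson count model written against these four arguments.
+All function and variable names in the sketch (`poisson_log_lik`, `se_cov`, `y`, `log_offset`, `x`, `alpha`, `rho`) are hypothetical choices for this example, not names supplied by Stan.
+
+```stan
+functions {
+  // Log likelihood: the latent vector theta must be the first argument.
+  real poisson_log_lik(vector theta, array[] int y, vector log_offset) {
+    return poisson_log_lpmf(y | theta + log_offset);
+  }
+  // Prior covariance of theta: squared-exponential kernel plus jitter.
+  matrix se_cov(array[] vector x, real alpha, real rho) {
+    return add_diag(gp_exp_quad_cov(x, alpha, rho), 1e-8);
+  }
+}
+data {
+  int<lower=1> N;
+  array[N] int<lower=0> y;
+  vector[N] log_offset;
+  array[N] vector[2] x;
+}
+parameters {
+  real<lower=0> alpha;
+  real<lower=0> rho;
+}
+model {
+  alpha ~ normal(0, 1);
+  rho ~ normal(0, 1);
+  // theta never appears as a parameter; it is marginalized out.
+  target += laplace_marginal(poisson_log_lik, (y, log_offset),
+                             se_cov, (x, alpha, rho));
+}
+```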
 
 Below we go over each argument in more detail.
 
@@ -198,12 +197,13 @@
 It also possible to specify control parameters, which can help improve
 the optimization that underlies the Laplace approximation, using
 `laplace_marginal_tol` with the following signature:
 
-\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
-
-
-\index{{\tt \bfseries laplace\_marginal\_tol }!{\tt (function likelihood\_function, tuple(...), function covariance\_function, tuple(...), vector theta\_init, real tol, int max\_steps, int hessian\_block\_size, int solver, int max\_steps\_linesearch): real}|hyperpage}
-
-`real` **`laplace_marginal_tol`**`(function likelihood_function, tuple(...), function covariance_function, tuple(...), vector theta_init, real tol, int max_steps, int hessian_block_size, int solver, int max_steps_linesearch)`\newline
+```stan
+real laplace_marginal_tol(function likelihood_function, tuple(...),
+                          function covariance_function, tuple(...),
+                          tuple(vector theta_init, real tol, int max_steps,
+                                int hessian_block_size, int solver,
+                                int max_steps_linesearch, int allow_fallback))
+```
 
 Returns an approximation to the log marginal likelihood $p(y \mid \phi)$
 and allows the user to tune the control parameters of the approximation.
 
@@ -244,6 +244,18 @@
 the step is repeatedly halved until the objective function decreases or
 the maximum number of steps in the linesearch is reached. By default,
 `max_steps_linesearch=0`, meaning no linesearch is performed.
 
+* `allow_fallback`: If the user-specified solver fails, this flag determines whether to fall back to the next solver. For example, if the user specifies `solver=1` but the Cholesky decomposition fails, the optimizer will try `solver=2` instead. A sketch of a call that sets these control parameters follows below.
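+
+As an illustration of setting the control parameters, the sketch below reuses the hypothetical `poisson_log_lik` and `se_cov` functions and the data from the earlier example; the tuple values are placeholders, not recommended or default settings.
+
+```stan
+transformed data {
+  // Initial guess for the mode of theta.
+  vector[N] theta_init = rep_vector(0, N);
+}
+model {
+  // Control tuple: (theta_init, tol, max_steps, hessian_block_size,
+  //                 solver, max_steps_linesearch, allow_fallback).
+  target += laplace_marginal_tol(poisson_log_lik, (y, log_offset),
+                                 se_cov, (x, alpha, rho),
+                                 (theta_init, 1e-6, 100, 1, 1, 0, 1));
+}
+```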
+
+The embedded Laplace options come with a helper function `generate_laplace_options(int theta_size)` that generates the control-parameter tuple for you.
+This can be useful for quickly setting up the control parameters in the `transformed data` block and reusing them within the model.
+
+```stan
+tuple(vector[theta_size], real, int, int, int, int, int) laplace_ops
+  = generate_laplace_options(theta_size);
+// Modify the solver type.
+laplace_ops.5 = 2;
+// Turn off the solver fallback.
+laplace_ops.7 = 0;
+```
+
 {{< since 2.37 >}}
 
 ## Sample from the approximate conditional $\hat{p}(\theta \mid y, \phi)$
 
diff --git a/src/reference-manual/laplace.qmd b/src/reference-manual/laplace.qmd
index 092b8c479..06ac96432 100644
--- a/src/reference-manual/laplace.qmd
+++ b/src/reference-manual/laplace.qmd
@@ -14,14 +14,14 @@ to the constrained space before outputting them.
 
 Given the estimate of the mode $\widehat{\theta}$,
 the Hessian $H(\widehat{\theta})$ is computed using
-central finite differences of the model functor. 
+central finite differences of the model functor.
 
 Next the algorithm computes the Cholesky factor of the negative
 inverse Hessian:
 $R^{-1} = \textrm{chol}(-H(\widehat{\theta})) \backslash \mathbf{1}$.
 
 Each draw is generated on the unconstrained scale by sampling
-$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$ 
+$\theta^{\textrm{std}(m)} \sim \textrm{normal}(0, \textrm{I})$
 and defining draw $m$ to be