Eisenstat-Walker Newton-Krylov solver (rebased) (#754)

ChrisRackauckas-Claude · termi-official · ChrisRackauckas · web-flow · commit 3b7864eb01a8 · 2025-12-18T20:10:43.000-05:00
* Eisenstat-Walker Newton-Krylov solver * Missing default * Fix typos * Add missing dispatch * Add verbosity * Add maximum eta to params. * More parameters * Lower initial eta * Derp * Derp again * Debug instability * Verbosity * Export forcing * Bump LinearSolve to reflect that the new API is used * Remove export of helper constructor * Docstring for forcing alg * Fix CI * Fix some corner cases * Move export to the right place * Add documentation for EisenstatWalkerForcing2 - Add Eisenstat & Walker (1996) reference to refs.bib - Add forcing parameter to General Keyword Arguments section - Add new Forcing Term Strategies section with EisenstatWalkerForcing2 docs - Update NewtonRaphson docstring with detailed keyword argument descriptions - Add mention of Newton-Krylov support in solver recommendations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Update verbosity.jl --------- Co-authored-by: termi-official <termi-official@users.noreply.github.com> Co-authored-by: termi-official <9196588+termi-official@users.noreply.github.com> Co-authored-by: Christopher Rackauckas <accounts@chrisrackauckas.com> Co-authored-by: ChrisRackauckas <me@chrisrackauckas.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
diff --git a/Project.toml b/Project.toml
@@ -25,6 +25,7 @@ Preferences = "21216c6a-2e73-6563-6e65-726566657250"
 ReTestItems = "817f1d60-ba6b-4fd5-9520-3cf149f6a823"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
 SciMLBase = "0bca4576-84f4-4d90-8ffe-ffa030f20462"
+SciMLLogging = "a6db7da4-7206-11f0-1eab-35f2a5dbe1d1"
 SimpleNonlinearSolve = "727e6d20-b764-4bd8-a329-72de5adea6c7"
 StaticArraysCore = "1e83bf80-4336-4d27-bf5d-d5a4f845583c"
 SymbolicIndexingInterface = "2efcf032-c050-4f8e-a9bb-153293bab1f5"
@@ -86,7 +87,7 @@ LeastSquaresOptim = "0.8.5"
 LineSearch = "0.1.4"
 LineSearches = "7.3"
 LinearAlgebra = "1.10"
-LinearSolve = "3.46"
+LinearSolve = "3.48"
 MINPACK = "1.2"
 MPI = "0.20.22"
 NLSolvers = "0.5"
diff --git a/docs/src/native/solvers.md b/docs/src/native/solvers.md
@@ -35,6 +35,10 @@ documentation.
     differencing tricks (without ever constructing the Jacobian). However, if the Jacobian
     is still needed, for example for a preconditioner, `concrete_jac = true` can be passed
     in order to force the construction of the Jacobian.
+  - `forcing`: Adaptive forcing term strategy for Newton-Krylov methods. When using an
+    iterative linear solver (Krylov method), this controls how accurately the linear system
+    is solved at each Newton iteration. Defaults to `nothing` (fixed tolerance). See
+    [Forcing Term Strategies](@ref forcing_strategies) for available options.
 
 ## Nonlinear Solvers
 
@@ -81,3 +85,43 @@ QuasiNewtonAlgorithm
 GeneralizedFirstOrderAlgorithm
 GeneralizedDFSane
 ```
+
+## [Forcing Term Strategies](@id forcing_strategies)
+
+Forcing term strategies control how accurately the linear system is solved at each Newton
+iteration when using iterative (Krylov) linear solvers. This is the key idea behind
+Newton-Krylov methods: instead of solving ``J \delta u = -f(u)`` exactly, we solve it only
+approximately with a tolerance ``\eta_k`` (the forcing term).
+
+The [eisenstat1996choosing](@citet) paper introduced adaptive strategies for choosing
+``\eta_k`` that can significantly improve convergence, especially for problems where the
+initial guess is far from the solution.
+
+```@docs
+EisenstatWalkerForcing2
+```
+
+### Example Usage
+
+```julia
+using NonlinearSolve, LinearSolve
+
+# Define a large nonlinear problem
+function f!(F, u, p)
+    for i in 2:(length(u) - 1)
+        F[i] = u[i - 1] - 2u[i] + u[i + 1] + sin(u[i])
+    end
+    F[1] = u[1] - 1.0
+    F[end] = u[end]
+end
+
+n = 1000
+u0 = zeros(n)
+prob = NonlinearProblem(f!, u0)
+
+# Use Newton-Raphson with GMRES and Eisenstat-Walker forcing
+sol = solve(prob, NewtonRaphson(
+    linsolve = KrylovJL_GMRES(),
+    forcing = EisenstatWalkerForcing2()
+))
+```
diff --git a/docs/src/refs.bib b/docs/src/refs.bib
@@ -1,3 +1,14 @@
+@article{eisenstat1996choosing,
+  title     = {Choosing the forcing terms in an inexact Newton method},
+  author    = {Eisenstat, Stanley C and Walker, Homer F},
+  journal   = {SIAM Journal on Scientific Computing},
+  volume    = {17},
+  number    = {1},
+  pages     = {16--32},
+  year      = {1996},
+  publisher = {SIAM}
+}
+
 @article{bastin2010retrospective,
   title     = {A retrospective trust-region method for unconstrained optimization},
   author    = {Bastin, Fabian and Malmedy, Vincent and Mouffe, M{\'e}lodie and Toint, Philippe L and Tomanos, Dimitri},
diff --git a/docs/src/solvers/nonlinear_system_solvers.md b/docs/src/solvers/nonlinear_system_solvers.md
@@ -50,7 +50,8 @@ sparse/structured matrix support, etc. These methods support the largest set of
 features, but have a bit of overhead on very small problems.
 
   - [`NewtonRaphson()`](@ref): A Newton-Raphson method with swappable nonlinear solvers and
-    autodiff methods for high performance on large and sparse systems.
+    autodiff methods for high performance on large and sparse systems. Supports Newton-Krylov
+    methods with adaptive forcing via [`EisenstatWalkerForcing2`](@ref).
   - [`TrustRegion()`](@ref): A Newton Trust Region dogleg method with swappable nonlinear
     solvers and autodiff methods for high performance on large and sparse systems.
   - [`LevenbergMarquardt()`](@ref): An advanced Levenberg-Marquardt implementation with the
diff --git a/lib/NonlinearSolveBase/ext/NonlinearSolveBaseLinearSolveExt.jl b/lib/NonlinearSolveBase/ext/NonlinearSolveBaseLinearSolveExt.jl
@@ -77,4 +77,8 @@ function set_lincache_A!(lincache, new_A)
     return
 end
 
+function LinearSolve.update_tolerances!(cache::LinearSolveJLCache; kwargs...)
+    LinearSolve.update_tolerances!(cache.lincache; kwargs...)
+end
+
 end
diff --git a/lib/NonlinearSolveBase/src/verbosity.jl b/lib/NonlinearSolveBase/src/verbosity.jl
@@ -14,6 +14,7 @@ diagnostic messages, warnings, and errors during nonlinear system solution.
 
 ## Numerical Group
 - `threshold_state`: Messages about threshold state in low-rank methods
+- `forcing`: Messages about forcing parameter in Newton-Krylov methods
 
 ## Sensitivity Group
 - `sensitivity_vjp_choice`: Messages about VJP choice in sensitivity analysis (used by SciMLSensitivity.jl)
@@ -65,7 +66,7 @@ NonlinearVerbosity
 
 @verbosity_specifier NonlinearVerbosity begin
     toggles = (:linear_verbosity, :non_enclosing_interval, :alias_u0_immutable,
-        :linsolve_failed_noncurrent, :termination_condition, :threshold_state,
+        :linsolve_failed_noncurrent, :termination_condition, :threshold_state, :forcing,
         :sensitivity_vjp_choice)
 
     presets = (
@@ -76,6 +77,7 @@ NonlinearVerbosity
             linsolve_failed_noncurrent = Silent(),
             termination_condition = Silent(),
             threshold_state = Silent(),
+            forcing = Silent(),
             sensitivity_vjp_choice = Silent()
         ),
         Minimal = (
@@ -85,6 +87,7 @@ NonlinearVerbosity
             linsolve_failed_noncurrent = WarnLevel(),
             termination_condition = Silent(),
             threshold_state = Silent(),
+            forcing = Silent(),
             sensitivity_vjp_choice = Silent()
         ),
         Standard = (
@@ -94,6 +97,7 @@ NonlinearVerbosity
             linsolve_failed_noncurrent = WarnLevel(),
             termination_condition = WarnLevel(),
             threshold_state = WarnLevel(),
+            forcing = InfoLevel(),
             sensitivity_vjp_choice = WarnLevel()
         ),
         Detailed = (
@@ -103,6 +107,7 @@ NonlinearVerbosity
             linsolve_failed_noncurrent = WarnLevel(),
             termination_condition = WarnLevel(),
             threshold_state = WarnLevel(),
+            forcing = InfoLevel(),
             sensitivity_vjp_choice = WarnLevel()
         ),
         All = (
@@ -112,14 +117,15 @@ NonlinearVerbosity
             linsolve_failed_noncurrent = WarnLevel(),
             termination_condition = WarnLevel(),
             threshold_state = InfoLevel(),
+            forcing = InfoLevel(),
             sensitivity_vjp_choice = WarnLevel()
         )
     )
 
     groups = (
         error_control = (:non_enclosing_interval, :alias_u0_immutable,
             :linsolve_failed_noncurrent, :termination_condition),
-        numerical = (:threshold_state,),
+        numerical = (:threshold_state, :forcing),
         sensitivity = (:sensitivity_vjp_choice,)
     )
 end
diff --git a/lib/NonlinearSolveFirstOrder/src/NonlinearSolveFirstOrder.jl b/lib/NonlinearSolveFirstOrder/src/NonlinearSolveFirstOrder.jl
@@ -33,6 +33,7 @@ using ForwardDiff: ForwardDiff, Dual  # Default Forward Mode AD
 
 include("solve.jl")
 include("raphson.jl")
+include("eisenstat_walker.jl")
 include("gauss_newton.jl")
 include("levenberg_marquardt.jl")
 include("trust_region.jl")
@@ -101,6 +102,8 @@ end
 export NewtonRaphson, PseudoTransient
 export GaussNewton, LevenbergMarquardt, TrustRegion
 
+export EisenstatWalkerForcing2
+
 export RadiusUpdateSchemes
 
 export GeneralizedFirstOrderAlgorithm
diff --git a/lib/NonlinearSolveFirstOrder/src/eisenstat_walker.jl b/lib/NonlinearSolveFirstOrder/src/eisenstat_walker.jl
@@ -0,0 +1,111 @@
+"""
+    EisenstatWalkerForcing2(; η₀ = 0.5, ηₘₐₓ = 0.9, γ = 0.9, α = 2, safeguard = true, safeguard_threshold = 0.1)
+
+Algorithm 2 from the classical work by Eisenstat and Walker (1996) as described by formula (2.6):
+    ηₖ = γ * (||rₖ|| / ||rₖ₋₁||)^α
+
+Here the variables denote:
+    rₖ residual at iteration k
+    η₀   ∈ [0,1) initial value for η
+    ηₘₐₓ ∈ [0,1) maximum value for η
+    γ    ∈ [0,1) correction factor
+    α    ∈ [1,2) correction exponent
+
+Furthermore, the proposed safeguard is implemented:
+    ηₖ = max(ηₖ, γ*ηₖ₋₁^α) if γ*ηₖ₋₁^α > safeguard_threshold
+to prevent ηₖ from shrinking too fast.
+"""
+@concrete struct EisenstatWalkerForcing2
+    η₀
+    ηₘₐₓ
+    γ
+    α
+    safeguard
+    safeguard_threshold
+end
+
+function EisenstatWalkerForcing2(; η₀ = 0.5, ηₘₐₓ = 0.9, γ = 0.9, α = 2, safeguard = true, safeguard_threshold = 0.1)
+    EisenstatWalkerForcing2(η₀, ηₘₐₓ, γ, α, safeguard, safeguard_threshold)
+end
+
+
+@concrete mutable struct EisenstatWalkerForcing2Cache
+    p::EisenstatWalkerForcing2
+    η
+    rnorm
+    rnorm_prev
+    internalnorm
+    verbosity
+end
+
+
+
+function pre_step_forcing!(cache::EisenstatWalkerForcing2Cache, descend_cache::NonlinearSolveBase.NewtonDescentCache, J, u, fu, iter)
+    @SciMLMessage("Eisenstat-Walker forcing residual norm $(cache.rnorm) with rate estimate $(cache.rnorm / cache.rnorm_prev).", cache.verbosity, :forcing)
+
+    # On the first iteration we initialize η with the default initial value and stop.
+    if iter == 0
+        cache.η = cache.p.η₀
+        @SciMLMessage("Eisenstat-Walker initial iteration to η=$(cache.η).", cache.verbosity, :forcing)
+        LinearSolve.update_tolerances!(descend_cache.lincache; reltol=cache.η)
+        return nothing
+    end
+
+    # Store previous
+    ηprev = cache.η
+
+    # Formula (2.6)
+    # ||r|| > 0 should be guaranteed by the convergence criterion
+    (; rnorm, rnorm_prev) = cache
+    (; α, γ) = cache.p
+    cache.η = γ * (rnorm / rnorm_prev)^α
+
+    # Safeguard 2 to prevent over-solving
+    if cache.p.safeguard
+        ηsg = γ*ηprev^α
+        if ηsg > cache.p.safeguard_threshold && ηsg > cache.η
+            cache.η = ηsg
+        end
+    end
+
+    # Far away from the root we also need to respect η ∈ [0,1)
+    cache.η = clamp(cache.η, 0.0, cache.p.ηₘₐₓ)
+
+    @SciMLMessage("Eisenstat-Walker iter $iter update to η=$(cache.η).", cache.verbosity, :forcing)
+
+    # Communicate new relative tolerance to linear solve
+    LinearSolve.update_tolerances!(descend_cache.lincache; reltol=cache.η)
+
+    return nothing
+end
+
+
+
+function post_step_forcing!(cache::EisenstatWalkerForcing2Cache, J, u, fu, δu, iter)
+    # Cache previous residual norm
+    cache.rnorm_prev = cache.rnorm
+    cache.rnorm = cache.internalnorm(fu)
+
+    # @SciMLMessage("Eisenstat-Walker sanity check: $(cache.internalnorm(fu + J*δu)) ≤ $(cache.η * cache.internalnorm(fu)).", cache.verbosity, :linear_verbosity)
+end
+
+
+
+function InternalAPI.init(
+        prob::AbstractNonlinearProblem, alg::EisenstatWalkerForcing2, f, fu, u, p,
+        args...; verbose, internalnorm::F = L2_NORM, kwargs...
+) where {F}
+    fu_norm = internalnorm(fu)
+
+    return EisenstatWalkerForcing2Cache(
+        alg, alg.η₀, fu_norm, fu_norm, internalnorm, verbose
+    )
+end
+
+
+
+function InternalAPI.reinit!(
+        cache::EisenstatWalkerForcing2Cache; p = cache.p, kwargs...
+)
+    cache.p = p
+end
diff --git a/lib/NonlinearSolveFirstOrder/src/raphson.jl b/lib/NonlinearSolveFirstOrder/src/raphson.jl
@@ -1,22 +1,43 @@
 """
     NewtonRaphson(;
         concrete_jac = nothing, linsolve = nothing, linesearch = missing,
-        autodiff = nothing, vjp_autodiff = nothing, jvp_autodiff = nothing
+        autodiff = nothing, vjp_autodiff = nothing, jvp_autodiff = nothing,
+        forcing = nothing,
     )
 
 An advanced NewtonRaphson implementation with support for efficient handling of sparse
 matrices via colored automatic differentiation and preconditioned linear solvers. Designed
 for large-scale and numerically-difficult nonlinear systems.
+
+### Keyword Arguments
+
+  - `autodiff`: determines the backend used for the Jacobian. Defaults to `nothing` which
+    means that a default is selected according to the problem specification.
+  - `concrete_jac`: whether to build a concrete Jacobian. If a Krylov-subspace method is
+    used, then the Jacobian will not be constructed. Defaults to `nothing`.
+  - `linsolve`: the [LinearSolve.jl](https://github.com/SciML/LinearSolve.jl) solver used
+    for the linear solves within the Newton method. Defaults to `nothing`, which means it
+    uses the LinearSolve.jl default algorithm choice.
+  - `linesearch`: the line search algorithm to use. Defaults to `missing` (no line search).
+  - `vjp_autodiff`: backend for computing Vector-Jacobian products.
+  - `jvp_autodiff`: backend for computing Jacobian-Vector products.
+  - `forcing`: Adaptive forcing term strategy for Newton-Krylov methods. When using an
+    iterative linear solver (e.g., `KrylovJL_GMRES()`), this controls how accurately the
+    linear system is solved at each iteration. Use `EisenstatWalkerForcing2()` for the
+    classical Eisenstat-Walker adaptive forcing strategy. Defaults to `nothing` (fixed
+    tolerance from the termination condition).
 """
 function NewtonRaphson(;
         concrete_jac = nothing, linsolve = nothing, linesearch = missing,
-        autodiff = nothing, vjp_autodiff = nothing, jvp_autodiff = nothing
+        autodiff = nothing, vjp_autodiff = nothing, jvp_autodiff = nothing,
+        forcing = nothing,
 )
     return GeneralizedFirstOrderAlgorithm(;
         linesearch,
         descent = NewtonDescent(; linsolve),
         autodiff, vjp_autodiff, jvp_autodiff,
         concrete_jac,
+        forcing,
         name = :NewtonRaphson
     )
 end
diff --git a/lib/NonlinearSolveFirstOrder/src/solve.jl b/lib/NonlinearSolveFirstOrder/src/solve.jl
diff --git a/lib/NonlinearSolveFirstOrder/src/trust_region.jl b/lib/NonlinearSolveFirstOrder/src/trust_region.jl
diff --git a/lib/NonlinearSolveFirstOrder/test/rootfind_tests.jl b/lib/NonlinearSolveFirstOrder/test/rootfind_tests.jl