engine: implement strongest-path loop discovery algorithm#249

Merged
bpowers merged 5 commits into main from bobby/finding-loops-that-matter on Feb 15, 2026

Conversation

@bpowers (Owner) commented Feb 14, 2026

Summary

  • Adds a design document describing the "loops that matter" analysis algorithm for system dynamics models
  • Implements the strongest-path loop discovery algorithm that identifies which feedback loops most influence each stock variable at each timestep
  • Adds comprehensive test suite with multi-model test cases including arms race and decoupled stock scenarios

Test plan

  • Unit tests for core algorithm components (link weight calculation, strongest path finding)
  • Integration tests with arms race model (3-party competitive dynamics)
  • Integration tests with decoupled stocks model (independent feedback loops)
  • Existing engine tests continue to pass

Implement the "Finding the Loops That Matter" strongest-path heuristic
from Eberlein & Schoenberg (2020) Appendix I. This provides an
alternative to exhaustive loop enumeration that scales to large models
by using link score magnitudes to guide a DFS search.

New module ltm_finding.rs contains the core algorithm: for each timestep,
starting from each stock, follow the strongest outbound link scores to
discover the most influential feedback loops. Uses strict less-than
pruning with best_score persistence across stock iterations within a
timestep, and deduplicates loops by sorted node set.
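
For orientation, a minimal sketch of the search core (illustrative only: names, types, and the dedup-by-node-set step differ from the real ltm_finding.rs):

use std::collections::HashMap;

// Sketch: DFS from a stock (`target`), following outbound links by |score|
// and pruning any path whose accumulated score is strictly below the best
// score already recorded at a node. Invoked per stock as
// search(stock, 1.0, stock, ...); best_score persists across those calls
// within a timestep. Dedup by sorted node set is omitted for brevity.
fn search(
    node: usize,
    score: f64,
    target: usize,
    edges: &HashMap<usize, Vec<(usize, f64)>>, // (to, |link score|), sorted desc
    path: &mut Vec<usize>,
    best_score: &mut HashMap<usize, f64>,
    found: &mut Vec<Vec<usize>>,
) {
    if node == target && !path.is_empty() {
        found.push(path.clone()); // closed a feedback loop back to the stock
        return;
    }
    if let Some(&best) = best_score.get(&node) {
        if score < best {
            return; // strict less-than: equal-scoring paths are still explored
        }
    }
    best_score.insert(node, score);
    if path.contains(&node) {
        return; // already on the current path and not the target: dead end
    }
    path.push(node);
    for &(next, link_score) in edges.get(&node).into_iter().flatten() {
        search(next, score * link_score, target, edges, path, best_score, found);
    }
    path.pop();
}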

Integration changes: CausalGraph.all_links() returns every causal edge,
generate_ltm_variables_all_links() instruments all links (not just loop
links), and Project.with_ltm_all_links() wires it together. This
"discovery mode" generates link scores for all edges so the strongest-path
search can operate over the full causal structure.
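
Hypothetical end-to-end use of discovery mode (a sketch under assumed signatures, not the committed API; simulate() is a placeholder for however the augmented project gets run):

// Instrument every causal edge, run the model, then search the link scores.
let project = project.with_ltm_all_links()?;
let results = simulate(&project)?; // hypothetical simulation entry point
let found = discover_loops(&results, &project)?;
for l in &found {
    println!("{}: {}", l.id, l.avg_abs_score); // FoundLoop fields as named in later commits
}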

Test models: three-party arms race (7 loops: 3 self-balancing, 3
pairwise reinforcing, 1 three-way reinforcing with reverse deduplicated)
and decoupled stocks (time-varying IF-THEN-ELSE loop activation).
Cross-validation confirms discovery finds all loops in small models.
@bpowers force-pushed the bobby/finding-loops-that-matter branch from d02efdb to 56a5847 on February 14, 2026 at 22:40
@claude bot commented Feb 14, 2026

Code Review: Strongest-Path Loop Discovery Algorithm

Summary

This PR implements the "loops that matter" strongest-path discovery algorithm from Eberlein & Schoenberg (2020). The implementation is well-documented, thoroughly tested, and follows the codebase conventions. Overall, this is high-quality work with excellent test coverage and clear design choices.

Strengths

1. Excellent Documentation

  • 509-line design document (doc/ltm-finding.md) provides comprehensive context, algorithm description, and implementation plan
  • Clear assumptions and open questions documented upfront
  • Inline code comments explain non-obvious algorithmic choices (e.g., strict less-than pruning, best_score persistence)
  • Module-level documentation clearly states the purpose and relationship to existing LTM code

2. Strong Test Coverage

  • Unit tests cover core algorithm components (graph construction, trivial loops, Figure 7 example, best_score persistence, deduplication, edge cases)
  • Integration tests validate against multiple models (logistic growth, 3-party arms race, decoupled stocks)
  • Cross-validation with exhaustive enumeration ensures correctness on small models
  • Test models with expected behavior documented in test comments

3. Good Architecture

  • Clean separation between algorithm (ltm_finding.rs) and augmentation (ltm_augment.rs)
  • Reuses existing Loop type from ltm.rs for consistency
  • Clear public API: single discover_loops() function with well-defined inputs/outputs
  • Internal types (SearchGraph, ScoredEdge) properly encapsulated

4. Performance-Conscious Design

  • Edge sorting by score magnitude for efficient pruning
  • best_score persistence across stock iterations reduces redundant exploration
  • Early pruning with strict less-than comparison
  • NaN handling prevents propagation of invalid values

Issues and Concerns

1. CRITICAL: Thread Safety in Multi-Threaded Contexts

Location: src/simlin-engine/src/ltm_finding.rs:139-146

Issue: The algorithm relies on best_score state persisting across stock iterations, but uses a standard HashMap. If this code is ever called from a multi-threaded context (e.g., parallel simulation), there could be data races.

Recommendation:

  • Document that discover_loops() is not thread-safe
  • Consider adding thread-safety comments or assertions
  • OR: Document that the engine is single-threaded by design (if true)

Severity: Medium (depends on future parallelization plans)

2. Code Quality: Redundant .abs() Call

Location: src/simlin-engine/src/ltm_finding.rs:215

Issue: edge.score is already the absolute value (per line 120). Calling .abs() again is redundant.

Recommendation: Remove the .abs() call: score * edge.score,

Severity: Low (correctness is fine, but code clarity suffers)

3. Potential Performance Issue: Quadratic String Allocations

Location: src/simlin-engine/src/ltm_finding.rs:323-325

Issue: This allocates strings for node sets twice: once at line 324 and again at line 237 within find_strongest_loops(). For models with many loops/timesteps, this creates unnecessary allocations.

Recommendation: Use a helper function to avoid duplication.

Severity: Low (likely negligible in practice, but cleaner code)
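
A sketch of such a helper (hypothetical name and signature):

// Build the canonical dedup key for a loop path once: its sorted node set.
fn node_set_key(path: &[Ident<Canonical>]) -> Vec<String> {
    let mut key: Vec<String> = path.iter().map(|id| id.as_str().to_string()).collect();
    key.sort();
    key
}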

4. Test Brittleness: Hard-Coded Loop Count

Location: src/simlin-engine/tests/simulate_ltm.rs:388-392

Issue: The test expects exactly 7 loops with a detailed comment about deduplication. If the deduplication logic changes or the model is modified, this test will break without clear indication of whether the behavior is correct.

Recommendation: Add a comment documenting the expected loop structure OR make the test more flexible.

Severity: Low (test maintenance concern)

5. Missing Error Context

Location: src/simlin-engine/src/ltm_finding.rs:342-346

Issue: The error message doesn't indicate which project failed or provide debugging context.

Recommendation: Include project name or model count in error messages.

Severity: Low (better debugging experience)

6. Inconsistent NaN Handling Documentation

Issue: The design doc says "We treat NaN as 0 for discovery purposes and skip timestep 0 entirely," but the code skips step 0 AND treats NaN as 0. This creates potential confusion about whether NaN values can appear after step 0.

Recommendation: Clarify in comments whether NaN can appear beyond step 0 and why.

Severity: Low (documentation clarity)

Minor Issues

7. Code Style: Many Arguments

Location: src/simlin-engine/src/ltm_finding.rs:176

The check_outbound_uses function has 8 arguments. While the #[allow] is present, consider whether some arguments could be bundled into a context struct for better maintainability.

Severity: Very Low (code style preference)

8. Missing Const Documentation

Location: src/simlin-engine/src/ltm_finding.rs:32-36

Issue: Constants lack doc comments explaining their significance and tuning rationale.

Severity: Very Low (documentation completeness)

Questions

  1. Design Decision: Why skip timestep 0 entirely rather than just treating link scores as 0?

  2. Testing: The arms race test expects exactly 7 loops but the comment mentions 8. Is this deduplication behavior thoroughly tested elsewhere?

  3. Performance: Have you profiled this on the Urban Dynamics model (43M loops exhaustively)? What's the performance improvement?

  4. API Design: Should discover_loops() take an optional stride parameter for large models, as mentioned in the design doc?

Recommendations

High Priority

  1. Add thread-safety documentation or assertions
  2. Fix redundant .abs() call in line 215

Medium Priority

  1. Improve error messages with more context
  2. Add documentation for NaN handling edge cases
  3. Consider refactoring node_set creation to reduce duplication

Low Priority

  1. Add doc comments to constants
  2. Consider SearchContext struct for readability
  3. Make arms race test less brittle

Conclusion

This is excellent work. The implementation faithfully follows the paper's algorithm, the test coverage is comprehensive, and the code is well-documented. The issues raised are mostly minor code quality improvements rather than correctness concerns. The most significant item is clarifying thread-safety expectations.

Recommendation: Approve with minor changes requested.

Great job on this complex algorithmic work!

@claude bot commented Feb 14, 2026

Code Review: Strongest-Path Loop Discovery Algorithm

I've completed a thorough review of PR #249. This is an impressive implementation of the strongest-path loop discovery algorithm from Eberlein & Schoenberg (2020). The code quality is generally high, but I've identified several areas for improvement.

Summary

Strengths:

  • Excellent documentation in both the design doc and code comments
  • Comprehensive test coverage with well-chosen test cases
  • Clean separation of concerns and appropriate use of existing infrastructure
  • Careful handling of edge cases (NaN, zero scores, deduplication)

Areas for Improvement:

  • Opportunities for performance optimization
  • Minor API design improvements
  • Additional error handling in a few places

Critical Issues

1. Inefficient HashMap Lookup in Loop Score Computation

Location: src/simlin-engine/src/ltm_finding.rs:372-377

For each link in each loop at each timestep, the code performs a linear search through link_offsets. That O(n) scan is repeated O(loops × links × timesteps) times, so for models with many links it could be slow.

Recommendation: Pre-build a HashMap for O(1) lookups.
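
For example (sketch; assumes link_offsets yields (from, to, offset) triples):

// Build the lookup once, before the per-timestep scoring loop.
let link_offset_map: HashMap<(Ident<Canonical>, Ident<Canonical>), usize> = link_offsets
    .iter()
    .map(|(from, to, offset)| ((from.clone(), to.clone()), *offset))
    .collect();
// Each lookup in the hot loop is then O(1):
// link_offset_map.get(&(link.from.clone(), link.to.clone()))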


High-Priority Issues

2. best_score HashMap Pre-allocation

Location: src/simlin-engine/src/ltm_finding.rs:139-146

HashMap should be pre-allocated with capacity to avoid reallocations.

3. Missing Validation for Empty link_offsets

Location: src/simlin-engine/src/ltm_finding.rs:302-305

Silently returning empty results could hide configuration errors. Add debug warnings.

4. parse_link_offsets Error Handling

Location: src/simlin-engine/src/ltm_finding.rs:263

Silently ignores malformed variable names. Add debug assertions or warnings.
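
For instance (sketch; `parsed` and the message text are illustrative):

// Surface unexpectedly-shaped LTM variable names in debug builds
// instead of skipping them silently.
debug_assert!(
    parsed.is_some(),
    "malformed LTM link-score variable name: {name_str}"
);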


Overall Assessment

This is excellent work. The implementation is:

  • ✅ Correct (follows the paper accurately)
  • ✅ Well-tested (comprehensive coverage)
  • ✅ Well-documented (exceptional design doc)
  • ✅ Maintainable (clean separation of concerns)
  • ✅ Follows project conventions

The identified issues are mostly minor optimizations. The code is production-ready after addressing the high-priority HashMap lookup performance issue.

Code Quality Score: 9/10

@codecov bot commented Feb 14, 2026

Codecov Report

❌ Patch coverage is 93.49593% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.80%. Comparing base (9596ca4) to head (f686008).
⚠️ Report is 4 commits behind head on main.

Files with missing lines                  Patch %   Lines
src/simlin-engine/src/ltm_finding.rs      93.40%    12 Missing ⚠️
src/simlin-engine/tests/simulate_ltm.rs   91.53%    11 Missing ⚠️
src/simlin-engine/src/project.rs          91.66%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #249      +/-   ##
==========================================
+ Coverage   75.48%   75.80%   +0.31%     
==========================================
  Files         101      102       +1     
  Lines       22721    23066     +345     
==========================================
+ Hits        17151    17485     +334     
- Misses       5570     5581      +11     


@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d02efdb393



for (var_name, &offset) in &results.offsets {
let name_str = var_name.as_str();
if let Some(suffix) = name_str.strip_prefix(LINK_SCORE_PREFIX) {


[P1] Parse namespaced link-score variables before discovery

parse_link_offsets only accepts variable names that start exactly with $⁚ltm⁚link_score⁚, but compiled results namespace submodel variables as <module>.<name> (see calc_flattened_offsets), so submodel link-score variables like sector.$⁚ltm⁚link_score⁚... are silently skipped. In projects that use module instances, this causes discover_loops to ignore those links entirely and return incomplete loop analysis.


Comment on lines 338 to 341
let main_model = project
.models
.values()
.find(|m| !m.implicit)


[P1] Choose the analyzed model deterministically

The causal graph is built from project.models.values().find(|m| !m.implicit), but project.models is a HashMap, so iteration order is not stable and this can pick different non-implicit models across runs. In multi-model projects, loop conversion/polarity can be computed against the wrong model definition, yielding inconsistent or incorrect discovery output; this should use the explicitly simulated model (e.g. main) instead of first-match iteration.


Use HashMap for O(1) link offset lookup in the hot scoring loop
instead of linear scan per link per loop per timestep.

Return a proper error when a link score variable is missing instead
of silently treating it as NaN -- this indicates the project wasn't
properly augmented with with_ltm_all_links().

Fix get_stock_variables to only examine the first non-implicit model,
consistent with discover_loops() which builds the CausalGraph from
the same model.

Re-sort found loops by avg_abs_score descending after ID assignment
so callers receive results ranked by importance.

Tighten test assertions to exact counts and add subset validation
for the heuristic discovery tests. Remove debug print/eprintln
output from tests.
@claude bot commented Feb 14, 2026

Code Review: Strongest-Path Loop Discovery Algorithm

This PR implements a significant enhancement to the Loops That Matter (LTM) analysis, adding a heuristic-based loop discovery algorithm based on Eberlein & Schoenberg (2020). The implementation is thorough, well-tested, and follows good software engineering practices. Here's my detailed review:


Strengths

1. Excellent Documentation

  • The 509-line design document (doc/ltm-finding.md) is exceptional - it clearly explains the algorithm, assumptions, implementation approach, and testing strategy
  • Comprehensive inline comments in the code explaining the "why" behind implementation decisions
  • Clear module-level documentation explaining the algorithm's purpose and context

2. Strong Test Coverage

  • 14 unit tests in ltm_finding.rs covering edge cases (NaN handling, zero scores, empty graphs, deduplication, etc.)
  • Integration tests with cross-validation between exhaustive and discovery modes
  • Two new test models (arms race, decoupled stocks) specifically designed to validate the algorithm
  • Tests follow the TDD principle outlined in CLAUDE.md

3. Clean Architecture

  • Good separation of concerns: ltm_finding.rs is self-contained with a minimal public API
  • Reuses existing infrastructure (CausalGraph, Loop type) rather than duplicating
  • The SearchGraph type is appropriately private, exposing only discover_loops

4. Code Quality

  • Idiomatic Rust with proper use of Result/Option
  • No unwrap() or unwrap_or_default() in production code (uses proper error handling)
  • Clear variable naming and well-structured functions
  • Follows existing code style and conventions

Issues & Recommendations

Critical Issues

1. Potential Performance Issue: O(V²) per Timestep (ltm_finding.rs:330-343)

The current implementation runs the strongest-path search at every saved timestep (except t=0). For a model with 100 timesteps and 50 variables, this means 100 × O(V²) operations.

for step in 1..step_count {
    let graph = SearchGraph::from_results(results, step, &link_offsets, &stocks);
    let paths = graph.find_strongest_loops();
    // ...
}

Recommendation: Consider adding a stride parameter to sample fewer timesteps, especially for long simulations. The design doc mentions this as a future optimization (line 24), but given the factorial growth the algorithm is meant to address, this could become a bottleneck.

Suggested addition:

pub fn discover_loops_with_stride(
    results: &Results<f64>,
    project: &Project,
    stride: usize, // sample every Nth timestep
) -> Result<Vec<FoundLoop>>;

2. NaN Handling Inconsistency (ltm_finding.rs:89-90 vs 386-387)

NaN is treated as 0 in SearchGraph::from_edges for the search itself, but loops with NaN scores get f64::NAN in the output scores. This is inconsistent - if a loop contains a link that's NaN at timestep t, should that loop even be discoverable at that timestep?

// In from_edges (line 90):
let score = if score.is_nan() { 0.0 } else { score };

// But in score computation (line 386):
if value.is_nan() {
    has_nan = true;
    break;
}

Recommendation: Document this behavior explicitly in the function documentation. If a link score is NaN (e.g., at t=0 or t=1 for flow-to-stock links), that loop probably shouldn't contribute to discovery at that timestep. Consider skipping timesteps where any link score is NaN during the search phase.

Major Issues

3. Missing Array Variable Check (ltm_finding.rs:307, ltm_augment.rs:87-101)

The existing LTM code has abort_if_arrayed() checks, but the new discovery mode doesn't explicitly check for arrays. While it inherits the restriction through CausalGraph::from_model, this should be documented.

Recommendation: Add explicit documentation in discover_loops stating:

/// # Limitations
/// - Array variables are not supported (the project must not contain arrayed variables)

4. Memory Usage with Large Models (ltm_finding.rs:325-343)

The algorithm accumulates all unique loop paths across all timesteps in memory (all_paths, seen_sets). For a model with thousands of timesteps, this could grow large.

Current:

let mut all_paths: Vec<Vec<Ident<Canonical>>> = Vec::new();
let mut seen_sets: HashSet<Vec<String>> = HashSet::new();

Recommendation: Consider using a bounded set with a priority queue to keep only the top N loops by score as they're discovered, rather than accumulating all unique loops first and then filtering.
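
A sketch of that bounded-set idea (illustrative; f64 is not Ord, so the score needs a total-order wrapper before it can key a BinaryHeap):

use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

struct Scored {
    score: f64,
    nodes: Vec<String>, // sorted node set identifying the loop
}

impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.score.total_cmp(&other.score) == Ordering::Equal
    }
}
impl Eq for Scored {}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.total_cmp(&other.score)
    }
}

// Reverse turns BinaryHeap into a min-heap, so the weakest surviving loop
// sits at the top, ready to be evicted when capacity is exceeded.
fn push_bounded(top: &mut BinaryHeap<Reverse<Scored>>, item: Scored, cap: usize) {
    top.push(Reverse(item));
    if top.len() > cap {
        top.pop();
    }
}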

Minor Issues

5. Hardcoded Constants (ltm_finding.rs:35-39)

MAX_LOOPS (200) and MIN_CONTRIBUTION (0.001) are hardcoded. While the design doc acknowledges these are from the paper, making them configurable would improve flexibility.

Recommendation: Add optional parameters with defaults:

pub struct DiscoveryConfig {
    pub max_loops: usize,
    pub min_contribution: f64,
    pub timestep_stride: usize,
}

impl Default for DiscoveryConfig {
    fn default() -> Self {
        Self {
            max_loops: 200,
            min_contribution: 0.001,
            timestep_stride: 1,
        }
    }
}

6. Code Duplication in Link Offset Lookups (ltm_finding.rs:384-401)

The loop score computation does a HashMap lookup for each link at each timestep. This is repeated work that could be optimized.

Current:

for step in 0..step_count {
    for link in &links {
        let link_key = (link.from.clone(), link.to.clone());
        if let Some(&offset) = link_offset_map.get(&link_key) {
            // ...
        }
    }
}

Recommendation: Pre-compute link offsets once per loop outside the timestep iteration.
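
Concretely (sketch, reusing the link_offset_map and links from the snippet above):

// Resolve offsets once per loop, outside the timestep iteration.
let offsets: Vec<usize> = links
    .iter()
    .filter_map(|link| {
        link_offset_map
            .get(&(link.from.clone(), link.to.clone()))
            .copied()
    })
    .collect();
for step in 0..step_count {
    for &offset in &offsets {
        // read this step's link score directly via `offset`
    }
}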

7. Unclear Variable Naming (ltm_finding.rs:420-423)

runtime_scores vs polarity_structural - the distinction isn't immediately clear without context.

Recommendation: Add a comment:

// Determine runtime polarity from actual loop scores (may differ from structural analysis
// if links have unexpected signs during simulation)
let runtime_scores: Vec<f64> = scores.iter().map(|(_, s)| *s).collect();

8. Test Coverage Gap: Module Links

While ltm_augment.rs has tests for module link scores (lines 867-1233), ltm_finding.rs doesn't have specific tests for discovering loops that involve module variables.

Recommendation: Add an integration test in simulate_ltm.rs that validates discovery mode on a model with module connections.

9. Error Message Could Be More Helpful (ltm_finding.rs:395-401)

When a link score variable is missing, the error mentions with_ltm_all_links() but doesn't explain why it might be missing or what the user should do.

Recommendation:

details: Some(format!(
    "Link score variable not found for {} -> {}. This usually means the project \
     was not augmented with with_ltm_all_links() before simulation. Call \
     project.with_ltm_all_links() before running the simulation, then pass those \
     results to discover_loops().",
    link.from.as_str(),
    link.to.as_str()
)),

10. Potential Precision Issue (ltm_finding.rs:376, 414-418)

Using f64 for timestep calculations with save_step could accumulate floating-point error over long simulations.

Current:

let time = results.specs.start + results.specs.save_step * (step as f64);

This is actually fine since it matches how Results stores time, but worth verifying against results.times[step] if that field exists.


Security & Safety

✅ No unsafe code
✅ No SQL injection, XSS, or similar vulnerabilities (simulation engine)
✅ Proper bounds checking (loop indices, array access)
✅ No unchecked user input (model files are parsed/validated before reaching this code)


Performance Considerations

Algorithmic Complexity

  • Per-timestep search: O(V² + E) where V = variables, E = edges
  • Total cost: O(T × (V² + E)) where T = saved timesteps
  • This is a significant improvement over exhaustive enumeration with Johnson's algorithm, O((V + E)(C + 1)) where C = number of elementary circuits (which can grow factorially with model size)

Potential Bottlenecks

  1. Many timesteps: A model with 1000 saved timesteps will run the search 1000 times
  2. String allocations: seen_sets: HashSet<Vec<String>> allocates for every unique loop
  3. Cloning Idents: link.from.clone(), link.to.clone() in the hot path

Recommendations:

  • Profile on a large model (e.g., Urban Dynamics with 43M loops mentioned in the design doc)
  • Consider interning variable names or using integer IDs for the search graph (see the sketch after this list)
  • Benchmark the stride parameter's impact
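
A sketch of the interning idea (hypothetical helper, not part of the PR):

use std::collections::HashMap;

// Map each variable name to a dense usize ID so the search graph can use
// Vec-indexed adjacency lists instead of string-keyed maps.
fn intern(name: &str, ids: &mut HashMap<String, usize>, names: &mut Vec<String>) -> usize {
    if let Some(&id) = ids.get(name) {
        return id;
    }
    let id = names.len();
    names.push(name.to_string());
    ids.insert(name.to_string(), id);
    id
}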

Testing Quality

Excellent Coverage

  • Unit tests are comprehensive (Figure 7 test traces the algorithm step-by-step!)
  • Integration tests validate against golden data
  • Cross-validation between exhaustive and discovery modes

Missing Test Cases

  1. Large model stress test: What happens with 1000+ variables?
  2. All-NaN timestep: What if an entire timestep has NaN scores?
  3. Module-inclusive loop discovery: Does it handle module black boxes correctly?

Design Decisions to Validate

1. Running at Every Saved Timestep (Design Doc Line 13)

The doc acknowledges this is a simplification from the paper ("every computational interval"). Have you considered:

  • Does this miss important loops that appear between saved timesteps?
  • For models with save_step > dt, could critical dynamics be hidden?

Suggestion: Document this tradeoff in the function documentation.

2. Absolute Values for Search (Design Doc Line 20)

The decision to use |link_score| during DFS but signed scores for output is correct per the paper, and well-implemented.

3. Deduplication by Node Set (ltm_finding.rs:244-246)

This matches the existing CausalGraph approach. Good consistency.


Code Style & Conventions

✅ Follows CLAUDE.md guidelines (commit message format, test-driven development)
✅ Proper Rust idioms (no unwrap abuse, good error handling)
✅ Clear variable names
✅ Appropriate use of #[cfg(test)] modules

Minor Style Issues

  • Line 179: #[allow(clippy::too_many_arguments)] is necessary here, but consider extracting search state into a struct if the function grows
  • Consistent naming: ltm_finding vs ltm_augment vs ltm - consider ltm::finding as a submodule for better namespace organization

Documentation Gaps

  1. Public API Documentation: discover_loops function documentation is good but could include an example
  2. Design Doc References: The code references "Appendix I" and "Eberlein & Schoenberg (2020)" but doesn't include a full citation
  3. Limitation Documentation: The "runs at saved timesteps" limitation should be in the function doc, not just the design doc

Summary

This is high-quality work that demonstrates:

  • ✅ Strong understanding of the algorithm
  • ✅ Rigorous testing approach
  • ✅ Clean, maintainable code
  • ✅ Excellent documentation

Recommended Actions Before Merge

Must Fix (Correctness):

  1. Clarify/document the NaN handling strategy (item 2 above)
  2. Add explicit array variable limitation documentation (item 3 above)

Should Fix (Quality):

  1. Add stride parameter for large models (item 1 above)
  2. Improve error messages (item 9 above)
  3. Add module loop discovery test (item 8 above)

Nice to Have (Future):

  1. Make constants configurable (item 5 above)
  2. Optimize memory usage (item 4 above)
  3. Profile on large models

Overall Assessment

Approve with minor changes recommended. The implementation is sound, well-tested, and follows best practices. The identified issues are primarily about performance optimization and documentation clarity, not correctness.


Great work on this implementation! The design document alone is a valuable artifact, and the code quality reflects careful attention to detail. 🎯

Use find_main_model() to look up "main" by canonical name instead
of relying on HashMap iteration order, which is non-deterministic.
Both get_stock_variables() and the CausalGraph construction now use
the same deterministic lookup.

Precompute results offsets per discovered loop path once, then use
the offset vector in the per-timestep scoring loop. This avoids
repeated HashMap lookups and Ident clones in the hot inner loop
(previously O(L * T) HashMap lookups per loop, now just O(L) once).

Rename test_self_loop_not_found to test_self_loop_found (the test
asserts self-loops ARE found). Add recursion depth documentation
on check_outbound_uses. Update design doc to say 7 unique loops
for the arms race model (the two three-way loops share the same
node set and are deduplicated).
@claude bot commented Feb 14, 2026

Code Review - PR #249: Strongest-Path Loop Discovery Algorithm

Executive Summary

This PR implements a heuristic "loops that matter" (LTM) discovery algorithm based on Eberlein & Schoenberg (2020). The implementation is high quality with excellent test coverage, clear documentation, and strong adherence to Rust best practices.

Recommendation: ✅ APPROVE with minor optional suggestions


Strengths

1. Excellent Code Quality

  • Proper error handling: Consistent use of Result<T> instead of panics. Only 2 .unwrap() calls found, both in test code.
  • Strong type system usage: Clear type aliases, appropriate use of conditional debug derives, strong typing with Ident<Canonical>.
  • Idiomatic Rust: Iterator methods, proper ownership and borrowing, no unsafe code.
  • No technical debt: No TODOs, FIXMEs, or placeholder comments.

2. Algorithm Correctness

  • Faithful to paper: Closely follows Appendix I of Eberlein & Schoenberg (2020).
  • Critical details handled correctly:
    • best_score persistence across stocks (ltm_finding.rs:141-149)
    • Strict less-than comparison for pruning (ltm_finding.rs:207)
    • Absolute values during DFS (ltm_finding.rs:123, 223)
    • Visiting set semantics for path tracking (ltm_finding.rs:197-203)
  • Edge cases covered: NaN handling, empty graphs, zero-score edges, self-loops.

3. Comprehensive Testing

  • 11 unit tests in ltm_finding.rs covering graph construction, edge cases, and paper Figure 7 validation.
  • 4 integration tests with real models (logistic growth, arms race, decoupled stocks).
  • Critical validation: Figure 7 test (ltm_finding.rs:614-650) includes step-by-step trace.

4. Outstanding Documentation

  • 510-line design document (ltm-finding.md) with clear assumptions, algorithm walkthrough, and test strategy.
  • Clear code comments explaining complex logic.
  • Well-documented types and functions.

5. Clean Integration

  • Single public function discover_loops() returns familiar FoundLoop type.
  • Reuses existing infrastructure (CausalGraph, LoopPolarity, etc.).
  • No breaking changes.

Suggested Improvements (Optional)

1. Redundant .abs() Call

Location: ltm_finding.rs:223

score * edge.score.abs()

Issue: Edge score is already stored as absolute value (line 123), so .abs() is redundant.
Suggestion: Remove .abs() or add comment explaining defensive programming.

2. Documentation Clarifications

  • ltm_finding.rs:100: Add comment explaining why unwrap_or(Equal) is safe for NaN handling in sorting.
  • ltm_finding.rs:285: Document when "main" model fallback is needed.
  • ltm_finding.rs:459, 479: Explain why double sorting is necessary (ID assignment vs. final ordering).

3. Test Assertions Could Be Clearer

Location: simulate_ltm.rs:381-386 (arms race) and simulate_ltm.rs:435-440 (decoupled stocks)

assert_eq!(found.len(), 3, "Discovery should find 3 loops...");

Issue: Comments mention 7 and 3 total loops respectively, but tests find 3 and 2.
Suggestion: Add comments explaining which loops are pruned and why (best_score dominance).

4. Minor DRY Improvement

Location: ltm_finding.rs:40 and ltm_augment.rs

const LTM_SEP: char = '⁚';

Observation: Unicode separator is duplicated.
Suggestion: Define in shared location or document why duplication is acceptable.

5. Performance Optimization Opportunity

Location: ltm_finding.rs:387-388

let offset = link_offset_map.get(&(link.from.clone(), link.to.clone()))

Observation: Clones identifiers for HashMap lookup in inner loop.
Suggestion: Consider using references in the HashMap key or a different key type to avoid the clones (see the sketch below).
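
One clone-free alternative, assuming the key type can change, is a nested map (sketch; build_nested_offsets is a hypothetical constructor):

// HashMap<(A, B), V> cannot be probed with (&A, &B), but nesting avoids
// the owned-tuple requirement entirely:
let by_from: HashMap<Ident<Canonical>, HashMap<Ident<Canonical>, usize>> =
    build_nested_offsets();
if let Some(inner) = by_from.get(&link.from) {
    if let Some(&offset) = inner.get(&link.to) {
        // use `offset` without cloning either ident
    }
}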

6. Recursion Depth Documentation

Location: ltm_finding.rs:181-183
Observation: Comment notes recursion depth bounded by variable count. For 1000+ variable models, could approach stack limits.
Suggestion: Document maximum supported model size or add compile-time depth check.


Detailed Analysis

Algorithm Implementation

The implementation demonstrates sophisticated understanding of the paper's algorithm:

  • best_score persistence (lines 141-149): Correctly persists across stock iterations within a timestep—this is subtle but critical.
  • Strict pruning (line 207): Uses < not <=, matching paper specification.
  • NaN handling (lines 90, 123, 417-420, 424-426): Defensive and correct throughout.

Test Coverage

The test suite is exemplary:

  • Figure 7 validation proves algorithm correctness against published example.
  • Integration tests verify behavior on multi-loop models.
  • Edge case coverage includes empty graphs, NaN values, self-loops, zero-score edges.

Security & Safety

  • No unsafe code.
  • All input validation present.
  • No panics in production code.
  • No security concerns identified.

Adherence to CLAUDE.md Standards

Follows project standards:

  • TDD approach with comprehensive tests
  • Idiomatic Result/Option usage
  • No placeholders or technical debt
  • Clear documentation
  • Commit message follows style guide

Coverage target: CLAUDE.md specifies 95+% coverage. While I cannot verify without running coverage tools, the test suite appears comprehensive.


Overall Assessment

Rating: 9/10

This is exemplary Rust code. The implementation is correct, well-tested, thoroughly documented, and production-ready. The suggested improvements are minor and cosmetic—the PR can be merged as-is. The only reason it's not 10/10 is the few redundant operations and documentation gaps that could help future maintainers.

Excellent work! This is exactly the kind of careful, thorough implementation that sets a high bar for quality.

Add unit tests for rank_and_filter covering MAX_LOOPS truncation,
MIN_CONTRIBUTION filtering, and score-descending output ordering.
Extract rank_and_filter() from discover_loops() to make these code
paths directly testable.

Fix test_parse_link_offsets to use canonicalize() directly instead
of the canonicalize().to_source_repr() + from_unchecked() round-trip.
The round-trip happens to work for LTM variable names (no dots) but
is inconsistent with how the interpreter stores results keys.

Remove nondeterministic fallback from find_main_model() -- it now
only looks up "main" by canonical name, returning None if absent,
instead of falling back to arbitrary HashMap iteration.
@claude bot commented Feb 14, 2026

Code Review

I've reviewed PR #249 implementing the strongest-path loop discovery algorithm. This is an excellent, well-implemented feature with comprehensive tests and documentation. Here's my detailed feedback:

Strengths ✅

1. Exceptional Documentation

  • The doc/ltm-finding.md design document is outstanding: clear assumptions, algorithm explanation, implementation plan, and testing strategy
  • Code comments are thorough and explain the "why" behind non-obvious decisions
  • Commit messages follow the project style and provide excellent context

2. Comprehensive Test Coverage

  • Excellent unit test suite covering edge cases: empty graphs, NaN handling, zero scores, self-loops, deduplication, best_score persistence
  • Integration tests with multiple test models (logistic growth, arms race, decoupled stocks)
  • Cross-validation tests comparing exhaustive vs. heuristic approaches
  • The test for rank_and_filter directly validates filtering logic - great extraction for testability

3. Performance Optimizations

  • HashMap-based O(1) link offset lookups instead of linear scans (ltm_finding.rs:326-329)
  • Precomputed results offsets per loop path (ltm_finding.rs:382-397)
  • Edge lists sorted by score for efficient pruning
  • These optimizations show careful attention to performance in hot paths

4. Robust Error Handling

  • Proper error messages when link score variables are missing (ltm_finding.rs:386-395)
  • Indicates when with_ltm_all_links() wasn't called, helping users debug
  • NaN handling is explicit and well-documented (ltm_finding.rs:89-90, 413-423)

5. Code Quality

  • Clean separation of concerns: SearchGraph is internal, FoundLoop is the public API
  • Deterministic behavior: model lookup by canonical name, sorted stock lists, content-based loop IDs
  • Good use of idiomatic Rust: Result, Option, proper error types

Suggestions for Improvement 🔧

1. Recursion Depth Consideration (ltm_finding.rs:181-184)

The comment mentions that recursion depth is bounded by unique variables and that "very large models (1000+ variables) could in theory approach stack limits." Consider:

// Add a depth limit to prevent stack overflow in pathological cases
const MAX_RECURSION_DEPTH: usize = 500;

fn check_outbound_uses(
    &self,
    variable: &Ident<Canonical>,
    score: f64,
    target: &Ident<Canonical>,
    visiting: &mut HashSet<Ident<Canonical>>,
    stack: &mut Vec<Ident<Canonical>>,
    best_score: &mut HashMap<Ident<Canonical>, f64>,
    found_loops: &mut Vec<Vec<Ident<Canonical>>>,
    seen_sets: &mut HashSet<Vec<String>>,
    depth: usize,  // Add depth tracking
) {
    if depth > MAX_RECURSION_DEPTH {
        return; // Gracefully abort deep recursion
    }
    
    // ... existing checks ...
    
    // When recursing:
    self.check_outbound_uses(
        &edge.to,
        score * edge.score.abs(),
        target,
        visiting,
        stack,
        best_score,
        found_loops,
        seen_sets,
        depth + 1,  // Increment depth
    );
}

Alternatively, document why this is safe or add a debug assertion to catch unexpected cases during development.

2. Minor: Clone Efficiency (ltm_finding.rs:384-385)

While precomputing offsets is great, you still clone Idents for the HashMap lookup:

let offset = link_offset_map
    .get(&(link.from.clone(), link.to.clone()))  // Clones for lookup

Since you're iterating over links from the CausalGraph, consider whether you can borrow here instead. This is minor since Ident<Canonical> is likely small, but worth checking if profiling shows this as hot.

3. Test Model Documentation

The test models (test/arms_race_3party/arms_race.stmx, test/decoupled_stocks/decoupled.stmx) are referenced in tests but could benefit from README files explaining:

  • What each model represents
  • Expected loop count and characteristics
  • Why each model is useful for testing this algorithm

This helps future contributors understand the test suite.

4. Edge Case: Empty Results

The code handles link_offsets.is_empty() and stocks.is_empty() (ltm_finding.rs:321-334), but what happens if simulation produces zero timesteps? Consider adding a guard or test for step_count == 0 or step_count == 1 (since you skip step 0).
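
A guard along these lines would make the degenerate case explicit (sketch):

// Step 0 is skipped by design, so a run that saved one step (or none)
// has nothing to search.
if step_count <= 1 {
    return Ok(Vec::new());
}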

5. Documentation: Algorithm Limitations

The design doc mentions this is a heuristic and "NOT guaranteed to find the absolute strongest loop" (ltm-finding.md:100). Consider adding this caveat to the rustdoc for discover_loops() so API users understand the trade-offs:

/// Run the strongest-path loop discovery on simulation results.
///
/// This is a heuristic algorithm that finds important loops without exhaustive
/// enumeration. It is NOT guaranteed to find all loops or the absolute strongest
/// loops, but scales to large models where exhaustive enumeration is infeasible.
/// ...

Minor Style Notes 📝

1. Consistent Error Messages

Most error messages are excellent, but consider standardizing capitalization:

  • ltm_finding.rs:367: "No non-implicit model found..." (lowercase after code)
  • Could be: "no non-implicit model found..." to match Rust conventions

2. Test Assertions

Some test assertions include the actual value in the message (good!), others don't:

  • simulate_ltm.rs:269-273: Includes found count ✅
  • simulate_ltm.rs:647-650: Includes found count ✅

This is already consistent in most places - excellent work!

Security & Correctness ✅

  • No security concerns identified
  • Thread safety: Not applicable (no shared mutable state, all operations are pure or on owned data)
  • Numeric overflow: link score multiplication uses f64, so extreme values saturate to infinity rather than wrapping; NaN handling catches invalid values gracefully
  • Memory safety: All uses of slicing and indexing are bounds-checked or use safe iterators

Performance Considerations ⚡

The optimizations in commits 3-5 are excellent:

  1. HashMap for O(1) lookups (commit 3)
  2. Precomputed result offsets (commit 4)
  3. Deterministic model selection avoiding nondeterministic iteration (commit 4)

One additional consideration: For very large simulations (many timesteps), you're building a SearchGraph per timestep (ltm_finding.rs:344). If memory becomes an issue, you could consider:

  • Processing timesteps in batches
  • Streaming results rather than storing all discovered loops in memory

But this is likely premature optimization - the current approach is clear and correct.

Conclusion 🎉

This is production-ready code. The implementation is clean, well-tested, thoroughly documented, and follows all project conventions. The suggestions above are minor enhancements, not blockers.

Recommendation: Approve and merge after considering whether to add:

  1. Recursion depth limit or documentation justifying safety
  2. Brief test model README files

Great work on a complex feature! The attention to detail in testing (especially the rank_and_filter extraction and cross-validation tests) and the thorough design document make this easy to review and maintain.

@bpowers (Owner, Author) commented Feb 14, 2026

Addressed review feedback across 3 iterations:

The main improvements are around correctness, performance, and test quality. The O(n) linear scan for link offset lookups in the hot scoring loop was replaced with a HashMap for O(1) access, and link offsets are now precomputed per discovered loop path to avoid repeated HashMap lookups and Ident clones in the per-timestep inner loop. Model selection was made deterministic by looking up "main" by canonical name instead of relying on HashMap iteration order. Missing link score variables now return a proper error instead of silently degrading to NaN.

Test coverage was significantly tightened: integration test assertions now use exact counts with explanatory comments about heuristic behavior and validate that discovered loops are subsets of exhaustive results. New unit tests cover MAX_LOOPS truncation, MIN_CONTRIBUTION filtering, and score-based output ordering. Debug println/eprintln output was removed from tests, a misnamed test was corrected, and the design doc was updated to reflect 7 (not 8) unique loops in the arms race model after deduplication.

@bpowers merged commit fc0765d into main on Feb 15, 2026
10 checks passed
@bpowers deleted the bobby/finding-loops-that-matter branch on February 15, 2026 at 16:21