Skip to content

[WIP, POC] Unified threading model in MIP solver#820

Draft
nguidotti wants to merge 141 commits intoNVIDIA:mainfrom
nguidotti:unified-parallel-model
Draft

[WIP, POC] Unified threading model in MIP solver#820
nguidotti wants to merge 141 commits intoNVIDIA:mainfrom
nguidotti:unified-parallel-model

Conversation

@nguidotti
Copy link
Contributor

@nguidotti nguidotti commented Feb 2, 2026

This PR unifies the threading model across the MIP solver, such that the code now uses the tasking model from OpenMP. The only exception is the Papilo presolver that uses Intel TBB.

More specifically, this PR

  • Solves the CPU oversubscription problem. The solver now respect the number of threads set to the user, with the exception of Papilo or threads created by the CUDA runtime.
  • Removes overheads from creating and destroying std::thread. This also
  • Migrates RINS from std::thread to omp task. Similar to previous logic, one instance of RINS can run at a time.
  • Migrates CPU FJ from std::thread to omp task. There are a few limitations
    • scratch_cpu_fj_on_lp_opt and scratch_cpu_fj are running for the entire program. This essentially allocate two dedicated threads to these functions, while other routines needs to share the remaining CPU resources. This may hurt the performance for low core count CPUs.
    • Since there is a small delay between the task creation and its start (since the threads may be busy), the GPU FJ may finish before the CPU FJ even start when racing.
  • Replaces locks in the pseudocost with atomics to reduce contention.

This is a POC, hence it requires some polish before it can be merged.

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Summary by CodeRabbit

  • New Features

    • Added --reliability-branching CLI option to control advanced branching strategy in MIP solver (enabled by default)
    • Introduced parallel and single-threaded solve modes for branch-and-bound algorithm
  • Improvements

    • Enhanced parallel branch-and-bound performance through improved work distribution and task scheduling
    • Improved solver synchronization and multi-threaded execution efficiency
    • Added reliability-based variable selection strategy for optimal branching decisions
  • Refactoring

    • Internal solver architecture updated for better scalability with worker-pool model

nguidotti and others added 30 commits December 12, 2025 10:22
…iteration and node limit to the diving threads.
@nguidotti nguidotti self-assigned this Feb 2, 2026
@nguidotti nguidotti added non-breaking Introduces a non-breaking change do not merge Do not merge if this flag is set improvement Improves an existing functionality mip labels Feb 2, 2026
@copy-pr-bot
Copy link

copy-pr-bot bot commented Feb 2, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@nguidotti nguidotti changed the title [WIP] Unified threading model in MIP solver [WIP, POC] Unified threading model in MIP solver Feb 2, 2026
@nguidotti
Copy link
Contributor Author

/ok to test 9731c76

@nguidotti
Copy link
Contributor Author

@CodeRabbit review

@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

This pull request introduces reliability branching for MIP solving and refactors the branch-and-bound solver from a direct threading model to an OpenMP-based worker pool architecture. Key changes include a new branch-and-bound worker framework, modified solver settings, updated heuristics integration (RINS, Local Search, Feasibility Jump), and removal of legacy CPU worker thread infrastructure.

Changes

Cohort / File(s) Summary
CLI and Configuration
benchmarks/linear_programming/cuopt/run_mip.cpp, cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
Added reliability_branching boolean flag with CLI option --reliability-branching, propagated through run paths, and integrated into solver settings with default value true.
Branch-and-Bound Worker Framework
cpp/src/dual_simplex/bnb_worker.hpp
New header introducing worker pool infrastructure: bnb_worker_type_t enum (BEST_FIRST, diving strategies), bnb_stats_t for statistics, bnb_worker_data_t for per-worker state, bnb_worker_pool_t for lifecycle management, and utility templates for worker allocation.
Branch-and-Bound Core Refactoring
cpp/src/dual_simplex/branch_and_bound.cpp, cpp/src/dual_simplex/branch_and_bound.hpp
Major architectural refactor replacing thread-type dispatch with worker-data-driven approach; replaced per-thread local structures with centralized worker pool; introduced mip_solve_mode_t (BNB_PARALLEL, BNB_SINGLE_THREADED); updated solve() signature to accept solve mode; new methods run_scheduler(), single_threaded_solve(), plunge_with(), dive_with(); simplified method signatures to use worker_data.
Solver Infrastructure Updates
cpp/src/dual_simplex/bounds_strengthening.hpp, cpp/src/dual_simplex/bounds_strengthening.cpp, cpp/src/dual_simplex/presolve.cpp, cpp/src/dual_simplex/pseudo_costs.hpp, cpp/src/dual_simplex/pseudo_costs.cpp, cpp/src/dual_simplex/simplex_solver_settings.hpp
Modified bounds strengthening API to accept bounds_changed parameter externally; introduced reliability_branching_settings_t struct; added per-variable atomic mutexes in pseudo costs; new trial_branching() function and reliable_variable_selection() method; updated diving settings with min/max depth and backtrack limits.
Solver Components and Build
cpp/src/dual_simplex/basis_updates.hpp, cpp/src/dual_simplex/CMakeLists.txt, cpp/src/dual_simplex/mip_node.hpp, cpp/src/dual_simplex/mip_node.cpp, cpp/src/dual_simplex/node_queue.hpp, cpp/src/dual_simplex/solution.hpp, cpp/src/dual_simplex/diving_heuristics.cpp
Added default copy constructor/assignment to basis_update_mpf_t; removed mip_node.cpp from build; inlined inactive_status() in header; internalized thread-safety in node_queue_t (removed public lock/unlock); changed node counter types to int64_t in mip_solution_t; removed mutex usage in diving heuristics.
MIP Heuristics Refactoring
cpp/src/mip/diversity/lns/rins.cu, cpp/src/mip/diversity/lns/rins.cuh, cpp/src/mip/diversity/recombiners/sub_mip.cuh, cpp/src/mip/feasibility_jump/feasibility_jump.cuh, cpp/src/mip/feasibility_jump/fj_cpu.cu, cpp/src/mip/feasibility_jump/fj_cpu.cuh, cpp/src/mip/local_search/local_search.cu, cpp/src/mip/local_search/local_search.cuh
Replaced thread-based execution model with OpenMP task-based approach; RINS now accepts LP solution as parameter via run_rins(std::vector<f_t>); removed cpu_fj_thread_t lifecycle methods; changed cpu_solve() to void; refactored Local Search to use dynamic vectors of climber pointers; updated SubMIP to use single-threaded B&B mode.
MIP Solver Integration
cpp/src/dual_simplex/solve.cpp, cpp/src/mip/solver.cu
Updated branch_and_bound.solve() calls to pass mip_solve_mode_t::BNB_PARALLEL; refactored main solver to use OpenMP parallel structure with task-based B&B and primal heuristics execution; moved to single sol object pattern; removed explicit stop_rins calls; added reliability_branching linkage.
Utility Infrastructure
cpp/src/mip/utilities/cpu_worker_thread.cuh, cpp/src/utilities/omp_helpers.hpp, cpp/src/utilities/pcg.hpp
Removed entire cpu_worker_thread_base_t CRTP base class (threading primitives); made omp_atomic_t conversion operator const-correct; introduced new CPU-side PCG random number generator with seed management, skip-ahead, and distribution utilities.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title '[WIP, POC] Unified threading model in MIP solver' accurately describes the main change: a refactoring to unify the threading model across the MIP solver to use OpenMP tasking instead of std::thread, while maintaining system thread-count respect.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cpp/src/dual_simplex/solution.hpp (1)

54-60: ⚠️ Potential issue | 🔴 Critical

Uninitialized members: nodes_explored and simplex_iterations.

These fields are not initialized in the constructor's initializer list. Reading them before assignment is undefined behavior. Other members (objective, lower_bound, has_incumbent) are properly initialized, but these two are missing.

As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving."

🐛 Proposed fix to initialize the counters
   mip_solution_t(i_t n)
     : x(n),
       objective(std::numeric_limits<f_t>::quiet_NaN()),
       lower_bound(-inf),
-      has_incumbent(false)
+      has_incumbent(false),
+      nodes_explored(0),
+      simplex_iterations(0)
   {
   }

Also applies to: 75-76

cpp/src/mip/feasibility_jump/fj_cpu.cu (1)

1041-1046: ⚠️ Potential issue | 🟡 Minor

Potential division by zero if loop exits before any iteration.

If preemption_flag is set before the first iteration completes, fj_cpu.iterations remains 0, causing division by zero at line 1043.

Proposed fix
   auto loop_end = std::chrono::high_resolution_clock::now();
   double total_time =
     std::chrono::duration_cast<std::chrono::duration<double>>(loop_end - loop_start).count();
-  double avg_time_per_iter = total_time / fj_cpu.iterations;
+  double avg_time_per_iter = fj_cpu.iterations > 0 ? total_time / fj_cpu.iterations : 0.0;
   CUOPT_LOG_TRACE("%sCPUFJ Average time per iteration: %.8fms\n",
                   fj_cpu.log_prefix.c_str(),
                   avg_time_per_iter * 1000.0);
cpp/src/mip/local_search/local_search.cu (1)

50-51: ⚠️ Potential issue | 🟠 Major

Thread-unsafe static globals pose data race risk.

local_search_best_obj and pop_ptr are static file-scope variables accessed and modified by multiple OpenMP tasks without synchronization. This can lead to data races when multiple local search instances run concurrently or when tasks from different threads access these variables simultaneously.

Consider converting these to member variables of local_search_t or using thread-local storage if the intent is per-thread state.

🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/bnb_worker.hpp`:
- Around line 182-215: The split API creates a TOCTOU race: callers could call
get_idle_worker() then another thread could pop that worker before
pop_idle_worker() is called—use the atomic operation get_and_pop_idle_worker()
instead; remove or make get_idle_worker() and pop_idle_worker() private (or
delete them) so external code cannot call the unsafe pair, and update any
callers to use get_and_pop_idle_worker() which already locks mutex_ and updates
idle_workers_ and num_idle_workers_ while returning workers_[idx].get().

In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Line 417: The computation of bnb_lp_iter_per_node divides bnb_total_lp_iter by
bnb_stats.nodes_explored which can be zero; update the calculation used to
produce bnb_lp_iter_per_node (and any dependent uses) to guard against
division-by-zero by checking bnb_stats.nodes_explored first and using a safe
fallback (e.g., 0 or treat denominator as 1) — replace the direct division of
bnb_total_lp_iter / bnb_stats.nodes_explored with a conditional/ternary that
yields a defined value when bnb_stats.nodes_explored == 0 so functions/variables
like bnb_lp_iter_per_node are never computed with a zero divisor.
- Around line 387-568: In reliable_variable_selection, the trial-branch updates
divide by change_in_x (for down/up) which can be near zero and yield huge/NaN
pseudo-costs; before updating pseudo_cost_sum_down/up and incrementing
pseudo_cost_num_down/up, guard the division by checking fabs(change_in_x)
against a small tolerance (use existing integer_tol if available or a local eps
like 1e-8): if below tol, skip the update (or clamp denominator to tol) and do
not increment the count; apply the same guard for both the down-branch block
(where change_in_x = solution[j] - floor(solution[j])) and the up-branch block
(ceil(solution[j]) - solution[j]) to ensure stable pseudo-cost_sum_* and
pseudo_cost_num_* updates.

In `@cpp/src/mip/local_search/local_search.cu`:
- Around line 139-140: The code dereferences scratch_cpu_fj_on_lp_opt without
checking for null in the stop_cpufj_scratch_threads path; add a null check (or
guard) before using scratch_cpu_fj_on_lp_opt in stop_cpufj_scratch_threads so
you only set halted when scratch_cpu_fj_on_lp_opt is non-null, or ensure
start_cpufj_lptopt_scratch_threads fully initializes scratch_cpu_fj_on_lp_opt
before stop can run; reference the scratch_cpu_fj_on_lp_opt variable and the
stop_cpufj_scratch_threads / start_cpufj_lptopt_scratch_threads functions when
applying the guard.

In `@cpp/src/mip/solver.cu`:
- Line 150: The variable bb_status (type dual_simplex::mip_status_t) must be
initialized at its declaration to avoid undefined behavior if the OpenMP task
that writes it never runs (e.g., heuristics_only path); change its declaration
to value-initialize (for example: dual_simplex::mip_status_t bb_status = {} or
an explicit safe enum like dual_simplex::mip_status_t::UNKNOWN) and keep the
task assignment as-is so reads after the parallel region see a defined value.
🧹 Nitpick comments (5)
cpp/src/dual_simplex/bnb_worker.hpp (1)

269-280: Integer division for worker distribution may produce uneven allocation.

Lines 276-277 use floating-point division then truncate to i_t:

i_t start = (double)k * diving_workers / m;
i_t end   = (double)(k + 1) * diving_workers / m;

This is a valid distribution approach, but the cast from double to i_t could truncate unexpectedly for large values. Consider using integer arithmetic for deterministic rounding:

i_t start = (k * diving_workers) / m;
i_t end   = ((k + 1) * diving_workers) / m;

This is a minor concern since worker counts are typically small.

cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)

86-88: New solver setting added correctly.

The reliability_branching flag is added with a sensible default (true). This allows users to disable the feature when needed (e.g., for debugging or specific problem types).

Consider adding a brief doc comment describing what reliability branching does and when users might want to disable it:

/// Enable reliability branching for variable selection in B&B (default: true)
bool reliability_branching = true;
cpp/src/mip/local_search/local_search.cu (1)

127-130: Consider replacing cudaDeviceSynchronize with stream sync.

cudaDeviceSynchronize() blocks all GPU work across all streams. Since the intent appears to be ensuring the solution_lp copy is complete before spawning the CPU task, a stream-specific sync would be less blocking:

context.problem_ptr->handle_ptr->sync_stream();

This aligns with coding guidelines to minimize unnecessary host-device synchronization.

cpp/src/mip/solver.cu (1)

157-162: Thread allocation may leave B&B under-resourced on low-core systems.

With num_threads - 3 for B&B, systems with 4 or fewer cores will have B&B limited to 1 thread. This is noted in the PR description but worth documenting in code comments for future maintainers.

Also applies to: 176-176

cpp/src/dual_simplex/branch_and_bound.cpp (1)

1145-1208: Single-threaded solve for SubMIP/RINS.

The single_threaded_solve method mirrors the scheduler loop structure but uses a single worker without task spawning. This avoids OpenMP overhead for small sub-problems.

Dead code on Line 1163: launched_any_task is declared but never used in this method. Consider removing it.

🧹 Remove unused variable
  while (solver_status_ == mip_status_t::UNSET && abs_gap > settings_.absolute_mip_gap_tol &&
         rel_gap > settings_.relative_mip_gap_tol && node_queue_.best_first_queue_size() > 0) {
-   bool launched_any_task = false;
    lower_bound            = get_lower_bound();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Do not merge if this flag is set improvement Improves an existing functionality mip non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants