[WIP, POC] Unified threading model in MIP solver by nguidotti · Pull Request #820 · NVIDIA/cuopt

nguidotti · 2026-02-02T15:50:40Z

This PR unifies the threading model across the MIP solver, such that the code now uses the tasking model from OpenMP. The only exception is the Papilo presolver that uses Intel TBB.

More specifically, this PR

Solves the CPU oversubscription problem. The solver now respect the number of threads set to the user, with the exception of Papilo or threads created by the CUDA runtime.
Removes overheads from creating and destroying std::thread. This also
Migrates RINS from std::thread to omp task. Similar to previous logic, one instance of RINS can run at a time.
Migrates CPU FJ from std::thread to omp task. There are a few limitations
- scratch_cpu_fj_on_lp_opt and scratch_cpu_fj are running for the entire program. This essentially allocate two dedicated threads to these functions, while other routines needs to share the remaining CPU resources. This may hurt the performance for low core count CPUs.
- Since there is a small delay between the task creation and its start (since the threads may be busy), the GPU FJ may finish before the CPU FJ even start when racing.
Replaces locks in the pseudocost with atomics to reduce contention.

This is a POC, hence it requires some polish before it can be merged.

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

Summary by CodeRabbit

New Features
- Added --reliability-branching CLI option to control advanced branching strategy in MIP solver (enabled by default)
- Introduced parallel and single-threaded solve modes for branch-and-bound algorithm
Improvements
- Enhanced parallel branch-and-bound performance through improved work distribution and task scheduling
- Improved solver synchronization and multi-threaded execution efficiency
- Added reliability-based variable selection strategy for optimal branching decisions
Refactoring
- Internal solver architecture updated for better scalability with worker-pool model

… on the objective pseudcost estimate.

…eady explored.

…iteration and node limit to the diving threads.

…diving

copy-pr-bot · 2026-02-02T15:50:46Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…imeout

nguidotti · 2026-02-03T10:34:03Z

/ok to test 9731c76

nguidotti · 2026-02-03T10:34:36Z

@CodeRabbit review

coderabbitai · 2026-02-03T10:34:43Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-02-03T10:44:42Z

📝 Walkthrough

Walkthrough

This pull request introduces reliability branching for MIP solving and refactors the branch-and-bound solver from a direct threading model to an OpenMP-based worker pool architecture. Key changes include a new branch-and-bound worker framework, modified solver settings, updated heuristics integration (RINS, Local Search, Feasibility Jump), and removal of legacy CPU worker thread infrastructure.

Changes

Cohort / File(s)	Summary
CLI and Configuration `benchmarks/linear_programming/cuopt/run_mip.cpp`, `cpp/include/cuopt/linear_programming/mip/solver_settings.hpp`	Added `reliability_branching` boolean flag with CLI option `--reliability-branching`, propagated through run paths, and integrated into solver settings with default value true.
Branch-and-Bound Worker Framework `cpp/src/dual_simplex/bnb_worker.hpp`	New header introducing worker pool infrastructure: `bnb_worker_type_t` enum (BEST_FIRST, diving strategies), `bnb_stats_t` for statistics, `bnb_worker_data_t` for per-worker state, `bnb_worker_pool_t` for lifecycle management, and utility templates for worker allocation.
Branch-and-Bound Core Refactoring `cpp/src/dual_simplex/branch_and_bound.cpp`, `cpp/src/dual_simplex/branch_and_bound.hpp`	Major architectural refactor replacing thread-type dispatch with worker-data-driven approach; replaced per-thread local structures with centralized worker pool; introduced `mip_solve_mode_t` (BNB_PARALLEL, BNB_SINGLE_THREADED); updated `solve()` signature to accept solve mode; new methods `run_scheduler()`, `single_threaded_solve()`, `plunge_with()`, `dive_with()`; simplified method signatures to use worker_data.
Solver Infrastructure Updates `cpp/src/dual_simplex/bounds_strengthening.hpp`, `cpp/src/dual_simplex/bounds_strengthening.cpp`, `cpp/src/dual_simplex/presolve.cpp`, `cpp/src/dual_simplex/pseudo_costs.hpp`, `cpp/src/dual_simplex/pseudo_costs.cpp`, `cpp/src/dual_simplex/simplex_solver_settings.hpp`	Modified bounds strengthening API to accept `bounds_changed` parameter externally; introduced `reliability_branching_settings_t` struct; added per-variable atomic mutexes in pseudo costs; new `trial_branching()` function and `reliable_variable_selection()` method; updated diving settings with min/max depth and backtrack limits.
Solver Components and Build `cpp/src/dual_simplex/basis_updates.hpp`, `cpp/src/dual_simplex/CMakeLists.txt`, `cpp/src/dual_simplex/mip_node.hpp`, `cpp/src/dual_simplex/mip_node.cpp`, `cpp/src/dual_simplex/node_queue.hpp`, `cpp/src/dual_simplex/solution.hpp`, `cpp/src/dual_simplex/diving_heuristics.cpp`	Added default copy constructor/assignment to `basis_update_mpf_t`; removed `mip_node.cpp` from build; inlined `inactive_status()` in header; internalized thread-safety in `node_queue_t` (removed public lock/unlock); changed node counter types to int64_t in `mip_solution_t`; removed mutex usage in diving heuristics.
MIP Heuristics Refactoring `cpp/src/mip/diversity/lns/rins.cu`, `cpp/src/mip/diversity/lns/rins.cuh`, `cpp/src/mip/diversity/recombiners/sub_mip.cuh`, `cpp/src/mip/feasibility_jump/feasibility_jump.cuh`, `cpp/src/mip/feasibility_jump/fj_cpu.cu`, `cpp/src/mip/feasibility_jump/fj_cpu.cuh`, `cpp/src/mip/local_search/local_search.cu`, `cpp/src/mip/local_search/local_search.cuh`	Replaced thread-based execution model with OpenMP task-based approach; RINS now accepts LP solution as parameter via `run_rins(std::vector<f_t>)`; removed `cpu_fj_thread_t` lifecycle methods; changed `cpu_solve()` to void; refactored Local Search to use dynamic vectors of climber pointers; updated SubMIP to use single-threaded B&B mode.
MIP Solver Integration `cpp/src/dual_simplex/solve.cpp`, `cpp/src/mip/solver.cu`	Updated `branch_and_bound.solve()` calls to pass `mip_solve_mode_t::BNB_PARALLEL`; refactored main solver to use OpenMP parallel structure with task-based B&B and primal heuristics execution; moved to single sol object pattern; removed explicit stop_rins calls; added reliability_branching linkage.
Utility Infrastructure `cpp/src/mip/utilities/cpu_worker_thread.cuh`, `cpp/src/utilities/omp_helpers.hpp`, `cpp/src/utilities/pcg.hpp`	Removed entire `cpu_worker_thread_base_t` CRTP base class (threading primitives); made `omp_atomic_t` conversion operator const-correct; introduced new CPU-side PCG random number generator with seed management, skip-ahead, and distribution utilities.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.63% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title '[WIP, POC] Unified threading model in MIP solver' accurately describes the main change: a refactoring to unify the threading model across the MIP solver to use OpenMP tasking instead of std::thread, while maintaining system thread-count respect.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

cpp/src/dual_simplex/solution.hpp (1)
54-60: ⚠️ Potential issue | 🔴 Critical

Uninitialized members: nodes_explored and simplex_iterations.

These fields are not initialized in the constructor's initializer list. Reading them before assignment is undefined behavior. Other members (objective, lower_bound, has_incumbent) are properly initialized, but these two are missing.

As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving."
🐛 Proposed fix to initialize the counters
   mip_solution_t(i_t n)
     : x(n),
       objective(std::numeric_limits<f_t>::quiet_NaN()),
       lower_bound(-inf),
-      has_incumbent(false)
+      has_incumbent(false),
+      nodes_explored(0),
+      simplex_iterations(0)
   {
   }
Also applies to: 75-76
cpp/src/mip/feasibility_jump/fj_cpu.cu (1)
1041-1046: ⚠️ Potential issue | 🟡 Minor

Potential division by zero if loop exits before any iteration.

If preemption_flag is set before the first iteration completes, fj_cpu.iterations remains 0, causing division by zero at line 1043.
Proposed fix
   auto loop_end = std::chrono::high_resolution_clock::now();
   double total_time =
     std::chrono::duration_cast<std::chrono::duration<double>>(loop_end - loop_start).count();
-  double avg_time_per_iter = total_time / fj_cpu.iterations;
+  double avg_time_per_iter = fj_cpu.iterations > 0 ? total_time / fj_cpu.iterations : 0.0;
   CUOPT_LOG_TRACE("%sCPUFJ Average time per iteration: %.8fms\n",
                   fj_cpu.log_prefix.c_str(),
                   avg_time_per_iter * 1000.0);
cpp/src/mip/local_search/local_search.cu (1)

50-51: ⚠️ Potential issue | 🟠 Major

Thread-unsafe static globals pose data race risk.

local_search_best_obj and pop_ptr are static file-scope variables accessed and modified by multiple OpenMP tasks without synchronization. This can lead to data races when multiple local search instances run concurrently or when tasks from different threads access these variables simultaneously.

Consider converting these to member variables of local_search_t or using thread-local storage if the intent is per-thread state.

🤖 Fix all issues with AI agents

In `@cpp/src/dual_simplex/bnb_worker.hpp`:
- Around line 182-215: The split API creates a TOCTOU race: callers could call
get_idle_worker() then another thread could pop that worker before
pop_idle_worker() is called—use the atomic operation get_and_pop_idle_worker()
instead; remove or make get_idle_worker() and pop_idle_worker() private (or
delete them) so external code cannot call the unsafe pair, and update any
callers to use get_and_pop_idle_worker() which already locks mutex_ and updates
idle_workers_ and num_idle_workers_ while returning workers_[idx].get().

In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Line 417: The computation of bnb_lp_iter_per_node divides bnb_total_lp_iter by
bnb_stats.nodes_explored which can be zero; update the calculation used to
produce bnb_lp_iter_per_node (and any dependent uses) to guard against
division-by-zero by checking bnb_stats.nodes_explored first and using a safe
fallback (e.g., 0 or treat denominator as 1) — replace the direct division of
bnb_total_lp_iter / bnb_stats.nodes_explored with a conditional/ternary that
yields a defined value when bnb_stats.nodes_explored == 0 so functions/variables
like bnb_lp_iter_per_node are never computed with a zero divisor.
- Around line 387-568: In reliable_variable_selection, the trial-branch updates
divide by change_in_x (for down/up) which can be near zero and yield huge/NaN
pseudo-costs; before updating pseudo_cost_sum_down/up and incrementing
pseudo_cost_num_down/up, guard the division by checking fabs(change_in_x)
against a small tolerance (use existing integer_tol if available or a local eps
like 1e-8): if below tol, skip the update (or clamp denominator to tol) and do
not increment the count; apply the same guard for both the down-branch block
(where change_in_x = solution[j] - floor(solution[j])) and the up-branch block
(ceil(solution[j]) - solution[j]) to ensure stable pseudo-cost_sum_* and
pseudo_cost_num_* updates.

In `@cpp/src/mip/local_search/local_search.cu`:
- Around line 139-140: The code dereferences scratch_cpu_fj_on_lp_opt without
checking for null in the stop_cpufj_scratch_threads path; add a null check (or
guard) before using scratch_cpu_fj_on_lp_opt in stop_cpufj_scratch_threads so
you only set halted when scratch_cpu_fj_on_lp_opt is non-null, or ensure
start_cpufj_lptopt_scratch_threads fully initializes scratch_cpu_fj_on_lp_opt
before stop can run; reference the scratch_cpu_fj_on_lp_opt variable and the
stop_cpufj_scratch_threads / start_cpufj_lptopt_scratch_threads functions when
applying the guard.

In `@cpp/src/mip/solver.cu`:
- Line 150: The variable bb_status (type dual_simplex::mip_status_t) must be
initialized at its declaration to avoid undefined behavior if the OpenMP task
that writes it never runs (e.g., heuristics_only path); change its declaration
to value-initialize (for example: dual_simplex::mip_status_t bb_status = {} or
an explicit safe enum like dual_simplex::mip_status_t::UNKNOWN) and keep the
task assignment as-is so reads after the parallel region see a defined value.

🧹 Nitpick comments (5)

cpp/src/dual_simplex/bnb_worker.hpp (1)
269-280: Integer division for worker distribution may produce uneven allocation.

Lines 276-277 use floating-point division then truncate to i_t:
i_t start = (double)k * diving_workers / m;
i_t end   = (double)(k + 1) * diving_workers / m;
This is a valid distribution approach, but the cast from double to i_t could truncate unexpectedly for large values. Consider using integer arithmetic for deterministic rounding:
i_t start = (k * diving_workers) / m;
i_t end   = ((k + 1) * diving_workers) / m;
This is a minor concern since worker counts are typically small.
cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
86-88: New solver setting added correctly.

The reliability_branching flag is added with a sensible default (true). This allows users to disable the feature when needed (e.g., for debugging or specific problem types).

Consider adding a brief doc comment describing what reliability branching does and when users might want to disable it:
/// Enable reliability branching for variable selection in B&B (default: true)
bool reliability_branching = true;
cpp/src/mip/local_search/local_search.cu (1)
127-130: Consider replacing cudaDeviceSynchronize with stream sync.

cudaDeviceSynchronize() blocks all GPU work across all streams. Since the intent appears to be ensuring the solution_lp copy is complete before spawning the CPU task, a stream-specific sync would be less blocking:
context.problem_ptr->handle_ptr->sync_stream();
This aligns with coding guidelines to minimize unnecessary host-device synchronization.
cpp/src/mip/solver.cu (1)

157-162: Thread allocation may leave B&B under-resourced on low-core systems.

With num_threads - 3 for B&B, systems with 4 or fewer cores will have B&B limited to 1 thread. This is noted in the PR description but worth documenting in code comments for future maintainers.

Also applies to: 176-176
cpp/src/dual_simplex/branch_and_bound.cpp (1)
1145-1208: Single-threaded solve for SubMIP/RINS.

The single_threaded_solve method mirrors the scheduler loop structure but uses a single worker without task spawning. This avoids OpenMP overhead for small sub-problems.

Dead code on Line 1163: launched_any_task is declared but never used in this method. Consider removing it.
🧹 Remove unused variable
  while (solver_status_ == mip_status_t::UNSET && abs_gap > settings_.absolute_mip_gap_tol &&
         rel_gap > settings_.relative_mip_gap_tol && node_queue_.best_first_queue_size() > 0) {
-   bool launched_any_task = false;
    lower_bound            = get_lower_bound();

cpp/src/dual_simplex/bnb_worker.hpp

cpp/src/dual_simplex/pseudo_costs.cpp

cpp/src/mip/local_search/local_search.cu

cpp/src/mip/solver.cu

nguidotti and others added 30 commits December 12, 2025 10:22

implemented diving heuristics. sorted starting nodes for diving based…

c9081a6

… on the objective pseudcost estimate.

moved diving heuristics to a separated file

046a501

organized code. added toggle to disable each type of diving.

6ea6d72

restrict calling RINS to the best-first threads

5422b97

fix invalid branch var in line search diving

73c1a63

moved asserts

3a77cca

replace inf and max with STL calls

0f7af4e

Fix incorrect infeasible list

79368c3

implemented diving heuristics. sorted starting nodes for diving based…

6334ad7

… on the objective pseudcost estimate.

moved diving heuristics to a separated file

2c94a7c

organized code. added toggle to disable each type of diving.

0e815e1

moved asserts

29a2a33

unified global node heap. fathom node in the diving node that was alr…

5a3ef60

…eady explored.

refactoring code

c181ccf

fix style

501c20d

small fixes

0ee757a

unified the best-first and diving heap into a single object. imposed …

546b116

…iteration and node limit to the diving threads.

adjusted column spacing in bnb logs. added opening mode for logger.

06d531a

revert fixes for dual simplex. changed RINS and SubMIP to use guided …

ae742c2

…diving

moved bound propagation logs to debug mode

b5f2c7e

addressing code rabbit suggestions

eb5c695

added explicit conversion to float

5cf5ac0

missing code revert in basis update

d7dfb0d

fixed variable type

55e64cc

Merge remote-tracking branch 'origin/main' into node-queue

a26a816

Merge remote-tracking branch 'origin/main' into diving-heuristics

17d3d7c

added comments

0b1e994

added missing spacing

9effdc8

updated logs

6d43e03

Merge branch 'main' into adjust-bnb-logs

98f670f

nguidotti added 15 commits January 27, 2026 17:04

added missing mutexes

eb51080

fixed empty vector in shuffle

f3567cb

reverted some code changes

233933f

replaced with lock_guards

527754c

fixed crash

2251b87

fixed number of threads set to 0

7919288

enable reliablity branching by default

cec6f38

changed solver to use a single parallel section for the entire duration.

2fdbe27

converted RINS to use openmp task

87e4be2

migrated CPU FJ to OpenMP

eee6848

fix compilation

3699fd6

fix incorrect execution for single thread

ee0d49f

fixed missing barrier at the end of the scheduler

e8b5109

replaced locks with atomics

e8f8148

replaced std::thread with openmp task for the concurrent root solve

c9a27a7

nguidotti self-assigned this Feb 2, 2026

nguidotti added non-breaking Introduces a non-breaking change do not merge Do not merge if this flag is set improvement Improves an existing functionality mip labels Feb 2, 2026

nguidotti changed the title ~~[WIP] Unified threading model in MIP solver~~ [WIP, POC] Unified threading model in MIP solver Feb 2, 2026

fixed incorrectly declaring the root relaxation to be optimal after t…

9731c76

…imeout

coderabbitai bot reviewed Feb 3, 2026

View reviewed changes

nguidotti added 2 commits February 3, 2026 13:24

small fixes

62f604f

fix pseudocost update

a8fd737

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP, POC] Unified threading model in MIP solver#820

[WIP, POC] Unified threading model in MIP solver#820
nguidotti wants to merge 141 commits intoNVIDIA:mainfrom
nguidotti:unified-parallel-model

nguidotti commented Feb 2, 2026 •

edited

Loading

Uh oh!

copy-pr-bot bot commented Feb 2, 2026

Uh oh!

nguidotti commented Feb 3, 2026

Uh oh!

nguidotti commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nguidotti commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Feb 2, 2026

Uh oh!

nguidotti commented Feb 3, 2026

Uh oh!

nguidotti commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026

Uh oh!

coderabbitai bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nguidotti commented Feb 2, 2026 •

edited

Loading

coderabbitai bot commented Feb 3, 2026 •

edited

Loading