[WIP, POC] Unified threading model in MIP solver#820
[WIP, POC] Unified threading model in MIP solver#820nguidotti wants to merge 141 commits intoNVIDIA:mainfrom
Conversation
… on the objective pseudcost estimate.
… on the objective pseudcost estimate.
…iteration and node limit to the diving threads.
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test 9731c76 |
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis pull request introduces reliability branching for MIP solving and refactors the branch-and-bound solver from a direct threading model to an OpenMP-based worker pool architecture. Key changes include a new branch-and-bound worker framework, modified solver settings, updated heuristics integration (RINS, Local Search, Feasibility Jump), and removal of legacy CPU worker thread infrastructure. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cpp/src/dual_simplex/solution.hpp (1)
54-60:⚠️ Potential issue | 🔴 CriticalUninitialized members:
nodes_exploredandsimplex_iterations.These fields are not initialized in the constructor's initializer list. Reading them before assignment is undefined behavior. Other members (
objective,lower_bound,has_incumbent) are properly initialized, but these two are missing.As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving."
🐛 Proposed fix to initialize the counters
mip_solution_t(i_t n) : x(n), objective(std::numeric_limits<f_t>::quiet_NaN()), lower_bound(-inf), - has_incumbent(false) + has_incumbent(false), + nodes_explored(0), + simplex_iterations(0) { }Also applies to: 75-76
cpp/src/mip/feasibility_jump/fj_cpu.cu (1)
1041-1046:⚠️ Potential issue | 🟡 MinorPotential division by zero if loop exits before any iteration.
If
preemption_flagis set before the first iteration completes,fj_cpu.iterationsremains 0, causing division by zero at line 1043.Proposed fix
auto loop_end = std::chrono::high_resolution_clock::now(); double total_time = std::chrono::duration_cast<std::chrono::duration<double>>(loop_end - loop_start).count(); - double avg_time_per_iter = total_time / fj_cpu.iterations; + double avg_time_per_iter = fj_cpu.iterations > 0 ? total_time / fj_cpu.iterations : 0.0; CUOPT_LOG_TRACE("%sCPUFJ Average time per iteration: %.8fms\n", fj_cpu.log_prefix.c_str(), avg_time_per_iter * 1000.0);cpp/src/mip/local_search/local_search.cu (1)
50-51:⚠️ Potential issue | 🟠 MajorThread-unsafe static globals pose data race risk.
local_search_best_objandpop_ptrare static file-scope variables accessed and modified by multiple OpenMP tasks without synchronization. This can lead to data races when multiple local search instances run concurrently or when tasks from different threads access these variables simultaneously.Consider converting these to member variables of
local_search_tor using thread-local storage if the intent is per-thread state.
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/bnb_worker.hpp`:
- Around line 182-215: The split API creates a TOCTOU race: callers could call
get_idle_worker() then another thread could pop that worker before
pop_idle_worker() is called—use the atomic operation get_and_pop_idle_worker()
instead; remove or make get_idle_worker() and pop_idle_worker() private (or
delete them) so external code cannot call the unsafe pair, and update any
callers to use get_and_pop_idle_worker() which already locks mutex_ and updates
idle_workers_ and num_idle_workers_ while returning workers_[idx].get().
In `@cpp/src/dual_simplex/pseudo_costs.cpp`:
- Line 417: The computation of bnb_lp_iter_per_node divides bnb_total_lp_iter by
bnb_stats.nodes_explored which can be zero; update the calculation used to
produce bnb_lp_iter_per_node (and any dependent uses) to guard against
division-by-zero by checking bnb_stats.nodes_explored first and using a safe
fallback (e.g., 0 or treat denominator as 1) — replace the direct division of
bnb_total_lp_iter / bnb_stats.nodes_explored with a conditional/ternary that
yields a defined value when bnb_stats.nodes_explored == 0 so functions/variables
like bnb_lp_iter_per_node are never computed with a zero divisor.
- Around line 387-568: In reliable_variable_selection, the trial-branch updates
divide by change_in_x (for down/up) which can be near zero and yield huge/NaN
pseudo-costs; before updating pseudo_cost_sum_down/up and incrementing
pseudo_cost_num_down/up, guard the division by checking fabs(change_in_x)
against a small tolerance (use existing integer_tol if available or a local eps
like 1e-8): if below tol, skip the update (or clamp denominator to tol) and do
not increment the count; apply the same guard for both the down-branch block
(where change_in_x = solution[j] - floor(solution[j])) and the up-branch block
(ceil(solution[j]) - solution[j]) to ensure stable pseudo-cost_sum_* and
pseudo_cost_num_* updates.
In `@cpp/src/mip/local_search/local_search.cu`:
- Around line 139-140: The code dereferences scratch_cpu_fj_on_lp_opt without
checking for null in the stop_cpufj_scratch_threads path; add a null check (or
guard) before using scratch_cpu_fj_on_lp_opt in stop_cpufj_scratch_threads so
you only set halted when scratch_cpu_fj_on_lp_opt is non-null, or ensure
start_cpufj_lptopt_scratch_threads fully initializes scratch_cpu_fj_on_lp_opt
before stop can run; reference the scratch_cpu_fj_on_lp_opt variable and the
stop_cpufj_scratch_threads / start_cpufj_lptopt_scratch_threads functions when
applying the guard.
In `@cpp/src/mip/solver.cu`:
- Line 150: The variable bb_status (type dual_simplex::mip_status_t) must be
initialized at its declaration to avoid undefined behavior if the OpenMP task
that writes it never runs (e.g., heuristics_only path); change its declaration
to value-initialize (for example: dual_simplex::mip_status_t bb_status = {} or
an explicit safe enum like dual_simplex::mip_status_t::UNKNOWN) and keep the
task assignment as-is so reads after the parallel region see a defined value.
🧹 Nitpick comments (5)
cpp/src/dual_simplex/bnb_worker.hpp (1)
269-280: Integer division for worker distribution may produce uneven allocation.Lines 276-277 use floating-point division then truncate to
i_t:i_t start = (double)k * diving_workers / m; i_t end = (double)(k + 1) * diving_workers / m;This is a valid distribution approach, but the cast from
doubletoi_tcould truncate unexpectedly for large values. Consider using integer arithmetic for deterministic rounding:i_t start = (k * diving_workers) / m; i_t end = ((k + 1) * diving_workers) / m;This is a minor concern since worker counts are typically small.
cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)
86-88: New solver setting added correctly.The
reliability_branchingflag is added with a sensible default (true). This allows users to disable the feature when needed (e.g., for debugging or specific problem types).Consider adding a brief doc comment describing what reliability branching does and when users might want to disable it:
/// Enable reliability branching for variable selection in B&B (default: true) bool reliability_branching = true;cpp/src/mip/local_search/local_search.cu (1)
127-130: Consider replacingcudaDeviceSynchronizewith stream sync.
cudaDeviceSynchronize()blocks all GPU work across all streams. Since the intent appears to be ensuring thesolution_lpcopy is complete before spawning the CPU task, a stream-specific sync would be less blocking:context.problem_ptr->handle_ptr->sync_stream();This aligns with coding guidelines to minimize unnecessary host-device synchronization.
cpp/src/mip/solver.cu (1)
157-162: Thread allocation may leave B&B under-resourced on low-core systems.With
num_threads - 3for B&B, systems with 4 or fewer cores will have B&B limited to 1 thread. This is noted in the PR description but worth documenting in code comments for future maintainers.Also applies to: 176-176
cpp/src/dual_simplex/branch_and_bound.cpp (1)
1145-1208: Single-threaded solve for SubMIP/RINS.The
single_threaded_solvemethod mirrors the scheduler loop structure but uses a single worker without task spawning. This avoids OpenMP overhead for small sub-problems.Dead code on Line 1163:
launched_any_taskis declared but never used in this method. Consider removing it.🧹 Remove unused variable
while (solver_status_ == mip_status_t::UNSET && abs_gap > settings_.absolute_mip_gap_tol && rel_gap > settings_.relative_mip_gap_tol && node_queue_.best_first_queue_size() > 0) { - bool launched_any_task = false; lower_bound = get_lower_bound();
This PR unifies the threading model across the MIP solver, such that the code now uses the tasking model from OpenMP. The only exception is the Papilo presolver that uses Intel TBB.
More specifically, this PR
std::thread. This alsostd::threadtoomp task. Similar to previous logic, one instance of RINS can run at a time.CPU FJfromstd::threadtoomp task. There are a few limitationsscratch_cpu_fj_on_lp_optandscratch_cpu_fjare running for the entire program. This essentially allocate two dedicated threads to these functions, while other routines needs to share the remaining CPU resources. This may hurt the performance for low core count CPUs.pseudocostwith atomics to reduce contention.This is a POC, hence it requires some polish before it can be merged.
Checklist
Summary by CodeRabbit
New Features
--reliability-branchingCLI option to control advanced branching strategy in MIP solver (enabled by default)Improvements
Refactoring