/ok to test ff47875
Actionable comments posted: 12
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/bb_event.hpp`:
- Around line 71-129: operator< uses event_sequence as a final tie-breaker but
none of the factory functions set it; update all factory methods (make_branched,
make_integer_solution, make_fathomed, make_infeasible, make_numerical) to accept
an extra event_sequence parameter (e.g., uint64_t event_sequence) and assign it
to e.event_sequence before returning, ensuring deterministic ordering;
alternatively, if caller-side sequence management isn't desired, have each
factory pull a monotonic counter (e.g., next_event_sequence()) and set
e.event_sequence from that source — but make the change consistently across all
listed factory functions so event_sequence is never left at its default.
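The counter-based option above might be sketched as follows; `bb_event_t`, `make_branched`, and the field names are simplified stand-ins for the review's types, not the actual cuOpt code:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical sketch: a monotonic counter shared by the factory functions so
// that event_sequence is never left at its default value.
struct bb_event_t {
  double work_unit_timestamp = 0.0;
  std::uint64_t event_sequence = 0;

  // Ties on the work-unit timestamp fall back to the sequence number,
  // making the ordering a strict total order.
  bool operator<(const bb_event_t& other) const {
    return std::tie(work_unit_timestamp, event_sequence) <
           std::tie(other.work_unit_timestamp, other.event_sequence);
  }
};

inline std::uint64_t next_event_sequence() {
  static std::atomic<std::uint64_t> counter{0};
  return counter.fetch_add(1, std::memory_order_relaxed);
}

inline bb_event_t make_branched(double work_units) {
  bb_event_t e;
  e.work_unit_timestamp = work_units;
  e.event_sequence = next_event_sequence();  // never default-initialized
  return e;
}
```

Note that a shared atomic counter hands out numbers in a scheduling-dependent order, so for a deterministic solver the review's first option (caller-supplied sequence numbers) is likely the safer choice; the sketch only shows the mechanics.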
In `@cpp/src/dual_simplex/bounds_strengthening.cpp`:
- Around line 105-110: A_i is declared but never used while the code still
accesses A.i directly, causing an unused-variable warning and inconsistent nnz
accounting; fix this by replacing every occurrence of the instrumented access
A.i[p] with the cached reference A_i[p] (so A_i, A_x, Arow_j, Arow_x are used
consistently) in the column loop and other places where A.i is read, ensuring
nnz_processed remains the single source of nonzero counting; alternatively, if
you prefer to keep instrumented accesses, remove the unused A_i declaration
instead.
In `@cpp/src/dual_simplex/branch_and_bound.cpp`:
- Around line 666-675: The code unconditionally calls
settings_.heuristic_preemption_callback() and then calls it again in later
branches, causing duplicate preemption notifications; add a boolean guard (e.g.,
bool preemption_called = false) near the top of the function and replace direct
calls to settings_.heuristic_preemption_callback() with a conditional that
checks and sets preemption_called so the callback is invoked at most once
(update all sites that currently call settings_.heuristic_preemption_callback to
use this guard); reference the variables/members solver_status_,
mip_status_t::WORK_LIMIT, mip_status_t::NODE_LIMIT and
settings_.heuristic_preemption_callback when locating and updating the logic.
- Around line 481-520: The call-site in local_search.cu should validate the work
unit timestamp before calling
branch_and_bound_t::queue_external_solution_deterministic to avoid silently
dropping old submissions; modify the producer/registration code around where
producer_sync_ and work_units are used to check work_units against
bb.get_work_unit_context().current_horizon() and only call
queue_external_solution_deterministic(h_vec, work_units) when work_units >=
current_horizon(), otherwise log a warning (or otherwise handle) indicating the
solution was from the past; reference the symbols producer_sync_, work_units,
bb.get_work_unit_context().current_horizon(), and
queue_external_solution_deterministic to locate where to insert the check and
warning.
- Around line 2363-2388: After the deterministic coordinator returns you must
propagate its termination code into the solver state: inside the
settings_.deterministic branch (after run_determinism_coordinator(Arow_)) assign
solver_status_ = determinism_global_termination_status_; so that
set_final_solution(...) sees TIME_LIMIT/WORK_LIMIT set by the coordinator;
update the same pattern in the other determinism-related exit paths referenced
(around lines 2668-2786) where the coordinator/sync callback can set
determinism_global_termination_status_.
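The once-only guard suggested for the preemption callback in the first branch_and_bound.cpp item above can be sketched like this (the type and member names are illustrative, not the actual solver members):

```cpp
#include <cassert>
#include <functional>

// Sketch of a guard ensuring a callback such as
// settings_.heuristic_preemption_callback fires at most once.
struct preemption_notifier_t {
  std::function<void()> callback;
  bool called = false;

  // Invokes the callback at most once, no matter how many exit
  // branches (WORK_LIMIT, NODE_LIMIT, ...) reach this point.
  void notify_once() {
    if (callback && !called) {
      called = true;
      callback();
    }
  }
};
```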
In `@cpp/src/dual_simplex/phase2.cpp`:
- Line 3374: The local variable iters_elapsed (computed as "i_t iters_elapsed =
iter - last_feature_log_iter;") is never used and causes -Wunused-variable
warnings; either remove this declaration or restore its use (e.g., re-enable the
logging/feature code that consumed it) or explicitly mark it as used (cast to
void) to suppress the warning; update the code around the calculation in
phase2.cpp where iter and last_feature_log_iter are available to apply one of
these fixes.
- Around line 2658-2684: The code uses a magic divisor 1e8 when converting
aggregated byte loads/stores before calling work_unit_context->record_work
inside the cuopt::scope_guard lambda and again elsewhere; extract this value
into a named constant (e.g., constexpr double kWorkUnitDivisor = 1e8) near the
top of the translation unit, create a helper function
record_aggregated_work(WorkUnitContext*, std::pair<uint64_t,uint64_t>
loads_stores) that calls aggregator.collect_and_flush() (or accepts its result)
and performs the division + work_unit_context->record_work, then replace the
duplicate logic in the scope_guard and the periodic logging block with calls to
record_aggregated_work(work_unit_context, {total_loads,total_stores}) (or pass
the collected pair); also remove the "// TEMP;" and the large commented-out
feature-logging block or replace them with a single TODO referencing the
tracking issue to keep the diff clean.
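The suggested extraction of the magic divisor plus the duplicated conversion could look like the sketch below; `work_unit_context_t` here is a minimal stand-in for the real class, not cuOpt code:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// The 1e8 divisor becomes a named constant with one conversion helper.
constexpr double kWorkUnitDivisor = 1e8;

// Stand-in for the real work-unit context type.
struct work_unit_context_t {
  double total_work = 0.0;
  void record_work(double units) { total_work += units; }
};

// Converts aggregated byte loads/stores into work units and records them.
// Replaces the duplicated division in the scope_guard and the logging block.
inline void record_aggregated_work(
    work_unit_context_t* ctx,
    std::pair<std::uint64_t, std::uint64_t> loads_stores) {
  if (ctx == nullptr) return;
  const double bytes =
      static_cast<double>(loads_stores.first + loads_stores.second);
  ctx->record_work(bytes / kWorkUnitDivisor);
}
```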
In `@cpp/src/mip/feasibility_jump/fj_cpu.cu`:
- Around line 270-281: The logging calculates hit percentages using divisions
like (double)fj_cpu.hit_count / (fj_cpu.hit_count + fj_cpu.miss_count) which can
divide by zero; update each CUOPT_LOG_TRACE call (the overall cache line and the
three candidate move lines referencing fj_cpu.hit_count, fj_cpu.miss_count,
fj_cpu.candidate_move_hits[i], and fj_cpu.candidate_move_misses[i]) to guard the
denominator (e.g., use a ternary or if to compute percentage = sum==0 ? 0.0 :
(double)hits / sum * 100.0) so that when hits+misses == 0 the log prints 0.00%
instead of NaN.
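A guarded percentage helper along the lines suggested above (illustrative, not the actual fj_cpu.cu code):

```cpp
#include <cassert>

// Returns the hit rate as a percentage, printing 0.0 rather than NaN
// when no accesses have been recorded yet.
inline double hit_percentage(unsigned long long hits, unsigned long long misses) {
  const unsigned long long total = hits + misses;
  return total == 0 ? 0.0
                    : static_cast<double>(hits) / static_cast<double>(total) * 100.0;
}
```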
In `@cpp/src/utilities/memory_instrumentation.hpp`:
- Line 37: The macro CUOPT_ENABLE_MEMORY_INSTRUMENTATION is hardcoded to 1
causing all ins_vector accesses to route through element_proxy_t and increment
byte_loads/byte_stores; change this to be opt-in by setting the default to 0 in
memory_instrumentation.hpp and wire a CMake option to control it (e.g., add a
CUOPT_ENABLE_MEMORY_INSTRUMENTATION cache option that defaults to OFF and sets
-DCUOPT_ENABLE_MEMORY_INSTRUMENTATION=1 for instrumented builds). Update the
header to `#ifndef` CUOPT_ENABLE_MEMORY_INSTRUMENTATION / `#define`
CUOPT_ENABLE_MEMORY_INSTRUMENTATION 0 / `#endif` so code uses the macro only when
explicitly enabled, and document how to enable it for deterministic work-unit
prediction (affecting element_proxy_t, ins_vector, and the
byte_loads/byte_stores counters).
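The opt-in default could follow the header pattern below; the build-system wiring (a CMake cache option passing `-DCUOPT_ENABLE_MEMORY_INSTRUMENTATION=1` for instrumented builds) is an assumption about the setup, not existing code:

```cpp
#include <cassert>

// Opt-in default: the build system may define the macro; otherwise it is 0
// and instrumented accessors compile away.
#ifndef CUOPT_ENABLE_MEMORY_INSTRUMENTATION
#define CUOPT_ENABLE_MEMORY_INSTRUMENTATION 0
#endif

inline bool memory_instrumentation_enabled() {
#if CUOPT_ENABLE_MEMORY_INSTRUMENTATION
  return true;
#else
  return false;
#endif
}
```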
In `@cpp/src/utilities/work_unit_scheduler.cpp`:
- Around line 67-71: The current loop where each thread independently calls
wait_at_sync_point while (total_work >= current_sync_target() && !is_shutdown())
can deadlock because threads may call a different number of barriers; change the
logic so all threads perform at most one barrier per sync generation and
re-evaluate together after the barrier: replace the inner while with a single
conditional check (if total_work >= current_sync_target() && !is_shutdown())
that calls wait_at_sync_point(ctx, current_sync_target()), and ensure
wait_at_sync_point (or the surrounding scheduler) advances a shared sync
generation/next_target after the barrier so every thread recomputes
current_sync_target() and can decide whether another sync is needed; reference
functions/variables: total_work, current_sync_target(), wait_at_sync_point,
is_shutdown to locate and update the logic.
- Line 53: The setter set_sync_interval writes sync_interval_ unsafely while
current_sync_target (used by on_work_recorded across OpenMP threads) reads it,
causing a data race; make sync_interval_ a std::atomic<double> and update all
accesses to use .store() in set_sync_interval and .load() where
current_sync_target/on_work_recorded read it (or protect reads/writes with a
mutex), so concurrent reads/writes are synchronized; ensure the declaration of
sync_interval_ is changed to std::atomic<double> and replace direct reads/writes
with atomic load/store calls.
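A minimal sketch of the atomic variant (class and member names are simplified stand-ins for the scheduler):

```cpp
#include <atomic>
#include <cassert>

// sync_interval_ as std::atomic<double>: the setter and the OpenMP reader
// threads no longer race on the plain double.
class sync_interval_holder_t {
 public:
  void set_sync_interval(double v) {
    sync_interval_.store(v, std::memory_order_relaxed);
  }

  // Readers take a single load, so the value cannot tear mid-read.
  double current_sync_target(double total_work_floor) const {
    return total_work_floor + sync_interval_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<double> sync_interval_{5.0};
};
```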
In `@cpp/tests/mip/determinism_test.cu`:
- Around line 155-173: The test
DeterministicBBTest::reproducible_solution_vector fails to set settings.seed
(unlike other tests), so the RNG may differ across runs; update the
mip_solver_settings_t<int,double> settings instance in that test to set
settings.seed to a fixed value (the same constant used elsewhere in tests)
before calling solve_mip(&handle_, problem, settings) twice so both solves use
identical RNG initialization and produce comparable results.
🧹 Nitpick comments (20)
cpp/src/dual_simplex/sparse_vector.cpp (1)
105-134: Consider templating `to_dense`/`scatter` to avoid duplication.

The `ins_vector` overloads are exact copies of the `std::vector` versions (lines 94–124). A single templated overload would eliminate the duplication:

template <typename Container>
void to_dense(Container& x_dense) const { /* ... */ }

Not blocking since the wrapper is designed to be interchangeable, but worth considering if more container-type overloads are anticipated. As per coding guidelines, refactor code duplication in solver components into shared utilities.
cpp/src/dual_simplex/pseudo_costs.hpp (1)
25-33: The `underlying_type` trait is clever but somewhat obscure.

Extracting the inner type via `decltype(std::declval<T&>()++)` relies on the fact that `omp_atomic_t<T>::operator++(int)` returns `T`. A brief doc comment explaining why post-increment is used (to unwrap `omp_atomic_t`) would help future readers.

cpp/src/utilities/memory_instrumentation.hpp (3)
39-43: `HDI` macro name is short and collision-prone.

`HDI` is a very short, unprefixed macro name defined at global scope. If any other header (CUDA helpers, third-party libraries) also defines `HDI`, it will silently conflict. Consider a namespaced name like `CUOPT_HDI` or an `#ifndef HDI` guard.

♻️ Suggested fix

-#ifdef __NVCC__
-#define HDI inline __host__ __device__
-#else
-#define HDI inline
-#endif
+#ifndef CUOPT_HDI
+#ifdef __NVCC__
+#define CUOPT_HDI inline __host__ __device__
+#else
+#define CUOPT_HDI inline
+#endif
+#endif

Then replace all uses of `HDI` with `CUOPT_HDI` in this file.
762-763: Implicit conversion to `T&` bypasses all instrumentation.

`operator T&()` and `operator const T&() const` allow any code accepting `std::vector<T>&` to silently bypass the instrumented accessors. This is convenient for backward compatibility but means instrumentation counters will undercount in any code path that passes an `ins_vector` to a function expecting `std::vector&`.

If this is intentional for the transition period, a brief comment would help future readers understand the trade-off.
282-283: `standard_layout` assertion may reject valid use cases.

`static_assert(std::is_standard_layout_v<value_type>, ...)` will reject `ins_vector<std::string>` or other non-trivial types. Currently the codebase only wraps numeric types and simple pairs, so this is fine, but the assertion message should clarify why standard layout is required (for `sizeof`-based byte counting).

cpp/src/dual_simplex/diving_heuristics.cpp (2)
74-81: Unused `log` parameter will cause compiler warnings.

After delegating to `pseudocost_diving_from_arrays`, the `log` parameter is no longer referenced. Same applies to `guided_diving` below (lines 91–98). Silence with `(void)log;` or mark it `[[maybe_unused]]`.

Suggested fix

 {
+  (void)log;
   return pseudocost_diving_from_arrays((const f_t*)pc.pseudo_cost_sum_down.data(),
91-98: Same unused `log` parameter issue.

Suggested fix

 {
+  (void)log;
   return guided_diving_from_arrays((const f_t*)pc.pseudo_cost_sum_down.data(),

cpp/tests/linear_programming/c_api_tests/c_api_test.c (1)
1512-1513: Consider a fixed seed instead of unseeded `rand()`.

`rand()` without a prior `srand()` call uses the implementation-default seed (typically 1), so the "random" seed is actually the same constant on every test execution. Additionally, `rand()` is not thread-safe. Since any fixed value works for a determinism test, a literal constant (or a `srand(time(NULL))` call if variety across runs is desired) would be clearer and more portable.

cpp/src/mip/solver_context.cuh (1)
62-63: Hardcoded 5.0-second sync interval.

The `work_unit_scheduler_` interval is a compile-time constant. Consider making it configurable via `mip_solver_settings_t` if tuning is expected across different problem profiles. Fine for now if this is an internal tuning constant.

cpp/src/utilities/work_limit_context.hpp (2)
19-19: Unused `#include <algorithm>`.

This header doesn't use anything from `<algorithm>`. The static analysis tool also flagged this. Removing it will clean up the dependency.

Proposed fix

-#include <algorithm>
 #include <string>
36-36: Consider marking the constructor `explicit`.

Single-argument constructors should generally be `explicit` to prevent unintended implicit conversions from `std::string`.

Proposed fix

-  work_limit_context_t(const std::string& name) : name(name) {}
+  explicit work_limit_context_t(const std::string& name) : name(name) {}

cpp/src/mip/local_search/local_search.cu (1)
58-59: Static globals `local_search_best_obj` and `pop_ptr` leak state between test runs.

These file-scope statics persist across test cases, which can cause non-deterministic test behavior. `local_search_best_obj` is never reset, so its value from a prior test case carries over. `pop_ptr` is assigned but never appears to be read. Consider resetting `local_search_best_obj` at the start of each solve, or moving it to instance scope. As per coding guidelines, test isolation requires preventing global variables from leaking between test cases.

cpp/tests/mip/determinism_test.cu (1)
224-234: Comment on line 225 is outdated — missing `work_limit` field.

The comment says `// Instance, threads, time_limit` but the tuple actually has four fields including `work_limit`.

Proposed fix

-  // Instance, threads, time_limit
+  // Instance, threads, time_limit, work_limit

cpp/src/mip/problem/problem.cuh (1)
13-14: Duplicate `#include "host_helper.cuh"`.

`host_helper.cuh` is included at both Line 13 and Line 22. The second include is redundant.

Proposed fix

-#include "host_helper.cuh"
-#include "problem_fixing.cuh"
+#include "problem_fixing.cuh"

Also applies to: 22-23
cpp/src/mip/feasibility_jump/fj_cpu.cu (1)
99-109: Dead variable `new_val` in `get_mtm_for_constraint`.

`new_val` is assigned at Line 102 (`f_t new_val = old_val;`) but is never read before the `break` at Line 105, and it is not part of the return tuple. This appears to be leftover from a refactor.

Proposed fix

   if (move_type == MTMMoveType::FJ_MTM_VIOLATED ? !violated : violated) continue;
-  f_t new_val = old_val;
-
   delta_ij = slack / (cstr_coeff * sign);
   break;

cpp/src/dual_simplex/deterministic_workers.hpp (1)
317-447: Skip enqueue when bounds strengthening reports infeasible.

`enqueue_dive_node` ignores the return value of `bounds_strengthening`, so infeasible nodes are still queued. Consider early-returning to avoid wasted dives.

♻️ Suggested guard

-  this->node_presolver.bounds_strengthening(
-    settings, bounds_changed, entry.resolved_lower, entry.resolved_upper);
+  if (!this->node_presolver.bounds_strengthening(
+        settings, bounds_changed, entry.resolved_lower, entry.resolved_upper)) {
+    return;
+  }

cpp/src/dual_simplex/phase2.cpp (4)
26-35: NVTX ranges are unconditionally enabled in a hot path.

`PHASE2_NVTX_RANGES` is always `#define`d, so every simplex iteration incurs NVTX push/pop overhead. Consider gating this behind a CMake option or `NDEBUG` so that release builds avoid the cost.

-#define PHASE2_NVTX_RANGES
+// #define PHASE2_NVTX_RANGES  // Enable for profiling builds

As per coding guidelines, "Assess algorithmic complexity for large-scale problems … ensure O(n log n) or better complexity" — while individual NVTX calls are O(1), they add measurable overhead when called millions of times in the main simplex loop.
3371-3399: Redundant null check and dead code.

Line 3393 `if (work_unit_context)` is already guaranteed true by the outer condition on line 3373. The ~15 lines of commented-out feature logging should be removed or replaced with a TODO linking to a tracking issue.

Minimal cleanup

   if ((iter % FEATURE_LOG_INTERVAL) == 0 && work_unit_context) {
     i_t iters_elapsed = iter - last_feature_log_iter;
     auto [total_loads, total_stores] = aggregator.collect_and_flush();
-    // features.byte_loads = total_loads;
-    // features.byte_stores = total_stores;
-
-    // features.interval_runtime = now - interval_start_time;
-    // interval_start_time = now;
-
-    // features.iteration = iter;
-    // features.num_refactors = num_refactors;
-    // features.num_basis_updates = ft.num_updates();
-    // features.sparse_delta_z_count = sparse_delta_z;
-    // features.dense_delta_z_count = dense_delta_z;
-    // features.total_bound_flips = total_bound_flips;
-    // features.num_infeasibilities = infeasibility_indices.size();
-    // features.delta_y_nz_percentage = delta_y_nz_percentage;
-    // features.log_features(settings);
-
-    if (work_unit_context) {
-      // TEMP;
-      work_unit_context->record_work((total_loads + total_stores) / 1e8);
-    }
+    // TODO(aliceb-nv): Re-enable feature logging for regression training
+    work_unit_context->record_work((total_loads + total_stores) / BYTES_TO_WORK_UNITS);
     last_feature_log_iter = iter;
   }
3224-3248: Mixed `ins_vector`/`std::vector&` API surface.

`update_primal_infeasibilities` and `update_single_primal_infeasibility` still accept `std::vector&`, requiring `.underlying()` at every call site. This is fine as an incremental migration step, but consider migrating these two functions to `ins_vector` parameters in a follow-up to eliminate the `.underlying()` calls and get consistent memory tracking.
2631-2647: Consider a helper to register sparse vector components.

Each `sparse_vector_t` requires two `aggregator.add()` calls (`.i` and `.x`). With 8 sparse vectors, that's 16 lines of boilerplate. A small helper like `aggregator.add_sparse("name", sv)` that registers both `.i` and `.x` would halve the lines and reduce the risk of missing a component.
/ok to test 57129b7
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/branch_and_bound.cpp`:
- Around line 2377-2389: The assignment to solver_status_ should only occur in
deterministic runs; update the branch in the code around
determinism_mode_enabled_ so that solver_status_ =
determinism_global_termination_status_ is executed only when
determinism_mode_enabled_ is true (leave solver_status_ unchanged in the
else/non-deterministic path so existing TIME_LIMIT/NODE_LIMIT statuses are
preserved); locate the block that computes lower_bound and the subsequent
solver_status_ assignment (references: determinism_mode_enabled_,
determinism_global_termination_status_, solver_status_, set_final_solution) and
wrap or move the assignment into the deterministic branch accordingly.
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around line 2655-2656: The local variable interval_start_time (assigned via
toc(start_time)) is unused and will trigger -Wunused-variable; mark it with
[[maybe_unused]] or remove it to silence the warning. Update the declaration of
interval_start_time near last_feature_log_iter/iter to either add the attribute
(e.g., [[maybe_unused]] f_t interval_start_time = toc(start_time);) or delete
the variable if the commented-out uses will not be restored; ensure consistency
with the existing iters_elapsed [[maybe_unused]] style and keep
last_feature_log_iter and surrounding logic unchanged.
- Around line 2896-2913: The bound_flipping_ratio_test_t constructor call is
passing nonbasic_mark (an ins_vector<i_t>) directly while other ins_vector
arguments use .underlying(); change the argument to nonbasic_mark.underlying()
in the bound_flipping_ratio_test_t(...) invocation so the parameter type matches
the expected const std::vector<i_t>&; update the call site around the creation
of bfrt (bound_flipping_ratio_test_t<i_t,f_t> bfrt(...)) to pass
nonbasic_mark.underlying() instead of nonbasic_mark.
🧹 Nitpick comments (1)
cpp/src/dual_simplex/phase2.cpp (1)
3373-3399: Redundant null-check on `work_unit_context` at line 3393.

The outer condition at line 3373 already gates on `work_unit_context` being non-null, making the inner check at line 3393 redundant.

Suggested simplification

   if ((iter % FEATURE_LOG_INTERVAL) == 0 && work_unit_context) {
     [[maybe_unused]] i_t iters_elapsed = iter - last_feature_log_iter;
     auto [total_loads, total_stores] = aggregator.collect_and_flush();
     // ...
-    if (work_unit_context) {
-      // TEMP;
-      work_unit_context->record_work((total_loads + total_stores) / 1e8);
-    }
+    // TEMP;
+    work_unit_context->record_work((total_loads + total_stores) / 1e8);
     last_feature_log_iter = iter;
   }
/ok to test 1332fbd
This PR introduces a deterministic execution mode for the parallel branch-and-bound MIP solver.
When `determinism_mode = CUOPT_MODE_DETERMINISTIC`, the solver guarantees bitwise-identical results across runs regardless of thread scheduling variations, so long as the runs are on the same platform with the same environment (same glibc, etc.).

The chosen approach is referred to as Bulk Synchronous Parallel (BSP) in the literature: an execution model where computation proceeds in discrete horizons of virtual time (work units) separated by synchronization barriers.
To cope with the inherent nondeterminism of wall-clock time due to factors such as caching state, OS scheduling, and CPU throttling, progress is measured not in time but in "work units", which approximate the execution time of an algorithm using known and deterministic factors such as memory operations, problem features, and sparsity.
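The horizon structure this yields can be illustrated with a toy, purely sequential sketch (not cuOpt code): each worker advances a deterministic work-unit clock to the horizon boundary and only exchanges its collected events there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy worker: advances a virtual clock by a fixed, deterministic per-step
// cost and collects events locally until the horizon is reached.
struct toy_worker_t {
  double local_clock = 0.0;
  std::vector<int> pending_events;  // exchanged only at the barrier

  void run_until(double horizon, int worker_id) {
    int step = 0;
    while (local_clock < horizon) {
      local_clock += 0.25;  // deterministic per-step work-unit cost
      pending_events.push_back(worker_id * 1000 + step++);
    }
  }
};

// One horizon: run every worker to the boundary, then merge their events in
// a fixed (here: worker-major) order, independent of real execution speed.
inline std::vector<int> run_horizon(std::vector<toy_worker_t>& workers,
                                    double horizon) {
  std::vector<int> merged;
  for (std::size_t w = 0; w < workers.size(); ++w) {
    workers[w].run_until(horizon, static_cast<int>(w));
  }
  for (auto& wk : workers) {
    merged.insert(merged.end(), wk.pending_events.begin(), wk.pending_events.end());
    wk.pending_events.clear();
  }
  return merged;
}
```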
Workers explore the tree independently and locally within a horizon, and collect events (branching decisions, solutions, pseudo-cost updates) during the process without exchanging them yet.
At each horizon boundary, events are sorted by work-unit timestamp with tie-breaking, history is replayed in deterministic order to update global state, and if need be nodes are redistributed across workers to balance load (which replaces the ramp-up mechanism).
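The sort-with-tie-breaking step can be sketched as follows, assuming each event carries a work-unit timestamp and a deterministically assigned sequence number (names are illustrative, not the actual cuOpt types):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A horizon event as gathered from a worker.
struct horizon_event_t {
  double work_units;        // virtual timestamp of the event
  std::uint64_t sequence;   // deterministic tie-breaker
  std::string kind;         // e.g. "branch", "solution"
};

// Events from all workers are sorted by work-unit timestamp with the
// sequence number breaking ties, so the replay order never depends on
// thread scheduling.
inline void sort_for_replay(std::vector<horizon_event_t>& events) {
  std::sort(events.begin(), events.end(),
            [](const horizon_event_t& a, const horizon_event_t& b) {
              if (a.work_units != b.work_units) return a.work_units < b.work_units;
              return a.sequence < b.sequence;
            });
}
```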
Workers operate on local snapshots of shared state (pseudo-costs, upper bound, LP iteration count) taken at horizon start to avoid read races. This trades some accuracy in decisions (pruning may rely on a higher lower bound than in nondeterministic mode) for determinism.
Support for start CPU heuristics is present, based on a "producer" model: CPUFJ starts immediately and begins producing solutions with work-unit timestamps, which are consumed by the B&B thread once the appropriate virtual time is reached. A form of one-way synchronization is implemented to prevent the CPUFJ thread from "falling behind" and producing solutions in the past from the perspective of the B&B thread. CPUFJ is allowed to run ahead, and is biased to run faster than B&B to avoid unnecessary syncs.
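A toy sketch of the producer model, assuming the heuristic emits solutions with nondecreasing work-unit timestamps (names and structure are illustrative, not the actual cuOpt API):

```cpp
#include <cassert>
#include <deque>
#include <utility>
#include <vector>

// A heuristic solution tagged with the virtual time at which it was found.
struct timestamped_solution_t {
  double work_units;
  std::vector<double> assignment;
};

class solution_queue_t {
 public:
  // Returns false for stale submissions (timestamp before the consumer's
  // current horizon), mirroring the "solution from the past" check.
  bool submit(timestamped_solution_t sol, double current_horizon) {
    if (sol.work_units < current_horizon) return false;  // from the past
    queue_.push_back(std::move(sol));  // producer emits in timestamp order
    return true;
  }

  // The B&B side consumes only solutions whose timestamp has been reached.
  std::vector<timestamped_solution_t> drain_up_to(double virtual_now) {
    std::vector<timestamped_solution_t> ready;
    while (!queue_.empty() && queue_.front().work_units <= virtual_now) {
      ready.push_back(std::move(queue_.front()));
      queue_.pop_front();
    }
    return ready;
  }

 private:
  std::deque<timestamped_solution_t> queue_;
};
```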
B&B BFS workers, diving workers, and a start CPUFJ thread are supported in deterministic mode. GPU Heuristics are disabled in this mode and will be incorporated into the deterministic framework in the future.
Note: files in the `utilities/models/` folder are auto-generated from the LightGBM models trained to predict work units from algorithm features.

The implementation is described in more detail in this document: B&B Determinism
Benchmark results
B&B alone, 10min
Regression testing against the main branch:
With changes:
Baseline:
Summary by CodeRabbit
New Features
Infrastructure
Tests