Skip to content

Comments

Add memory model support for host-resident problem data#819

Open
tmckayus wants to merge 17 commits intoNVIDIA:mainfrom
tmckayus:memory-model-clean
Open

Add memory model support for host-resident problem data#819
tmckayus wants to merge 17 commits intoNVIDIA:mainfrom
tmckayus:memory-model-clean

Conversation

@tmckayus
Copy link
Contributor

@tmckayus tmckayus commented Jan 31, 2026

NOTE: Remote solve is stubbed in this commit and returns dummy solutions. Full gRPC implementation will be added in a follow-up PR.

This change adds problem and solution class hierarchies which allow data to be in GPU structures or host structures throughout. This is foundational to supporting remote execution integrated at the solver level.

The C API and Python API are unchanged.

If remote execution is enabled, the various APIs will construct a problem in host memory and CUDA initialization will be avoided. Remote solving is stubbed out in this change, and will return dummy values. When remote solving is done, solutions will be returned in host memory.

Remote execution is enabled by setting the following env vars:
CUOPT_REMOTE_HOST=somehost
CUOPT_REMOTE_PORT=1234

MEMORY_MODEL_ARCHITECTURE.md

Summary by CodeRabbit

  • New Features

    • CPU-only execution support enables solving on systems without GPU access
    • Enhanced warm-start data handling for improved solver performance
    • Backend-aware memory management for CPU and GPU execution modes
  • Tests

    • Added CPU-only execution test coverage for linear and mixed-integer problems

@tmckayus tmckayus requested review from a team as code owners January 31, 2026 02:03
@tmckayus
Copy link
Contributor Author

this is a cleaner version of #807

@tmckayus tmckayus added feature request New feature or request non-breaking Introduces a non-breaking change P0 labels Jan 31, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Introduces an interface-based architecture for optimization problems and solutions supporting both CPU and GPU memory backends. Replaces direct object usage with abstract interfaces for both problems and solutions, enabling backend-agnostic solving, remote execution support, and host-memory representations for CPU-only scenarios. Includes new solution conversion pathways and unified solve entry points.

Changes

Cohort / File(s) Summary
Interface layer
cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp, optimization_problem_solution_interface.hpp, optimization_problem_utils.hpp
Defines abstract interfaces for optimization problems and solutions with virtual setters/getters, host-memory accessors, backend-aware implementation selection, and utilities to populate interfaces from MPS data models or data model views.
GPU solution implementations
cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp
Implements GPU-backed LP and MIP solution wrappers around device-resident optimization problems with lazy host-cached data access, conversion to Python return types, and warm-start data handling.
CPU solution implementations
cpp/include/cuopt/linear_programming/cpu_optimization_problem_solution.hpp, cpu_pdlp_warm_start_data.hpp
Introduces CPU-backed LP and MIP solution classes using std::vector storage with host accessors, GPU conversion pathways, and CPU-side PDLP warm-start data representation with GPU↔CPU conversion utilities.
Backend infrastructure
cpp/src/pdlp/backend_selection.cpp, cpp/src/pdlp/gpu_optimization_problem.cu, cpp/src/pdlp/cpu_optimization_problem.cpp
Implements backend selection logic based on environment variables, GPU-backed problem representation with extensive device operations and MPS export, and CPU-backed problem storage with host vectors and optional GPU conversion.
Solver implementations
cpp/src/pdlp/solve.cu, cpp/src/pdlp/solve_remote.cu, cpp/src/mip_heuristics/solve.cu
Adds interface-based solve_lp/solve_mip overloads supporting both local and remote execution paths, remote stubs for CPU/GPU problems, and solution conversion to Cython-compatible return types.
Optimization problem core
cpp/include/cuopt/linear_programming/optimization_problem.hpp, cpp/src/pdlp/optimization_problem.cu
Adds move-based setters for zero-copy data transfers into optimization_problem_t.
PDLP solver integration
cpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hpp, pdlp/solver_settings.hpp, cpp/src/pdlp/pdlp.cu
Extends PDLP warm-start support with CPU-side storage, adds is_populated() predicate, and updates solver settings with CPU warm-start accessors.
Cython/Python bindings
python/cuopt/cuopt/linear_programming/solver/solver.pxd, solver_wrapper.pyx, cpp/include/cuopt/linear_programming/utilities/cython_types.hpp, cpp/src/pdlp/utilities/cython_solve.cu
Refactors Cython layer to use variant-based return types for CPU/GPU solutions, introduces helper types for CPU and GPU LP/MIP results, and updates solution creation to handle both backends with conditional data path selection.
C API refactor
cpp/src/pdlp/cuopt_c.cpp, cpp/src/pdlp/cuopt_c_internal.hpp
Converts C API from direct problem/solution object usage to interface-based approach with per-backend resource initialization, problem/solution interface pointers, and unified solve/result handling.
Python module structure
python/cuopt/cuopt/__init__.py
Implements lazy-loading mechanism for submodules (linear_programming, routing, distance_engine) using __getattr__ to support CPU-only remote configurations.
Python utilities
python/cuopt/cuopt/utilities/utils.py, python/cuopt/cuopt/linear_programming/solver/solver.pyx
Updates copyright year, moves tolerance imports to inline usage, and adds _vector_to_numpy helper for CPU-side vector conversion to numpy arrays with conditional memory-location-aware branching.
C API tests
cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp, c_api_tests.h, c_api_test.c
Adds interface-based problem comparison, CPU-only test environment setup, and new tests for MIP/LP boundary behavior and CPU-only execution modes.
CUDA C++ unit tests
cpp/tests/linear_programming/unit_tests/solution_interface_test.cu
Comprehensive test coverage for solution interfaces, polymorphic method behavior, GPU↔CPU data conversions, MPS data model conversions, and warm-start data handling.
Python integration tests
python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py, test_lp_solver.py
Adds CPU-only execution tests via environment variable fixtures, solution polymorphism validation, and warm-start method configuration for PDLP solver.
Build system
cpp/src/pdlp/CMakeLists.txt, ci/run_ctests.sh, cpp/tests/linear_programming/CMakeLists.txt
Extends LP_CORE_FILES to include new backend implementations and solution conversion, updates test runner to execute CPU-memory-backend variant tests.
Public headers
cpp/include/cuopt/linear_programming/solve.hpp, cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp, cpp/include/cuopt/linear_programming/solve_remote.hpp, cpp/include/cuopt/linear_programming/pdlp/restart_strategy/pdlp_restart_strategy.cu
Exports interface-based solve_lp/solve_mip overloads, remote solver stubs, variant-based Cython aggregate types, and updates iterator types.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately describes the main change: introducing memory model support for host-resident problem data, which aligns with the substantial refactoring across the codebase to support both GPU and CPU memory backends.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (2)

41-51: ⚠️ Potential issue | 🟠 Major

Guard NULL PyObject returns before DECREF or use.

PyObject_CallMethod can return nullptr; the new Py_DECREF(res) will crash on failure, and numba_matrix / numpy_array can also be nullptr. Add NULL checks and use Py_XDECREF to avoid segfaults and ensure error propagation.

🔧 Proposed fix (safe NULL handling)
   void get_solution(void* data, void* objective_value) override
   {
-    PyObject* numba_matrix =
-      data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
-    PyObject* numpy_array =
-      data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    const bool on_device = data_on_device();
+    PyObject* numba_matrix =
+      on_device ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
+    PyObject* numpy_array =
+      on_device ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    if (!numba_matrix || !numpy_array) {
+      Py_XDECREF(numba_matrix);
+      Py_XDECREF(numpy_array);
+      PyErr_Print();
+      return;
+    }
     PyObject* res =
       PyObject_CallMethod(this->pyCallbackClass, "get_solution", "(OO)", numba_matrix, numpy_array);
-    Py_DECREF(numba_matrix);
-    Py_DECREF(numpy_array);
-    Py_DECREF(res);
+    if (!res) { PyErr_Print(); }
+    Py_XDECREF(res);
+    Py_XDECREF(numba_matrix);
+    Py_XDECREF(numpy_array);
   }

80-90: ⚠️ Potential issue | 🟠 Major

Add NULL checks for set_solution callback objects.

Same failure mode as get_solution: PyObject_CallMethod can return nullptr, and Py_DECREF on it will crash. Guard and use Py_XDECREF, plus early return if inputs fail to construct.

🔧 Proposed fix (safe NULL handling)
   void set_solution(void* data, void* objective_value) override
   {
-    PyObject* numba_matrix =
-      data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
-    PyObject* numpy_array =
-      data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    const bool on_device = data_on_device();
+    PyObject* numba_matrix =
+      on_device ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
+    PyObject* numpy_array =
+      on_device ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    if (!numba_matrix || !numpy_array) {
+      Py_XDECREF(numba_matrix);
+      Py_XDECREF(numpy_array);
+      PyErr_Print();
+      return;
+    }
     PyObject* res =
       PyObject_CallMethod(this->pyCallbackClass, "set_solution", "(OO)", numba_matrix, numpy_array);
-    Py_DECREF(numba_matrix);
-    Py_DECREF(numpy_array);
-    Py_DECREF(res);
+    if (!res) { PyErr_Print(); }
+    Py_XDECREF(res);
+    Py_XDECREF(numba_matrix);
+    Py_XDECREF(numpy_array);
   }
cpp/src/linear_programming/cuopt_c.cpp (5)

50-73: ⚠️ Potential issue | 🟠 Major

Guard view creation to keep the C API exception-safe.
create_view_from_mps_data_model() is outside the try/catch; if it throws, the C boundary is crossed and problem_and_stream leaks. Wrap view creation and use RAII to ensure cleanup.

🛠️ Suggested fix (exception-safe view creation)
-  problem_and_stream_view_t* problem_and_stream = new problem_and_stream_view_t();
+  auto problem_and_stream = std::make_unique<problem_and_stream_view_t>();
   std::string filename_str(filename);
   bool input_mps_strict = false;
   std::unique_ptr<mps_data_t> mps_data_model_ptr;
   try {
     mps_data_model_ptr = std::make_unique<mps_data_t>(
       parse_mps<cuopt_int_t, cuopt_float_t>(filename_str, input_mps_strict));
   } catch (const std::exception& e) {
     CUOPT_LOG_INFO("Error parsing MPS file: %s", e.what());
-    delete problem_and_stream;
     *problem_ptr = nullptr;
     if (std::string(e.what()).find("Error opening MPS file") != std::string::npos) {
       return CUOPT_MPS_FILE_ERROR;
     } else {
       return CUOPT_MPS_PARSE_ERROR;
     }
   }
-  problem_and_stream->op_problem = mps_data_model_ptr.release();
-  problem_and_stream->view       = create_view_from_mps_data_model(*problem_and_stream->op_problem);
-  problem_and_stream->view.set_is_device_memory(false);
-
-  *problem_ptr = static_cast<cuOptOptimizationProblem>(problem_and_stream);
+  try {
+    problem_and_stream->view = create_view_from_mps_data_model(*mps_data_model_ptr);
+    problem_and_stream->view.set_is_device_memory(false);
+    problem_and_stream->op_problem = mps_data_model_ptr.release();
+  } catch (const std::exception& e) {
+    CUOPT_LOG_INFO("Error creating view from MPS data: %s", e.what());
+    *problem_ptr = nullptr;
+    return CUOPT_INVALID_ARGUMENT;
+  }
+
+  *problem_ptr = static_cast<cuOptOptimizationProblem>(problem_and_stream.release());

737-744: ⚠️ Potential issue | 🟡 Minor

Add the same op_problem null check as other getters.
This keeps cuOptIsMIP aligned with the other accessors and avoids dereferencing invalid handles.

🔧 Suggested fix
   problem_and_stream_view_t* problem_and_stream_view =
     static_cast<problem_and_stream_view_t*>(problem);
+  if (!problem_and_stream_view->op_problem) { return CUOPT_INVALID_ARGUMENT; }
   *is_mip_ptr = static_cast<cuopt_int_t>(problem_and_stream_view->is_mip());

747-798: ⚠️ Potential issue | 🟠 Major

Make cuOptSolve exception-safe to avoid leaks on solve failure.
If solve_mip/solve_lp throws, the allocated solution wrapper leaks and the exception crosses the C boundary.

🛠️ Suggested fix (RAII + try/catch)
-    solution_and_stream_view_t* solution_and_stream_view =
-      new solution_and_stream_view_t(true, handle);
-    solution_and_stream_view->mip_solution_ptr = new mip_solution_t<cuopt_int_t, cuopt_float_t>(
-      solve_mip<cuopt_int_t, cuopt_float_t>(handle, view, mip_settings));
-    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view);
+    auto solution_and_stream_view =
+      std::make_unique<solution_and_stream_view_t>(true, handle);
+    try {
+      solution_and_stream_view->mip_solution_ptr =
+        new mip_solution_t<cuopt_int_t, cuopt_float_t>(
+          solve_mip<cuopt_int_t, cuopt_float_t>(handle, view, mip_settings));
+    } catch (const std::exception& e) {
+      CUOPT_LOG_INFO("MIP solve failed: %s", e.what());
+      return CUOPT_INVALID_ARGUMENT;
+    }
+    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view.release());
...
-    solution_and_stream_view_t* solution_and_stream_view =
-      new solution_and_stream_view_t(false, handle);
-    solution_and_stream_view->lp_solution_ptr =
-      new optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>(
-        solve_lp<cuopt_int_t, cuopt_float_t>(handle, view, pdlp_settings));
-    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view);
+    auto solution_and_stream_view =
+      std::make_unique<solution_and_stream_view_t>(false, handle);
+    try {
+      solution_and_stream_view->lp_solution_ptr =
+        new optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>(
+          solve_lp<cuopt_int_t, cuopt_float_t>(handle, view, pdlp_settings));
+    } catch (const std::exception& e) {
+      CUOPT_LOG_INFO("LP solve failed: %s", e.what());
+      return CUOPT_INVALID_ARGUMENT;
+    }
+    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view.release());

878-903: ⚠️ Potential issue | 🟠 Major

Handle host-resident solutions in cuOptGetPrimalSolution.
Remote solves can return host data; the current code always reads device vectors and will throw for host-only solutions.

🛠️ Suggested fix
   if (solution_and_stream_view->is_mip) {
     mip_solution_t<cuopt_int_t, cuopt_float_t>* mip_solution =
       static_cast<mip_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->mip_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values = mip_solution->get_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (mip_solution->is_device_memory()) {
+      const auto& solution_values = mip_solution->get_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& solution_values = mip_solution->get_solution_host();
+      std::copy(solution_values.begin(), solution_values.end(), solution_values_ptr);
+    }
   } else {
     optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>* optimization_problem_solution =
       static_cast<optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->lp_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values =
-      optimization_problem_solution->get_primal_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const auto& solution_values = optimization_problem_solution->get_primal_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& solution_values = optimization_problem_solution->get_primal_solution_host();
+      std::copy(solution_values.begin(), solution_values.end(), solution_values_ptr);
+    }
   }

981-999: ⚠️ Potential issue | 🟠 Major

Add host-path copies for dual solution and reduced costs.
These getters still assume device memory and will fail for remote/host solutions.

🛠️ Suggested fix
-    const rmm::device_uvector<cuopt_float_t>& dual_solution =
-      optimization_problem_solution->get_dual_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(dual_solution_ptr, dual_solution.data(), dual_solution.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const auto& dual_solution = optimization_problem_solution->get_dual_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(dual_solution_ptr, dual_solution.data(), dual_solution.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& dual_solution = optimization_problem_solution->get_dual_solution_host();
+      std::copy(dual_solution.begin(), dual_solution.end(), dual_solution_ptr);
+    }
...
-    const rmm::device_uvector<cuopt_float_t>& reduced_cost =
-      optimization_problem_solution->get_reduced_cost();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(reduced_cost_ptr, reduced_cost.data(), reduced_cost.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const auto& reduced_cost = optimization_problem_solution->get_reduced_cost();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(reduced_cost_ptr, reduced_cost.data(), reduced_cost.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& reduced_cost = optimization_problem_solution->get_reduced_cost_host();
+      std::copy(reduced_cost.begin(), reduced_cost.end(), reduced_cost_ptr);
+    }

Also applies to: 1021-1039

cpp/src/linear_programming/solver_solution.cu (1)

191-265: ⚠️ Potential issue | 🟠 Major

Release device buffers when copy_from switches to host data.
When other.is_device_memory_ is false, the existing device buffers remain allocated, which can retain large GPU memory unnecessarily.

🛠️ Suggested fix
   } else {
     // Copy CPU data
+    // Release device buffers when switching to host-backed data
+    primal_solution_.reset();
+    dual_solution_.reset();
+    reduced_cost_.reset();
     if (!primal_solution_host_) { primal_solution_host_ = std::make_unique<std::vector<f_t>>(); }
     if (!dual_solution_host_) { dual_solution_host_ = std::make_unique<std::vector<f_t>>(); }
     if (!reduced_cost_host_) { reduced_cost_host_ = std::make_unique<std::vector<f_t>>(); }
As per coding guidelines: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events.
🤖 Fix all issues with AI agents
In `@cpp/libmps_parser/src/data_model_view.cpp`:
- Around line 351-392: The code in data_model_view_t::get_problem_category()
(and other span-returning getters using variable_types_.data() / spans)
dereferences host spans without checking the is_device_memory_ flag; either
guard these accesses by checking is_device_memory_ and performing a device->host
copy into a host-local span before reading, or remove the unused flag and
document that spans must be host-resident; specifically update
data_model_view_t::get_problem_category(), all span-returning getters, and any
callers that assume host memory (e.g., remove reliance on
set_is_device_memory(false) or add an explicit device-to-host copy helper
invoked when is_device_memory_ is true) so accesses to variable_types_,
column_types_, etc. never read device memory directly.

In `@cpp/src/linear_programming/solve.cu`:
- Around line 1299-1306: In solve_lp, remove the unreachable duplicate
remote_config.has_value() branch (the block that logs CUOPT_REMOTE_HOST/PORT and
calls solve_lp_remote) since remote_config was already checked earlier and
returned; specifically delete the second if (remote_config.has_value()) {
CUOPT_LOG_INFO(...); return solve_lp_remote(*remote_config, view, settings); }
block so the function flow no longer contains dead code referencing
remote_config, solve_lp_remote, view, and settings.

In `@cpp/src/mip/solve.cu`:
- Around line 338-342: The null handle_ptr check must be moved so it executes
before either data path; ensure you validate handle_ptr (the pointer used by
data_model_view_to_optimization_problem) before calling
data_model_view_to_optimization_problem so both the device and CPU branches are
protected; specifically, place the existing handle_ptr == nullptr check above
the device memory branch (i.e., before any calls into
data_model_view_to_optimization_problem/optimization_problem_t constructor),
keep the same error logging and return of mip_solution_t with
cuopt::logic_error, and avoid duplicating checks—one early guard covers both
paths.

In `@cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp`:
- Around line 177-184: Replace the exact equality check on the floating-point
objective offset with an epsilon comparison: instead of comparing
orig->get_objective_offset() == rerd->get_objective_offset(), compute the
absolute difference and compare it to a small tolerance (e.g.,
std::numeric_limits<double>::epsilon() * scale or a fixed small value) so the
result uses fabs(orig->get_objective_offset() - rerd->get_objective_offset()) <=
tol; keep the integer checks for orig->get_n_constraints(),
orig->get_n_variables(), orig->get_nnz() unchanged and use these symbol names to
locate the change.

In `@python/cuopt/cuopt/__init__.py`:
- Around line 29-33: The __dir__() function currently returns __all__ +
_submodules which duplicates submodule names because __all__ already unpacks
_submodules; change __dir__ to return a deduplicated list (preferably simply
return __all__) or otherwise return an order-preserving deduplicated sequence
(e.g., using dict.fromkeys) so names from __all__ and _submodules aren’t
repeated; update the __dir__ definition accordingly (referencing __dir__,
__all__, and _submodules).

In `@python/cuopt/cuopt/tests/linear_programming/test_memory_model.py`:
- Around line 167-168: The test sets variable types using an integer array which
prevents is_mip() from recognizing MIP variables; update the test to provide
string-encoded type codes (e.g., use a one-byte string array) when calling
data_model_obj.set_variable_types so the solver's is_mip() (solver.py) will
detect the "I" code; replace the np.array([1]) usage with a string bytes array
like np.array(["I"], dtype="S1") (or equivalent single-byte string encoding) so
the test exercises the MIP path.
🧹 Nitpick comments (14)
cpp/tests/linear_programming/unit_tests/lp_solution_memory_test.cu (2)

22-39: Clarify test naming and incomplete guard implementation.

The test name host_only_accessors_need_coderabbit_patch and the commented-out EXPECT_THROW assertions suggest that device-accessor guards for host-only solutions are not yet implemented. The test comments reference a "coderabbit_changes.patch" which is an unusual convention.

Consider:

  1. Renaming tests to reflect current behavior (e.g., host_solution_basic_accessors)
  2. Creating a tracking issue for implementing the accessor guards
  3. Using // TODO(issue#123): Enable when guards are implemented instead of referencing a patch file

70-104: Consider adding stream synchronization before accessor validation.

The test creates device vectors without explicit initialization values. While this is acceptable for testing accessor availability, you may want to add handle.sync_stream() before calling EXPECT_NO_THROW on device accessors to ensure any pending operations are complete.

                                                         &handle,
                                                         true);

+  handle.sync_stream();
   EXPECT_TRUE(solution.is_device_memory());
python/cuopt/cuopt/tests/linear_programming/test_memory_model.py (1)

187-250: Add infeasible/unbounded + free‑variable cases to memory‑model tests.
To align the new memory‑model suite with test requirements, consider adding a minimal infeasible LP, an unbounded LP, and a free‑variable case (host‑memory path).

As per coding guidelines: “Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)” and “Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling.”

cpp/tests/mip/mip_solution_memory_test.cu (1)

25-47: Remove patch-specific naming and enable guard behavior validation in tests.

The guards are implemented: EXE_CUOPT_EXPECTS checks on get_solution() and get_solution_host() verify pointer validity. The test names ("need_coderabbit_patch") and commented-out EXPECT_THROW assertions are misleading and should be removed or the assertions enabled. Either rename the tests to reflect their current behavior or uncomment the EXPECT_THROW for get_solution() on host-only solutions to validate that guards properly throw logic_error.

Suggested fix
-TEST(mip_solution_memory, host_only_accessors_need_coderabbit_patch)
+TEST(mip_solution_memory, host_only_accessors)
 {
-  // This test validates that EXE_CUOPT_EXPECTS guards are in place
-  // Guards are added in coderabbit_changes.patch
+  // Verify host-only solution behavior
   std::vector<double> solution{0.0};
   std::vector<std::string> var_names{"x0"};
   solver_stats_t<int, double> stats{};

   mip_solution_t<int, double> mip_solution(std::move(solution),
                                            std::move(var_names),
                                            0.0,
                                            0.0,
                                            mip_termination_status_t::Optimal,
                                            0.0,
                                            0.0,
                                            0.0,
                                            stats);

   EXPECT_FALSE(mip_solution.is_device_memory());
-  // After applying CodeRabbit patch, this should throw
-  // EXPECT_THROW(mip_solution.get_solution(), cuopt::logic_error);
+  EXPECT_THROW(mip_solution.get_solution(), cuopt::logic_error);
   EXPECT_NO_THROW(mip_solution.get_solution_host());
 }
cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu (1)

287-309: Test uses host pointers with is_device_memory=true flag — intentional but potentially confusing.

The comment on line 292 notes this is "simulated with host pointers for test", but passing host pointers while marking is_device_memory(true) could cause undefined behavior if the conversion function actually attempts GPU memory operations. This test only validates that EXPECT_NO_THROW succeeds, but doesn't verify the resulting optimization problem's correctness.

Consider adding a comment clarifying that this test relies on the conversion function not dereferencing the pointers during view-to-optimization_problem conversion, or using actual device memory allocations for a more realistic test.

cpp/libmps_parser/CMakeLists.txt (2)

98-100: Static library missing compile options.

The shared library mps_parser has target_compile_options applied (line 98-100), but mps_parser_static does not receive the same treatment. This could lead to inconsistent warning/error behavior between the two builds.

♻️ Proposed fix to add compile options to static library
 target_compile_options(mps_parser
   PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
 )
+
+target_compile_options(mps_parser_static
+  PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
+)

Also applies to: 128-135


137-143: Static library missing BZip2/ZLIB include directories.

If MPS_PARSER_WITH_BZIP2 or MPS_PARSER_WITH_ZLIB are enabled, only the shared library receives the corresponding include directories. The static library should receive them as well.

♻️ Proposed fix to add compression library includes to static target
 if(MPS_PARSER_WITH_BZIP2)
     target_include_directories(mps_parser PRIVATE BZip2::BZip2)
+    target_include_directories(mps_parser_static PRIVATE BZip2::BZip2)
 endif(MPS_PARSER_WITH_BZIP2)

 if(MPS_PARSER_WITH_ZLIB)
     target_include_directories(mps_parser PRIVATE ZLIB::ZLIB)
+    target_include_directories(mps_parser_static PRIVATE ZLIB::ZLIB)
 endif(MPS_PARSER_WITH_ZLIB)
cpp/src/mip/solve.cu (1)

331-345: Consider consolidating duplicate conversion calls.

Lines 333 and 343 both call data_model_view_to_optimization_problem(handle_ptr, view). The only difference is the null-check guard for CPU data. Consider restructuring to avoid duplication:

♻️ Proposed refactor to consolidate conversion logic
-  if (view.is_device_memory()) {
-    // Local solve: data already on GPU - convert view to optimization_problem_t and solve
-    auto op_problem = data_model_view_to_optimization_problem(handle_ptr, view);
-    return solve_mip(op_problem, settings);
-  }
-
   // Local solve with CPU data: copy to GPU and solve
-  if (handle_ptr == nullptr) {
+  if (!view.is_device_memory() && handle_ptr == nullptr) {
     CUOPT_LOG_ERROR("[solve_mip] Local solve requested but handle_ptr is null.");
     return mip_solution_t<i_t, f_t>(
       cuopt::logic_error("No CUDA handle for CPU->GPU copy", cuopt::error_type_t::RuntimeError));
   }
+
   auto op_problem = data_model_view_to_optimization_problem(handle_ptr, view);
   return solve_mip(op_problem, settings);
 }
cpp/libmps_parser/include/mps_parser/data_model_view.hpp (1)

450-454: Document that get_problem_category() reflects the variable types at first call.

The lazy-cached problem_category_ uses std::call_once for thread-safe initialization, reading variable_types_ to compute the category. While the documented API indicates set_variable_types() is called before solving (during initialization), explicitly documenting that get_problem_category() reflects the state at first call would clarify the intended usage pattern and prevent potential misuse where variable types are modified after category computation.

cpp/include/cuopt/linear_programming/mip/solver_solution.hpp (1)

111-116: Document precondition for host accessors.

The get_solution_host() methods document "Only valid when is_device_memory() returns false", but there's no runtime enforcement. Consider adding a debug assertion to catch misuse early:

cuopt_assert(!is_device_memory_, "get_solution_host() called on device-memory solution");

This would help catch programming errors during development.

cpp/src/linear_programming/utilities/cython_solve.cu (1)

208-214: Consider a more explicit warm-start presence check.

The warm-start detection relies on checking if last_restart_duality_gap_dual_solution_.data() != nullptr. While functional, a dedicated has_warm_start_data() method would be clearer and less brittle.

// Current: relies on internal pointer state
if (!is_mip && solver_settings->get_pdlp_warm_start_data_view()
                   .last_restart_duality_gap_dual_solution_.data() != nullptr) {

This is minor since the current approach works, but consider adding a helper method for clarity in future iterations.

cpp/src/linear_programming/optimization_problem_conversions.cu (3)

61-76: Clarify cudaPointerGetAttributes error handling.

The error handling for cudaPointerGetAttributes is functionally correct but could be clearer. The pattern handles:

  1. Device memory: copy D2H
  2. Host memory: direct copy
  3. Unregistered memory (returns cudaErrorInvalidValue): treated as host memory

Consider adding a brief comment explaining why cudaErrorInvalidValue is acceptable:

     } else {
       // Source is on host (or unregistered) - direct copy
       if (err != cudaSuccess) { cudaGetLastError(); }  // Clear cudaPointerGetAttributes error
+      // cudaErrorInvalidValue is expected for unregistered host memory - treat as host
       if (err != cudaSuccess && err != cudaErrorInvalidValue) { RAFT_CUDA_TRY(err); }
       std::memcpy(host_var_types.data(), var_types.data(), var_types.size() * sizeof(char));
     }

108-154: Consider extracting memory-aware copy helper.

The cudaPointerGetAttributes + conditional copy pattern is duplicated between variable types (lines 61-76) and quadratic objective (lines 114-146). Consider extracting a helper template:

template <typename T>
void copy_to_host(std::vector<T>& dest, const T* src, size_t count);

This would reduce duplication and centralize the error handling logic. However, this is a nice-to-have refactor that can be deferred.


241-335: Good non-owning view creation with correct memory semantics.

The function correctly creates a non-owning view pointing to mps_data_model's data and sets is_device_memory(false).

Important lifetime consideration: The returned view holds raw pointers to the mps_data_model's internal storage. The view is only valid as long as the mps_data_model remains alive and unmodified. This is documented implicitly by "non-owning view" but consider adding explicit documentation:

/**
 * `@brief` Create a non-owning view from an MPS data model.
 * `@warning` The returned view holds pointers to mps_data_model's data.
 *          The view is invalid if mps_data_model is destroyed or modified.
 */

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (1)

41-52: ⚠️ Potential issue | 🟡 Minor

Add null checks before Py_DECREF to prevent crashes on Python errors.

PyObject_CallMethod can return NULL if the Python call fails (e.g., method not found, exception raised). Calling Py_DECREF(NULL) causes undefined behavior. The same issue exists in set_solution.

🛡️ Proposed fix
   void get_solution(void* data, void* objective_value) override
   {
     PyObject* numba_matrix =
       data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
     PyObject* numpy_array =
       data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    if (numba_matrix == nullptr || numpy_array == nullptr) {
+      Py_XDECREF(numba_matrix);
+      Py_XDECREF(numpy_array);
+      return;
+    }
     PyObject* res =
       PyObject_CallMethod(this->pyCallbackClass, "get_solution", "(OO)", numba_matrix, numpy_array);
     Py_DECREF(numba_matrix);
     Py_DECREF(numpy_array);
-    Py_DECREF(res);
+    Py_XDECREF(res);
   }
cpp/libmps_parser/CMakeLists.txt (1)

70-136: ⚠️ Potential issue | 🟠 Major

mps_parser_static misses C++ standard and compile options.

The shared target sets CXX_STANDARD and MPS_PARSER_CXX_FLAGS, but the static target does not. This can break static builds if the sources rely on C++20 features or warning flags. Consider applying the same properties (and optional include/link settings) to mps_parser_static.

🔧 Proposed fix (keep targets consistent)
 add_library(mps_parser SHARED ${MPS_PARSER_SOURCES})
 
 # Static library for linking into libcuopt
 add_library(mps_parser_static STATIC ${MPS_PARSER_SOURCES})
 
 set_target_properties(mps_parser
@@
   CXX_SCAN_FOR_MODULES OFF
 )
+
+set_target_properties(mps_parser_static
+  PROPERTIES
+  CXX_STANDARD 20
+  CXX_STANDARD_REQUIRED ON
+  INTERFACE_POSITION_INDEPENDENT_CODE ON
+  CXX_SCAN_FOR_MODULES OFF
+)
@@
 target_compile_options(mps_parser
   PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
 )
+
+target_compile_options(mps_parser_static
+  PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
+)
cpp/src/linear_programming/cuopt_c.cpp (1)

878-903: ⚠️ Potential issue | 🟠 Major

Handle host‑memory solutions in C API getters.

Remote solves can return host-only solutions, but the getters always read device buffers. This will throw or return invalid data when is_device_memory() is false. Add a host branch (for MIP and LP) and use the new host accessors.

🐛 Proposed fix (example for cuOptGetPrimalSolution)
   if (solution_and_stream_view->is_mip) {
     mip_solution_t<cuopt_int_t, cuopt_float_t>* mip_solution =
       static_cast<mip_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->mip_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values = mip_solution->get_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (mip_solution->is_device_memory()) {
+      const rmm::device_uvector<cuopt_float_t>& solution_values = mip_solution->get_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& host_solution = mip_solution->get_solution_host();
+      std::copy(host_solution.begin(), host_solution.end(), solution_values_ptr);
+    }
   } else {
     optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>* optimization_problem_solution =
       static_cast<optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->lp_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values =
-      optimization_problem_solution->get_primal_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const rmm::device_uvector<cuopt_float_t>& solution_values =
+        optimization_problem_solution->get_primal_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& host_solution = optimization_problem_solution->get_primal_solution_host();
+      std::copy(host_solution.begin(), host_solution.end(), solution_values_ptr);
+    }
   }

Please mirror the same device/host branching in cuOptGetDualSolution and cuOptGetReducedCosts.

Also applies to: 981-999, 1021-1039

cpp/src/linear_programming/solver_solution.cu (1)

191-265: ⚠️ Potential issue | 🟠 Major

Clear stale buffers when switching memory domains in copy_from.
When copying from a different memory domain, the old buffers remain allocated and can be returned by getters, leading to stale data and wasted memory. Reset the inactive domain in each branch to preserve invariants.

🧹 Proposed fix to clear inactive-domain buffers
   if (other.is_device_memory_) {
     // Copy GPU data
@@
     if (other.reduced_cost_) {
       reduced_cost_->resize(other.reduced_cost_->size(), handle_ptr->get_stream());
       raft::copy(reduced_cost_->data(),
                  other.reduced_cost_->data(),
                  reduced_cost_->size(),
                  handle_ptr->get_stream());
     } else {
       reduced_cost_->resize(0, handle_ptr->get_stream());
     }
 
+    // Clear stale host buffers when switching to device memory
+    primal_solution_host_.reset();
+    dual_solution_host_.reset();
+    reduced_cost_host_.reset();
+
     handle_ptr->sync_stream();
   } else {
     // Copy CPU data
@@
     if (other.reduced_cost_host_) {
       *reduced_cost_host_ = *other.reduced_cost_host_;
     } else {
       reduced_cost_host_->clear();
     }
+
+    // Clear stale device buffers when switching to host memory
+    primal_solution_.reset();
+    dual_solution_.reset();
+    reduced_cost_.reset();
   }
🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/mip/solver_solution.hpp`:
- Around line 157-169: The to_host(rmm::cuda_stream_view) implementation must
perform an asynchronous device-to-host copy on the provided stream and check
CUDA errors without synchronizing; update mip_solution_t<...>::to_host to call
rmm::device_uvector::copy_async (or cudaMemcpyAsync) into the host buffer when
is_device_memory() is true, pass stream_view.stream() for the stream, wrap the
copy call with CUDA_CHECK (or the project’s CUDA_CHECK macro) to validate the
operation, and remove any cudaDeviceSynchronize or stream synchronization calls
so the path remains async; ensure after scheduling the copy you update internal
state so get_solution_host() refers to the host buffer once the copy is
enqueued.

In `@cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp`:
- Around line 71-110: Add a precondition check in the remote stub(s) to enforce
that input data is host-resident: in solve_lp_remote(...) (and the other remote
overload) assert or throw if the provided
cuopt::mps_parser::data_model_view_t<i_t,f_t>& view is not on the host (use the
view's host-residency accessor, e.g., view.is_host_resident() or
view.is_on_host(), or equivalent), so the stub fails fast instead of silently
proceeding; place this check at the top of the functions (before computing
n_rows/n_cols) and keep behavior consistent (assert for debug builds or throw
std::invalid_argument/std::runtime_error for public API).

In `@cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu`:
- Around line 36-65: The tests mutate process-global CUOPT_REMOTE_HOST/PORT in
RemoteSolveConfigTest::SetUp/ TearDown which can race when tests run in
parallel; fix by introducing a global test-scope mutex (e.g., a static
std::mutex or a test helper ScopedEnvLock) and acquire it for the entire
duration of each test instance in SetUp and release in TearDown (or use RAII
ScopedEnvLock constructed in SetUp and destroyed in TearDown); ensure the lock
covers both the unsetenv/setenv calls and any GPU/global-state touches so
environment and global state are serialized across tests.
🧹 Nitpick comments (5)
cpp/tests/mip/mip_solution_memory_test.cu (1)

101-117: Consider adding edge case tests for empty solutions.

The termination-status-only and error constructors are well tested. For more comprehensive coverage per coding guidelines, consider adding tests for:

  • Empty variable names vector
  • Zero-variable solutions
  • Degenerate termination statuses

These could help validate edge case handling in the solution memory paths.

cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu (2)

218-331: Exercise a real device‑pointer path (or rename the test).

gpu_view_conversion marks device memory but uses host pointers, so it doesn’t validate the real GPU copy path. Consider allocating device buffers (e.g., rmm::device_uvector) when GPU tests are available, or rename to clarify it’s a “GPU‑flag only” test.

Based on learnings: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).


336-387: Strengthen stub assertions with value checks (zeros).

These tests assert sizes and status only. Add value checks for the returned vectors (all zeros) to confirm numerical content, not just shape.

Based on learnings: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

✅ Example enhancement
   EXPECT_FALSE(solution.is_device_memory());
   EXPECT_EQ(solution.get_termination_status(), pdlp_termination_status_t::Optimal);
   EXPECT_EQ(solution.get_primal_solution_host().size(), 1);
+  EXPECT_EQ(solution.get_primal_solution_host(), std::vector<double>({0.0}));
   EXPECT_NO_THROW(solution.get_primal_solution_host());
   EXPECT_NO_THROW(solution.get_dual_solution_host());
cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp (1)

40-56: Consider exception‑free port parsing with std::from_chars.

This avoids exceptions in headers and is faster/locale‑independent.

♻️ Suggested change
+#include <charconv>
 ...
   if (host != nullptr && port != nullptr && host[0] != '\0' && port[0] != '\0') {
-    try {
-      int port_num = std::stoi(port);
-      if (port_num < 1 || port_num > 65535) { return std::nullopt; }
-      return remote_solve_config_t{std::string(host), port_num};
-    } catch (...) {
-      // Invalid port number, fall back to local solve
-      return std::nullopt;
-    }
+    int port_num = 0;
+    auto [ptr, ec] = std::from_chars(port, port + std::strlen(port), port_num);
+    if (ec != std::errc{} || ptr == port || *ptr != '\0') { return std::nullopt; }
+    if (port_num < 1 || port_num > 65535) { return std::nullopt; }
+    return remote_solve_config_t{std::string(host), port_num};
   }
cpp/src/linear_programming/solver_solution.cu (1)

733-782: Consider consolidating the repeated host-copy logic.
The device→host copy blocks are repeated in write_to_file, write_to_sol_file, and to_host. Extracting a shared helper would reduce duplication and future maintenance drift.

Based on learnings: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.

@tmckayus tmckayus force-pushed the memory-model-clean branch 2 times, most recently from 97dc9a3 to 44f16ea Compare February 1, 2026 19:59
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 66-107: The const getters get_primal_solution_host,
get_dual_solution_host, and get_reduced_cost_host currently mutate shared
optional caches without synchronization; protect initialization of
primal_solution_host_cache_, dual_solution_host_cache_, and
reduced_cost_host_cache_ by adding thread-safety (e.g., introduce mutable
std::mutex members or per-cache std::once_flag/mutexes and lock/guard them
inside each getter before testing/initializing the cache) so only one thread
performs the device->host copy and other threads safely read the cached
std::vector; ensure the mutex/once_flag members are mutable to allow use in
const methods and apply the same pattern to all three getters.

In
`@cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp`:
- Around line 57-62: The base class optimization_problem_solution_interface_t
declares virtual double get_solve_time() which conflicts with
lp_solution_interface_t::get_solve_time() returning f_t; remove the base-class
get_solve_time() declaration from optimization_problem_solution_interface_t (or
alternatively make it return f_t if f_t is a common type available) so the
derived lp_solution_interface_t can provide the single consistent accessor
get_solve_time() with return type f_t and avoid hiding/override mismatch.

In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp`:
- Around line 84-94: char_variable_types may have a different length than
n_vars, so calling problem->set_variable_types(enum_variable_types.data(),
n_vars) can read past enum_variable_types; ensure you use the actual length of
enum_variable_types (e.g., enum_variable_types.size() or
char_variable_types.size()) when calling set_variable_types. Update the
conversion block that builds enum_variable_types (from
data_model.get_variable_types() and var_t) to compute a size_t count =
enum_variable_types.size() and pass that count to problem->set_variable_types
instead of n_vars, and optionally validate/early-return if sizes mismatch.

In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 763-768: The CSR matrix comparison in is_equivalent() only checks
A_.size() against other_A_values.size() which can miss differing nonzero values;
update is_equivalent() to perform a full CSR value comparison (respecting
row/column permutations used elsewhere) or, if keeping the simplified check for
now, make the comment explicit and add a TODO and an assertion guard for
intended CI-only use: modify is_equivalent() to either implement value-wise
comparison of A_ against other_A_values using the same permutations applied to
row_ptr/col_idx, or add a clear TODO and runtime assertion/guard that this
function is only used for MPS roundtrip tests; reference symbols:
is_equivalent(), A_, other_A_values, and the permutation arrays used earlier in
the function.

In `@cpp/src/linear_programming/cuopt_c_internal.hpp`:
- Around line 75-94: The struct solution_and_stream_view_t currently owns raw
pointers (mip_solution_interface_ptr, lp_solution_interface_ptr) and defines a
destructor but not the other special members; to avoid double-free, delete the
copy constructor and copy assignment, and implement move constructor and move
assignment that transfer ownership of mip_solution_interface_ptr,
lp_solution_interface_ptr, and backend_type while nulling the source pointers
and preserving is_mip; keep the destructor to delete any remaining owned
pointers. Update symbols: solution_and_stream_view_t,
mip_solution_interface_ptr, lp_solution_interface_ptr, is_mip, backend_type, and
the existing ~solution_and_stream_view_t to ensure safe move semantics and
disabled copying.
- Around line 23-73: The struct problem_and_stream_view_t currently owns raw
pointers (gpu_problem, cpu_problem, stream_view_ptr, handle_ptr) but lacks
copy/move special members, risking double-free on copy; delete the copy
constructor and copy assignment operator and implement a move constructor and
move assignment that transfer ownership by moving gpu_problem, cpu_problem,
stream_view_ptr, and handle_ptr from the source to *this and set the source
pointers to nullptr, ensuring the destructor (already null-safe) will only free
moved-in resources; keep get_handle_ptr, get_problem, and
to_optimization_problem behavior intact after the move.

In `@cpp/src/linear_programming/gpu_optimization_problem.cu`:
- Around line 756-846: The is_equivalent implementation must (1) fallback to
direct-order comparison when variable/row names are missing instead of returning
false, and (2) actually compare CSR matrix data instead of only sizes. Update
gpu_optimization_problem_t<i_t,f_t>::is_equivalent to: if get_variable_names()
or get_row_names() are empty, skip building var_perm/row_perm and compare
objective coefficients, bounds, types and full CSR arrays in direct index order;
otherwise build var_perm and row_perm as now and use them to permute columns and
rows when comparing the CSR components returned by
get_constraint_matrix_values_host(), get_constraint_matrix_col_indices_host(),
and get_constraint_matrix_row_offsets_host() (compare values with fabs tolerance
and exact match for indices/offsets/types), and ensure you also check
sizes/offsets consistently to avoid out-of-bounds when permuting.
- Around line 67-83: In
gpu_optimization_problem_t<i_t,f_t>::set_csr_constraint_matrix validate that
size_offsets > 0 before computing n_constraints_ and before resizing/copying A_,
A_indices_, and A_offsets_; if size_offsets == 0, return/throw a clear error
(e.g., throw std::invalid_argument or assert) to avoid underflow of
n_constraints_ = size_offsets - 1 and prevent invalid CSR state, otherwise
proceed to set n_constraints_, resize A_, A_indices_, A_offsets_, and perform
raft::copy as currently implemented.
- Around line 114-130: The method set_quadratic_objective_matrix currently
copies Q_values/Q_indices/Q_offsets into host std::vector members (Q_values_,
Q_indices_, Q_offsets_) using std::copy, which conflicts with the class doc
claiming setters accept device pointers; either update the docs or add
device-aware copying: detect device pointers (or accept a flag), and if inputs
are device memory use raft::copy into the host vectors or change members to
rmm::device_uvector and use raft::copy to copy from host to device as
appropriate; update set_quadratic_objective_matrix implementation to mirror the
device-aware behavior used by set_variable_lower_bounds (use
raft::copy/rmm::device_uvector) or change the interface doc to state it accepts
host pointers only so behavior matches Q_values_/Q_indices_/Q_offsets storage.

In `@cpp/src/linear_programming/optimization_problem.cu`:
- Around line 620-684: Several move-setters leave metadata stale: ensure setters
that change sizes also update n_vars_ / n_constraints_ and that
set_variable_types_move also recomputes problem_category_. Specifically, keep
the existing n_constraints_ update in set_csr_constraint_matrix_move, add
n_constraints_ updates to any other constraint-buffer setters if they change
A_offsets_, set n_vars_ = variable_types_.size() (or at least ensure n_vars_
matches c_.size()/variable_types_.size()) inside set_variable_types_move, and
after assigning variable_types_ recompute problem_category_ using the same
logic/path used elsewhere when variable types are set (i.e., mirror the
classification code used when building the problem from non-move APIs so
LP/MIP/IP category and bounds checks remain consistent). Ensure each move-based
setter that affects counts mirrors the non-move setter behaviors (use the same
symbols: n_vars_, n_constraints_, variable_types_, problem_category_).

In `@cpp/src/linear_programming/solve_remote.cu`:
- Around line 151-169: The GPU->CPU conversion is missing copying of row types
and constraint bounds; call the GPU getters (e.g.,
gpu_problem.get_row_types_host() and gpu_problem.get_constraint_bounds_host())
and, if non-empty, pass their data and sizes into the corresponding CPU setters
(e.g., cpu_problem.set_row_types(..., ...) and
cpu_problem.set_constraint_bounds(..., ...)); place these after the existing
constraint bounds copy so the CPU problem receives the row_types (constraint
sense 'E'/'L'/'G') and the RHS b-vector for non-ranged constraints.

In `@cpp/src/linear_programming/utilities/cython_solve.cu`:
- Around line 222-224: The code currently calls lp_solution_ptr.release() before
invoking std::move(*gpu_lp_sol).to_linear_programming_ret_t(), which leaks
memory if to_linear_programming_ret_t() throws; instead, keep ownership until
the conversion succeeds: do not call lp_solution_ptr.release() prior to
conversion, call std::move(*gpu_lp_sol).to_linear_programming_ret_t() while
lp_solution_ptr still owns the memory (access via lp_solution_ptr.get() or by
dereferencing), assign response.lp_ret from that result, and only call
lp_solution_ptr.release() (or std::unique_ptr reset/move) after the conversion
completes successfully; apply the same change for the other occurrences around
to_linear_programming_ret_t() at the mentioned locations.

In `@python/cuopt/cuopt/linear_programming/solver/solver.pxd`:
- Around line 128-130: Remove the redundant C++ redeclaration of device_buffer:
delete the entire cdef extern from "<rmm/device_buffer.hpp>" namespace "rmm":
block that defines cppclass device_buffer and rely on the existing cimport of
device_buffer (from rmm.librmm.device_buffer on line 16) so there is a single
consistent declaration across .pxd files and no symbol conflicts.
🧹 Nitpick comments (8)
cpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hpp (1)

73-74: LGTM - Consider aligning style with CPU version.

The is_populated() sentinel check is correct. For consistency with cpu_pdlp_warm_start_data.hpp (line 98) which uses !...empty(), consider:

-  bool is_populated() const { return last_restart_duality_gap_dual_solution_.size() > 0; }
+  bool is_populated() const { return !last_restart_duality_gap_dual_solution_.empty(); }

Both are functionally equivalent; !empty() is the more idiomatic C++ style for container non-emptiness checks.

cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu (1)

40-73: Consider batching stream synchronization for better performance.

Each device_to_host_vector call synchronizes the stream independently. For the 9 vector fields being copied, this results in 9 separate synchronizations. Batching the copies and performing a single sync at the end would reduce synchronization overhead.

♻️ Suggested approach

Create a variant that doesn't synchronize internally, then synchronize once after all copies:

 // Helper to copy device_uvector to std::vector (D2H)
 template <typename T>
 std::vector<T> device_to_host_vector(const rmm::device_uvector<T>& device_vec,
-                                     rmm::cuda_stream_view stream)
+                                     rmm::cuda_stream_view stream,
+                                     bool sync = true)
 {
   if (device_vec.size() == 0) return std::vector<T>();
 
   std::vector<T> host_vec(device_vec.size());
   raft::copy(host_vec.data(), device_vec.data(), device_vec.size(), stream);
-  stream.synchronize();
+  if (sync) stream.synchronize();
   return host_vec;
 }

Then in convert_to_cpu_warmstart, pass false for sync on all calls except the last, or call stream.synchronize() once at the end.

cpp/src/linear_programming/cuopt_c.cpp (1)

413-427: Repeated backend branching pattern could benefit from abstraction.

The pattern of checking backend_type == problem_backend_t::CPU and then either using host getters or copying from device appears in many getter functions (cuOptGetObjectiveCoefficients, cuOptGetConstraintMatrix, cuOptGetConstraintSense, etc.).

Consider extracting this into a helper template or macro to reduce duplication. However, given this is the C API layer and clarity is important, the current explicit approach is acceptable.

cpp/src/linear_programming/cuopt_c_internal.hpp (1)

40-46: Destructor deletion order is correct but fragile.

The deletion order (problem → handle → stream) is appropriate since problems may reference handles. However, this implicit ordering dependency is not documented and could break if refactored.

Consider adding a brief comment documenting the deletion order rationale, or using std::unique_ptr with custom deleters to make ownership explicit.

cpp/include/cuopt/linear_programming/cpu_pdlp_warm_start_data.hpp (1)

50-95: Consider helper to reduce repetitive null-check pattern.

The constructor has 9 nearly identical blocks checking data() != nullptr before copying. While functional, this could be simplified with a helper lambda or template function.

♻️ Optional refactor using helper lambda
   cpu_pdlp_warm_start_data_t(const pdlp_warm_start_data_view_t<i_t, f_t>& view)
     : initial_primal_weight_(view.initial_primal_weight_),
       // ... scalar initializers ...
   {
+    auto copy_span = [](auto& dest, const auto& src) {
+      if (src.data() != nullptr) {
+        dest.assign(src.data(), src.data() + src.size());
+      }
+    };
+    copy_span(current_primal_solution_, view.current_primal_solution_);
+    copy_span(current_dual_solution_, view.current_dual_solution_);
+    // ... etc for all 9 vectors ...
-    // Copy vector data from spans
-    if (view.current_primal_solution_.data() != nullptr) {
-      current_primal_solution_.assign(
-        view.current_primal_solution_.data(),
-        view.current_primal_solution_.data() + view.current_primal_solution_.size());
-    }
-    // ... repeated 8 more times ...
   }

Based on learnings: "Refactor code duplication in solver components (3+ occurrences) into shared utilities."

cpp/src/linear_programming/cpu_optimization_problem.cpp (1)

26-31: Use logging instead of fprintf for consistency.

The constructor uses fprintf(stderr, ...) while the rest of the codebase uses CUOPT_LOG_* macros. This could cause inconsistent output formatting.

♻️ Proposed fix
 template <typename i_t, typename f_t>
 cpu_optimization_problem_t<i_t, f_t>::cpu_optimization_problem_t(raft::handle_t const* handle_ptr)
   : handle_ptr_(handle_ptr)
 {
-  fprintf(stderr, "cpu_optimization_problem_t constructor: Using CPU backend\n");
+  CUOPT_LOG_INFO("cpu_optimization_problem_t constructor: Using CPU backend");
 }
cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp (1)

159-283: Extract a shared device‑to‑host copy helper for warm‑start accessors.

The warm‑start getters repeat the same size‑check → allocate → copy → sync pattern. A small helper would reduce duplication and keep copy/sync behavior consistent.

Based on learnings: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.

cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp (1)

430-435: Add override on to_optimization_problem() in concrete classes.

This helps the compiler catch signature drift in the interface implementations.

♻️ Proposed diff
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;
...
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;

Also applies to: 586-592

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 638-644: The code only writes variable_upper_bounds_ when
variable_lower_bounds_ is non-empty; change the guards so each bound array is
checked independently: call data_model_view.set_variable_lower_bounds(...)
inside an if (!variable_lower_bounds_.empty()) block and call
data_model_view.set_variable_upper_bounds(...) inside a separate if
(!variable_upper_bounds_.empty()) block, referencing the existing symbols
variable_lower_bounds_, variable_upper_bounds_, and
data_model_view.set_variable_lower_bounds /
data_model_view.set_variable_upper_bounds to locate and update the logic.
- Around line 90-107: The parameter validate_positive_semi_definite on
cpu_optimization_problem_t<i_t,f_t>::set_quadratic_objective_matrix is currently
ignored; either implement PSD validation or explicitly mark it unused and
document the decision — to be minimal, add a single-line use
(void)validate_positive_semi_definite; plus a short comment inside
set_quadratic_objective_matrix noting the CPU backend does not perform PSD
validation (or alternatively implement a PSD check by converting
Q_values/Q_indices/Q_offsets to a dense symmetric matrix and verifying
non-negative eigenvalues via a helper like
validate_matrix_positive_semi_definite and call it from
set_quadratic_objective_matrix).
- Line 30: Remove the debug stderr print from the cpu_optimization_problem_t
constructor (the fprintf(stderr, ...) call) and either delete it or replace it
with the project's logging macro/function from <utilities/logger.hpp> (use the
appropriate debug/info logger such as LOG_DEBUG or equivalent) so construction
no longer writes to stderr; locate the call inside the
cpu_optimization_problem_t constructor in cpu_optimization_problem.cpp and
update it accordingly.
- Around line 43-60: The CPU setter
cpu_optimization_problem_t::set_csr_constraint_matrix currently dereferences
A_values/A_indices/A_offsets without null checks; add cuopt_expects-style
validations so that when size_values>0 you assert A_values != nullptr, when
size_indices>0 assert A_indices != nullptr, and when size_offsets>0 assert
A_offsets != nullptr (mirror the GPU optimization_problem_t checks), then
proceed to resize/copy; apply the same pattern to set_constraint_bounds,
set_objective_coefficients, and set_quadratic_objective_matrix for their
pointer/size pairs; finally address the unused validate_positive_semi_definite
parameter on set_quadratic_objective_matrix by either removing it or
implementing the intended validation branch (choose one and make the
signature/implementation consistent).
🧹 Nitpick comments (1)
cpp/src/linear_programming/cpu_optimization_problem.cpp (1)

749-761: Consider using relative tolerance for large-magnitude values.

The hardcoded 1e-9 absolute tolerance works for typical problem values but may cause false negatives for very large coefficients (e.g., comparing 1e12 vs 1e12 + 1e-9 would incorrectly pass, while 1e12 vs 1e12 * (1 + 1e-15) might incorrectly fail). For MPS roundtrip testing this is likely acceptable, but a combined relative/absolute tolerance would be more robust.

@tmckayus tmckayus requested a review from Kh4ster February 1, 2026 21:02
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 333-337: The to_gpu_solution(rmm::cuda_stream_view) implementation
currently moves solution_ but is callable on lvalues, enabling a use-after-move;
change its signature to be rvalue-qualified (add &&) so it can only be invoked
on temporaries, and update the base class method declaration in
lp_solution_interface_t to match (rvalue-qualified) so the override remains
valid; ensure the return type stays optimization_problem_solution_t<i_t,f_t> and
that any callers are adjusted to call on rvalues or use std::move if they
deliberately transfer ownership.

In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp`:
- Around line 189-203: When target is CPU but has_gpu_warmstart_view is true and
has_gpu_warmstart_data is false, the code must first populate the GPU warmstart
buffers from the Python view (the same routine used in the GPU backend branch
that fills solver_settings->get_pdlp_settings().get_pdlp_warm_start_data())
before calling convert_to_cpu_warmstart; modify the CPU-target branch (around
the convert_to_cpu_warmstart call) to detect has_gpu_warmstart_view &&
!has_gpu_warmstart_data and invoke the same view-to-warmstart population logic
that the GPU branch uses, then call convert_to_cpu_warmstart(...) and assign to
get_cpu_pdlp_warm_start_data().
🧹 Nitpick comments (1)
cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp (1)

59-64: const_cast usage indicates underlying API const-correctness issue.

The get_reduced_cost_size() method casts away const to call get_reduced_cost(). This pattern repeats throughout the class (lines 99-100, 146-150, 154-156, 162-163, etc.). While functional, this suggests optimization_problem_solution_t::get_reduced_cost() and get_pdlp_warm_start_data() should be const-qualified in the underlying class.

Consider marking get_reduced_cost() and get_pdlp_warm_start_data() as const in optimization_problem_solution_t to eliminate the need for const_cast throughout this wrapper.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/linear_programming/cuopt_c.cpp (1)

48-71: ⚠️ Potential issue | 🟠 Major

Ensure exception safety in cuOptReadProblem.

populate_from_mps_data_model(...) is outside the try/catch; if it throws, the exception crosses the C ABI and problem_and_stream leaks. Wrap the populate call and clean up on failure.

🛡️ Suggested fix
-  // Populate interface directly from MPS data model (avoids temporary GPU allocation)
-  populate_from_mps_data_model(problem_and_stream->get_problem(), *mps_data_model_ptr);
+  try {
+    // Populate interface directly from MPS data model (avoids temporary GPU allocation)
+    populate_from_mps_data_model(problem_and_stream->get_problem(), *mps_data_model_ptr);
+  } catch (const std::exception& e) {
+    CUOPT_LOG_INFO("Error populating problem data: %s", e.what());
+    delete problem_and_stream;
+    *problem_ptr = nullptr;
+    return CUOPT_INVALID_ARGUMENT;
+  }

As per coding guidelines “Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths.”

🤖 Fix all issues with AI agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 52-56: The code computes n_constraints_ = size_offsets - 1 without
guarding for size_offsets == 0, which underflows and leaves the CSR arrays (A_,
A_indices_, A_offsets_) in an invalid state; update the beginning of the block
around n_constraints_ to validate size_offsets and either return/throw on zero
(or set n_constraints_ to 0 and clear the CSR vectors) before resizing A_,
A_indices_, and A_offsets_, ensuring any downstream logic sees a consistent
empty CSR state; reference n_constraints_, size_offsets, A_, A_indices_, and
A_offsets_ when applying the guard.

In `@cpp/src/linear_programming/gpu_optimization_problem.cu`:
- Around line 711-716: The current block only writes both bounds when
variable_lower_bounds is non-empty, causing valid upper-only bounds to be
dropped; change the logic so you call
data_model_view.set_variable_lower_bounds(...) only if variable_lower_bounds is
non-empty and call data_model_view.set_variable_upper_bounds(...) independently
if variable_upper_bounds is non-empty (i.e., check variable_upper_bounds.empty()
separately and invoke set_variable_upper_bounds when present), leaving the two
calls decoupled.

In `@cpp/src/linear_programming/solve.cu`:
- Around line 1338-1366: The remote-vs-local routing currently only checks
is_remote_execution_enabled(), ignoring the configured
backend/CUOPT_USE_GPU_MEM; update the branch in solve_lp to first respect
get_backend_type()/CUOPT_USE_GPU_MEM (or check backend_type_t::GPU) so that if
the backend is GPU we force local solve (call solve_lp<i_t,f_t> path) and only
route to problem_interface->solve_lp_remote(settings) when remote is enabled AND
the backend is CPU; make the analogous change in solve_mip to ensure backend
selection overrides remote routing (use symbols is_remote_execution_enabled(),
get_backend_type(), CUOPT_USE_GPU_MEM, solve_lp, solve_mip, and
problem_interface->solve_lp_remote to locate code).

In `@cpp/src/linear_programming/utilities/cython_solve.cu`:
- Around line 249-263: call_solve currently mutates the shared warm-start object
returned by solver_settings->get_pdlp_settings().get_pdlp_warm_start_data(),
causing data races when call_batch_solve runs call_solve in parallel; fix by
ensuring each thread uses its own warm-start instance or by serializing the
stream-reset: either (A) clone the warm-start data at the start of call_solve
(create a local copy of warmstart_data and call set_stream on that copy) so
per-thread state is mutated, or (B) wrap the set_stream sequence (the block
touching current_primal_solution_, current_dual_solution_,
initial_primal_average_, initial_dual_average_, current_ATY_,
sum_primal_solutions_, sum_dual_solutions_,
last_restart_duality_gap_primal_solution_,
last_restart_duality_gap_dual_solution_) in a mutex held by
call_batch_solve/call_solve to prevent concurrent mutations; choose the
per-thread copy approach if you want concurrent solves, and ensure any copies
preserve device memory semantics (deep copy or move) when implementing.

Copy link
Collaborator

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmckayus Changes looks good on system side, have few questions. But other than that looks awesome.

@anandhkb anandhkb added this to the 26.02 milestone Feb 2, 2026
Enable testing CPU problem/solution code paths without remote execution.
When CUOPT_USE_CPU_MEM_FOR_LOCAL=1 is set, local solves use CPU memory
structures: problems are built on CPU, converted to GPU for solving,
then solutions are converted back to CPU.

Changes:
- Add memory_backend_t enum and use_cpu_memory_for_local() function
- Add GPU↔CPU solution conversion methods (to_cpu_solution)
- Add solution_interface_test.cu for polymorphic interface testing
- Add test_cpu_only_execution.py for Python API CPU-only tests
- Add CI step to re-run C_API_TEST with CPU memory mode
- Extend C API tests for CPU memory backend coverage
This copy will be removed in a subsequent refactor that
eliminates optimization_problem_t so there will be no
extra copy ulitmately.
@tmckayus tmckayus force-pushed the memory-model-clean branch 2 times, most recently from 0e33171 to 54c9c09 Compare February 18, 2026 13:02
@tmckayus
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (7)
cpp/src/pdlp/gpu_optimization_problem.cu (2)

114-131: Quadratic objective stored on host — std::copy assumes host pointers.

This is consistent with how the C API populates Q data and matches the past review resolution. The validate_positive_semi_definite parameter is accepted but unused — consider documenting this or adding a TODO if validation is planned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/gpu_optimization_problem.cu` around lines 114 - 131, In
gpu_optimization_problem_t<i_t,f_t>::set_quadratic_objective_matrix the function
presently assumes Q_values/Q_indices/Q_offsets are host pointers and copies them
into Q_values_, Q_indices_, Q_offsets_ using std::copy, but the
validate_positive_semi_definite parameter is accepted and unused; update this
function by either adding a clear inline comment stating that the inputs are
host memory and therefore copied with std::copy (or documenting this behavior in
the header), or add a TODO/NOT_IMPLEMENTED comment referencing
validate_positive_semi_definite to indicate planned PSD validation (or implement
validation) so the unused parameter is explicit; reference the method name and
the members Q_values_, Q_indices_, Q_offsets_ when making the change.

957-1026: C API copy-to-host helpers use proper CUDA error checking.

All cudaMemcpy calls wrapped in RAFT_CUDA_TRY. Note that these functions trust the caller-provided size parameter — out-of-bounds reads are possible if the caller passes a size larger than the actual device buffer. This is the standard C API contract (caller queries size first), but consider adding cuopt_expects guards for defense-in-depth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/gpu_optimization_problem.cu` around lines 957 - 1026, Add
defensive precondition checks using cuopt_expects before each cudaMemcpy to
ensure the caller-provided sizes do not exceed the device buffers: in
copy_objective_coefficients_to_host check size <= c_.size(); in
copy_constraint_matrix_to_host check num_values <= A_.size(), num_indices <=
A_indices_.size(), num_offsets <= A_offsets_.size(); in copy_row_types_to_host
check size <= row_types_.size(); in copy_constraint_bounds_to_host check size <=
b_.size(); in copy_constraint_lower_bounds_to_host and
copy_constraint_upper_bounds_to_host check size <=
constraint_lower_bounds_.size() / constraint_upper_bounds_.size() respectively;
in copy_variable_lower_bounds_to_host and copy_variable_upper_bounds_to_host
check size <= variable_lower_bounds_.size() / variable_upper_bounds_.size(); and
in copy_variable_types_to_host check size <= variable_types_.size(); after each
cuopt_expects guard keep the existing RAFT_CUDA_TRY(cudaMemcpy(...)) calls.
cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp (1)

138-352: LP solution interface is comprehensive but has a large surface area.

The ~30 pure virtual methods (including all warm-start data accessors at Lines 315-331) create a wide vtable. This is acceptable given the two-backend (GPU/CPU) design, but may be worth consolidating the warm-start accessors behind a single get_warm_start_data_host() struct return in a future refactor if the interface stabilizes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp`
around lines 138 - 352, The interface lp_solution_interface_t currently exposes
many individual warm-start accessors (get_current_primal_solution_host,
get_current_dual_solution_host, get_initial_primal_average_host,
get_initial_dual_average_host, get_current_ATY_host,
get_sum_primal_solutions_host, get_sum_dual_solutions_host,
get_last_restart_duality_gap_primal_solution_host,
get_last_restart_duality_gap_dual_solution_host, get_initial_primal_weight,
get_initial_step_size, get_total_pdlp_iterations, get_total_pdhg_iterations,
get_last_candidate_kkt_score, get_last_restart_kkt_score,
get_sum_solution_weight, get_iterations_since_last_restart) which inflates the
vtable; consolidate these by introducing a single POD struct (e.g.,
pdlp_warm_start_host_t) containing all warm-start vectors/scalars and replace
the group of virtual methods with one virtual: virtual pdlp_warm_start_host_t
get_warm_start_data_host() const = 0; update implementations of
lp_solution_interface_t and any derived GPU/CPU solution classes to populate and
return this struct, and remove the now-redundant individual getters to reduce
surface area and simplify future refactors.
cpp/src/pdlp/utilities/cython_solve.cu (1)

204-233: compute_max_thread assumes GPU availability.

cudaMemGetInfo at Line 211 will fail if called in CPU-only mode. Currently batch solving is GPU-only, so this isn't exploitable. If batch solving is extended to support CPU/remote backends in a follow-up, this will need a guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/utilities/cython_solve.cu` around lines 204 - 233,
compute_max_thread currently unconditionally calls cudaMemGetInfo (via
RAFT_CUDA_TRY) which will fail in CPU-only environments; wrap the GPU-specific
logic with a CUDA-availability guard (e.g., call cudaGetDeviceCount or use an
existing RAFT/compile-time macro) and skip cudaMemGetInfo when no device is
present. If no GPU is available, compute a safe fallback for res (for example,
std::min(max_total, (int)data_models.size()) or a value based solely on
data_models without querying GPU memory) and return that; otherwise run the
existing memory-based calculation. Ensure you only call cudaMemGetInfo inside
the guarded branch in compute_max_thread and preserve the cuopt_expects check
for the GPU path.
cpp/src/pdlp/cuopt_c.cpp (1)

858-916: Consider std::unique_ptr for solution_and_stream_view_t to make the ownership transfer exception-safe.

While the current code is unlikely to throw between new solution_and_stream_view_t(...) (line 879) and the assignment to *solution_ptr (line 882), using std::unique_ptr for intermediate ownership would make the intent clearer and protect against future refactoring that might introduce a throw in that window.

Sketch
-      solution_and_stream_view_t* solution_and_stream_view =
-        new solution_and_stream_view_t(true, problem_and_stream_view->memory_backend);
-      solution_and_stream_view->mip_solution_interface_ptr = solution_interface.release();
-      *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view);
+      auto solution_and_stream_view =
+        std::make_unique<solution_and_stream_view_t>(true, problem_and_stream_view->memory_backend);
+      solution_and_stream_view->mip_solution_interface_ptr = solution_interface.release();
+      *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view.release());

(Same pattern for the LP branch.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cuopt_c.cpp` around lines 858 - 916, Wrap the heap allocation of
solution_and_stream_view_t in a std::unique_ptr to make ownership transfer
exception-safe: create std::unique_ptr<solution_and_stream_view_t> tmp(new
solution_and_stream_view_t(...)) in both the MIP branch (where
solution_and_stream_view is used and mip_solution_interface_ptr is set) and the
LP branch (where lp_solution_interface_ptr is set), move any setup (assigning
mip_solution_interface_ptr/lp_solution_interface_ptr and calling printTimestamp)
while owned by tmp, then release tmp into *solution_ptr (i.e. *solution_ptr =
static_cast<cuOptSolution>(tmp.release())); this preserves existing behavior but
ensures no leak if an exception is introduced between new and assignment.
cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp (1)

41-50: Document ownership semantics for raw-pointer return values.

call_solve_lp and call_solve_mip return raw pointers to lp_solution_interface_t / mip_solution_interface_t. The caller needs to know whether they own the returned pointer (and must delete it) or if it's borrowed. Since these are Cython-facing wrappers, consider adding a brief comment about who is responsible for lifetime management.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp` around lines
41 - 50, Document the ownership semantics for the raw pointers returned by
call_solve_lp and call_solve_mip: add a brief comment in cython_solve.hpp
immediately above the declarations of call_solve_lp and call_solve_mip stating
whether the returned lp_solution_interface_t<int,double>* and
mip_solution_interface_t<int,double>* are owned by the caller (caller must
delete/free) or are borrowed (caller must not delete), and if ownership is
transferred include the preferred deletion method (delete vs custom deleter) and
any lifetime constraints related to problem_interface or solver_settings;
reference the exact symbols call_solve_lp, call_solve_mip,
lp_solution_interface_t, and mip_solution_interface_t in the comment so users
and Cython wrappers know how to manage the returned pointer lifetime.
cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp (1)

60-330: Pervasive const_cast to access get_pdlp_warm_start_data() — suggests missing const overload.

get_pdlp_warm_start_data() on optimization_problem_solution_t lacks a const overload, forcing 15+ occurrences of const_cast<optimization_problem_solution_t<i_t, f_t>&>(solution_).get_pdlp_warm_start_data() throughout this class. Adding a const overload to optimization_problem_solution_t::get_pdlp_warm_start_data() would eliminate all these casts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`
around lines 60 - 330, The code repeatedly uses const_cast on solution_ to call
get_pdlp_warm_start_data(), indicating
optimization_problem_solution_t<i_t,f_t>::get_pdlp_warm_start_data() is missing
a const overload; add a const-qualified overload returning const
pdlp_warm_start_data_t<i_t,f_t>& (and keep the non-const one), then update
callers (e.g., methods get_pdlp_warm_start_data(), has_warm_start_data(),
get_current_primal_solution_host(), get_current_dual_solution_host(),
get_initial_primal_average_host(), get_initial_dual_average_host(),
get_current_ATY_host(), get_sum_primal_solutions_host(),
get_sum_dual_solutions_host(),
get_last_restart_duality_gap_primal_solution_host(),
get_last_restart_duality_gap_dual_solution_host(), get_initial_primal_weight(),
get_initial_step_size(), get_total_pdlp_iterations(),
get_total_pdhg_iterations(), get_last_candidate_kkt_score(),
get_last_restart_kkt_score(), get_sum_solution_weight(),
get_iterations_since_last_restart()) to call
solution_.get_pdlp_warm_start_data() directly without const_cast.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/cuopt_cli.cpp`:
- Around line 356-372: Wrap the raw cudaSetDevice calls with RAFT_CUDA_TRY to
add CUDA error checking: replace cudaSetDevice(i) in the GPU initialization loop
and the final cudaSetDevice(0) with RAFT_CUDA_TRY(cudaSetDevice(i)) and
RAFT_CUDA_TRY(cudaSetDevice(0)) respectively so failures propagate via
RAFT_CUDA_TRY; locate these in the GPU branch guarded by memory_backend ==
cuopt::linear_programming::memory_backend_t::GPU and update the cudaSetDevice
usages in that block (the loop and the final reset).

In `@cpp/include/cuopt/linear_programming/cpu_optimization_problem_solution.hpp`:
- Around line 39-46: The error-path constructor
cpu_lp_solution_t(pdlp_termination_status_t, cuopt::logic_error) currently
leaves primitive members l2_primal_residual_, l2_dual_residual_, gap_,
num_iterations_, and solved_by_pdlp_ uninitialized; update that constructor's
initializer list to set l2_primal_residual_ and l2_dual_residual_ to a sensible
default (e.g., 0.0 or NaN consistent with primal/dual objective handling), set
gap_ to a default numeric value (e.g., NaN or 0.0), set num_iterations_ to 0,
and set solved_by_pdlp_ to false so that callers of get_l2_primal_residual(),
get_gap(), and get_num_iterations() see well-defined values for error solutions.

In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 344-365: to_cpu_solution() currently omits PDLP warm-start data so
a GPU solution with has_warm_start_data() == true becomes a CPU solution without
warm-starts; fix by detecting has_warm_start_data() inside to_cpu_solution(),
copy the warm-start arrays from the GPU object using the existing warm-start
getters (e.g., the GPU-side get_* warm-start accessors) and populate the
cpu_lp_solution_t before returning it—either by calling the cpu_lp_solution_t
constructor overload that accepts warm-start fields or by invoking the
cpu_lp_solution_t set_warm_start_* methods on the created object so the returned
solution preserves the PDLP warm-start state.
- Around line 423-436: The getter gpu_mip_solution_t::get_solution_host() can
race on the mutable solution_host_cache_; fix it like gpu_lp_solution_t by
adding a private std::once_flag (e.g., solution_host_init_flag_) and replacing
the unguarded if-check with std::call_once on that flag; inside the call_once
lambda perform the host vector allocation, raft::copy from
solution_.get_solution().data(), and stream.synchronize() so initialization
happens exactly once and is thread-safe. Ensure the new flag is declared in the
class and used only to guard initialization in get_solution_host().

In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`:
- Around line 184-186: The new public CPU warm-start accessors lack Doxygen; add
Doxygen-style comments above both declarations get_cpu_pdlp_warm_start_data()
const noexcept and its non-const overload cpu_pdlp_warm_start_data_t<i_t, f_t>&
get_cpu_pdlp_warm_start_data() noexcept describing their purpose (access CPU
PDLp warm-start data for remote execution), the return type, noexcept behavior,
and any preconditions/ownership semantics; apply the same documentation pattern
to the analogous functions noted around the 261-266 region so all public header
APIs have Doxygen comments consistent with the project's guidelines.

In `@cpp/src/pdlp/cpu_optimization_problem.cpp`:
- Around line 599-606: The code in to_optimization_problem() skips calling
problem.set_csr_constraint_matrix(...) when A_ is empty which breaks cases of
valid CSR with zero non-zeros but existing row offsets; update the guard so the
CSR constraint matrix is set whenever there is row-structure information (i.e.,
A_offsets_ present) even if A_ has zero values. Concretely, replace the if
(!A_.empty()) check with a condition like if (!A_offsets_.empty()) (or if
(!A_.empty() || !A_offsets_.empty())) and still call
problem.set_csr_constraint_matrix(A_.data(), A_.size(), A_indices_.data(),
A_indices_.size(), A_offsets_.data(), A_offsets_.size()) so n_constraints_ on
the GPU reflects A_offsets_.
- Around line 700-705: The current guard uses && so constraint bounds are only
written when both constraint_lower_bounds_ and constraint_upper_bounds_ are
non-empty; change to write each bound independently: call
data_model_view.set_constraint_lower_bounds(...) when constraint_lower_bounds_
is non-empty and separately call
data_model_view.set_constraint_upper_bounds(...) when constraint_upper_bounds_
is non-empty. Update the block around the calls to
data_model_view.set_constraint_lower_bounds and
data_model_view.set_constraint_upper_bounds in cpu_optimization_problem.cpp to
mirror the independent handling done in to_optimization_problem().
- Around line 774-798: The is_equivalent() path indexes variable_lower_bounds_,
variable_upper_bounds_, variable_types_, and b_ without ensuring those vectors
actually contain n_vars_ / n_constraints_ elements, causing UB when they are
empty; before the loops that compare per-variable and per-constraint values (the
loop using c_, variable_lower_bounds_, variable_upper_bounds_, variable_types_,
and the loop using b_), add explicit size checks that
variable_lower_bounds_.size() == n_vars_, variable_upper_bounds_.size() ==
n_vars_, variable_types_.size() == n_vars_ and b_.size() == n_constraints_ (and
similarly verify other_var_lb.size(), other_var_ub.size(),
other_var_types.size(), other_b.size()); if any of these size checks fail return
false. Apply the identical guards to the permutation-path comparisons as well
(the code that uses permuted indices and compares other_var_* and other_b) so no
vector is indexed unless its size matches the expected count.

In `@cpp/src/pdlp/cuopt_c_internal.hpp`:
- Around line 24-38: The constructor problem_and_stream_view_t currently
allocates stream_view_ptr, handle_ptr, and gpu_problem with raw new, which can
leak if a subsequent new throws; update the GPU branch to use RAII (e.g., local
std::unique_ptr wrappers for rmm::cuda_stream_view, raft::handle_t, and
gpu_optimization_problem_t) so partially-constructed resources are automatically
freed on exceptions, then move/release them to the class members
(stream_view_ptr, handle_ptr, gpu_problem) only after all allocations succeed;
alternatively, wrap the allocations in a try/catch that deletes any
already-allocated members on failure — apply the same pattern for cpu_problem if
similar risks exist.

In `@cpp/src/pdlp/gpu_optimization_problem.cu`:
- Around line 733-739: The current code only calls
data_model_view.set_constraint_lower_bounds and set_constraint_upper_bounds when
both constraint_lower_bounds and constraint_upper_bounds are non-empty; change
this to mirror the independent checks used for variable bounds: call
set_constraint_lower_bounds if constraint_lower_bounds is non-empty (regardless
of the upper bounds), and call set_constraint_upper_bounds if
constraint_upper_bounds is non-empty (regardless of the lower bounds), using
constraint_lower_bounds.data()/size() and constraint_upper_bounds.data()/size()
respectively so a problem with only one side of constraint bounds is still
exported.

In `@cpp/tests/linear_programming/unit_tests/solution_interface_test.cu`:
- Around line 455-527: The tests gpu_warmstart_to_cpu and cpu_warmstart_to_gpu
currently silently pass when warmstart data is absent because they only guard
with if (solution->has_warm_start_data()) / if
(cpu_solution->has_warm_start_data()); change these to assert presence
explicitly (e.g., ASSERT_TRUE(solution->has_warm_start_data()) and
ASSERT_TRUE(cpu_solution->has_warm_start_data()) or use GTEST_SKIP with a clear
message) before calling get_current_primal_solution_host(),
get_current_dual_solution_host(), to_gpu_solution(), and accessing
get_pdlp_warm_start_data(); this ensures missing warmstart data fails or is
skipped with an explicit reason rather than letting the test pass silently.

---

Duplicate comments:
In `@cpp/include/cuopt/linear_programming/cpu_optimization_problem_solution.hpp`:
- Around line 355-378: The to_gpu_solution() path is only setting
primal_objective, dual_objective and solve_time in
additional_termination_information_t and drops l2_primal_residual_,
l2_dual_residual_, gap_, number_of_steps_taken and solved_by_pdlp_ from
cpu_lp_solution_t; update the termination_stats[0] assignment in
to_gpu_solution() (before constructing optimization_problem_solution_t) to
copy/convert those missing fields into termination_stats[0] (e.g., set
l2_primal_residual, l2_dual_residual, gap, number_of_steps_taken,
solved_by_pdlp), taking care to match the additional_termination_information_t
member names and types so round‑trips via optimization_problem_solution_t
preserve all convergence diagnostics.

In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 333-337: The method to_gpu_solution currently moves solution_
while being callable on lvalues which causes use-after-move; change the API so
moving is only allowed on rvalues: make to_gpu_solution rvalue-qualified (e.g.,
add && to the method declaration/definition) and update the corresponding
virtual declaration in the base class optimization_problem_solution_t, then
apply the same rvalue-qualification to both gpu_lp_solution_t::to_gpu_solution
and gpu_mip_solution_t::to_gpu_solution; alternatively provide a non-moving
lvalue overload that returns a copy or throws to prevent accidental moves from
lvalues.

In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp`:
- Around line 85-96: The code builds enum_variable_types from
data_model.get_variable_types() but calls problem->set_variable_types(...,
n_vars), risking out-of-bounds if n_vars differs; change the call to pass the
actual length of enum_variable_types (or the minimum of n_vars and
enum_variable_types.size()) so set_variable_types only reads the buffer that was
allocated — locate the conversion loop and the call to
problem->set_variable_types and replace the second argument from n_vars to the
correct size derived from enum_variable_types (e.g., enum_variable_types.size()
cast to the expected integer type or min(n_vars, enum_variable_types.size())).
- Around line 159-207: When target_is_gpu is false but only a GPU warmstart view
is present (has_gpu_warmstart_view), materialize the GPU view into device
buffers before converting D2H: create a pdlp_warm_start_data_t<i_t,f_t> from
solver_settings->get_pdlp_warm_start_data_view() using a valid CUDA stream
(rmm::cuda_stream_per_thread), set it into
solver_settings->get_pdlp_settings().set_pdlp_warm_start_data(...), and then
call convert_to_cpu_warmstart(...) on get_pdlp_warm_start_data() to populate
get_cpu_pdlp_warm_start_data(); i.e., add a branch under the CPU-target path
that handles the has_gpu_warmstart_view case by materializing via
pdlp_warm_start_data_t before D2H conversion.

In `@cpp/src/pdlp/optimization_problem.cu`:
- Around line 616-696: The move-based setters are not updating size metadata;
modify set_constraint_bounds_move, set_variable_lower_bounds_move,
set_variable_upper_bounds_move, set_constraint_lower_bounds_move,
set_constraint_upper_bounds_move, and set_row_types_move to set n_constraints_
or n_vars_ consistently like their non-move counterparts: after assigning the
moved rmm::device_uvector to the member (e.g., b_ = std::move(b);
variable_lower_bounds_ = std::move(...); row_types_ = std::move(...)), compute
the new size (use .size()) and assign n_constraints_ = size for
constraint-related setters and n_vars_ = size for variable-related setters so
get_n_variables()/get_n_constraints() remain correct; keep the existing behavior
in set_objective_coefficients_move and set_csr_constraint_matrix_move as
examples.

In `@cpp/src/pdlp/solution_conversion.cu`:
- Around line 96-108: Guard the unchecked access to
solution_.get_additional_termination_information(0): before calling that getter,
check that the solution contains at least one additional termination information
entry and only then assign term_info and copy its fields into ret; otherwise
populate ret.l2_primal_residual_, ret.l2_dual_residual_, ret.primal_objective_,
ret.dual_objective_, ret.gap_, ret.nb_iterations_, ret.solve_time_, and
ret.solved_by_pdlp_ with sensible defaults (e.g., zeros or NaNs) and still set
ret.termination_status_, ret.error_status_, and ret.error_message_; in short,
replace the direct call to solution_.get_additional_termination_information(0)
with a guarded branch that verifies existence (or catches the absent-case) and
uses defaults when empty.

In `@cpp/src/pdlp/solve.cu`:
- Around line 1487-1495: The GPU-to-CPU D2H can race with in-flight GPU work
because solve_lp(...) executes on the solver stream but we call
gpu_lp_solution_t::to_cpu_solution() (which does D2H on
rmm::cuda_stream_per_thread) immediately; insert an explicit synchronization of
the stream used by solve_lp (the solver stream variable passed into solve_lp)
after the solve_lp call and before constructing gpu_lp_solution_t / calling
to_cpu_solution(), e.g. call stream.synchronize() or the equivalent CUDA/RMM
stream synchronize on that stream to ensure the GPU work is complete.

In `@cpp/src/pdlp/utilities/cython_solve.cu`:
- Around line 157-174: The warm-start stream resets mutate shared
solver_settings and must be skipped in batch mode to avoid data races; wrap the
resets in an is_batch_mode check (as shown) so that when is_batch_mode is true
you do not call solver_settings->get_pdlp_settings().get_pdlp_warm_start_data()
mutations. Concretely, ensure the code around
solver_settings->get_pdlp_settings().get_pdlp_warm_start_data() first checks if
(!is_batch_mode) before inspecting current_primal_solution_.size() and calling
set_stream(...) on current_primal_solution_, current_dual_solution_,
initial_primal_average_, initial_dual_average_, current_ATY_,
sum_primal_solutions_, sum_dual_solutions_,
last_restart_duality_gap_primal_solution_, and
last_restart_duality_gap_dual_solution_; keep the guard to prevent concurrent
mutations of the shared solver_settings object.

In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c`:
- Around line 1919-2022: In test_cpu_only_execution, guard the
CPU-only/remote-mode run by checking the relevant env vars (CUOPT_REMOTE_HOST
and/or CUDA_VISIBLE_DEVICES) at the start and skip/return early if the test is
not configured for remote simulation; also stop treating malloc(0) as an error
by allocating primal_solution only when num_variables > 0 (i.e., replace the
unconditional malloc(num_variables * sizeof(cuopt_float_t)) + NULL check with a
conditional allocation and only check malloc failure when num_variables > 0),
and ensure subsequent calls that use primal_solution handle the zero-variable
case safely.
- Around line 1855-1860: The current allocation check treats malloc(0) returning
NULL as an error; update the check for dual_solution and reduced_costs so it
only treats a NULL return as failure when the requested size is non‑zero (i.e.,
test (num_constraints > 0 && dual_solution == NULL) and (num_variables > 0 &&
reduced_costs == NULL)); if either condition is true, print the error and goto
DONE. Refer to the dual_solution and reduced_costs allocations and the
num_constraints/num_variables variables in c_api_test.c to implement this fix.

In `@cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp`:
- Around line 352-366: The CPU-only tests (TEST c_api_cpu_only::lp_solve and
mip_solve) currently validate infrastructure rather than numerical correctness
because remote solve returns dummy solutions; leave the tests as-is to ensure
the CPU-only C API path (test_cpu_only_execution, test_cpu_only_mip_execution,
CPUOnlyTestEnvironment) completes without errors, but add a short TODO comment
in each test noting that numerical assertions must be added once the gRPC/real
remote solver replaces the stubs so reviewers can find and update lp_solve and
mip_solve later.

In `@cpp/tests/linear_programming/unit_tests/solution_interface_test.cu`:
- Around line 184-279: Update the three tests
(gpu_problem_to_optimization_problem, cpu_problem_to_optimization_problem, and
mps_data_model_to_optimization_problem) to compare CSR buffers and all bounds
element-wise instead of only sizes: for gpu_problem_to_optimization_problem and
cpu_problem_to_optimization_problem add checks that
concrete_problem.get_constraint_matrix_values()/get_constraint_matrix_indices()/get_constraint_matrix_offsets()
(host-copied via cuopt::host_copy) match the source problem's corresponding host
vectors from populate_from_mps_data_model (or
problem->get_constraint_matrix_*_host()), and also compare upper bounds via
get_variable_upper_bounds_host() vs
host_copy(concrete_problem.get_variable_upper_bounds()). For
mps_data_model_to_optimization_problem similarly compare
csr_values/indices/offsets content against mps_data.get_constraint_matrix_* (or
equivalent accessors) and validate both lower and upper bounds element-wise;
factor repeated logic into a small shared helper (e.g., compare_vectors_on_host)
to avoid duplication.

---

Nitpick comments:
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 60-330: The code repeatedly uses const_cast on solution_ to call
get_pdlp_warm_start_data(), indicating
optimization_problem_solution_t<i_t,f_t>::get_pdlp_warm_start_data() is missing
a const overload; add a const-qualified overload returning const
pdlp_warm_start_data_t<i_t,f_t>& (and keep the non-const one), then update
callers (e.g., methods get_pdlp_warm_start_data(), has_warm_start_data(),
get_current_primal_solution_host(), get_current_dual_solution_host(),
get_initial_primal_average_host(), get_initial_dual_average_host(),
get_current_ATY_host(), get_sum_primal_solutions_host(),
get_sum_dual_solutions_host(),
get_last_restart_duality_gap_primal_solution_host(),
get_last_restart_duality_gap_dual_solution_host(), get_initial_primal_weight(),
get_initial_step_size(), get_total_pdlp_iterations(),
get_total_pdhg_iterations(), get_last_candidate_kkt_score(),
get_last_restart_kkt_score(), get_sum_solution_weight(),
get_iterations_since_last_restart()) to call
solution_.get_pdlp_warm_start_data() directly without const_cast.

In
`@cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp`:
- Around line 138-352: The interface lp_solution_interface_t currently exposes
many individual warm-start accessors (get_current_primal_solution_host,
get_current_dual_solution_host, get_initial_primal_average_host,
get_initial_dual_average_host, get_current_ATY_host,
get_sum_primal_solutions_host, get_sum_dual_solutions_host,
get_last_restart_duality_gap_primal_solution_host,
get_last_restart_duality_gap_dual_solution_host, get_initial_primal_weight,
get_initial_step_size, get_total_pdlp_iterations, get_total_pdhg_iterations,
get_last_candidate_kkt_score, get_last_restart_kkt_score,
get_sum_solution_weight, get_iterations_since_last_restart) which inflates the
vtable; consolidate these by introducing a single POD struct (e.g.,
pdlp_warm_start_host_t) containing all warm-start vectors/scalars and replace
the group of virtual methods with one virtual: virtual pdlp_warm_start_host_t
get_warm_start_data_host() const = 0; update implementations of
lp_solution_interface_t and any derived GPU/CPU solution classes to populate and
return this struct, and remove the now-redundant individual getters to reduce
surface area and simplify future refactors.

In `@cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp`:
- Around line 41-50: Document the ownership semantics for the raw pointers
returned by call_solve_lp and call_solve_mip: add a brief comment in
cython_solve.hpp immediately above the declarations of call_solve_lp and
call_solve_mip stating whether the returned lp_solution_interface_t<int,double>*
and mip_solution_interface_t<int,double>* are owned by the caller (caller must
delete/free) or are borrowed (caller must not delete), and if ownership is
transferred include the preferred deletion method (delete vs custom deleter) and
any lifetime constraints related to problem_interface or solver_settings;
reference the exact symbols call_solve_lp, call_solve_mip,
lp_solution_interface_t, and mip_solution_interface_t in the comment so users
and Cython wrappers know how to manage the returned pointer lifetime.

In `@cpp/src/pdlp/cuopt_c.cpp`:
- Around line 858-916: Wrap the heap allocation of solution_and_stream_view_t in
a std::unique_ptr to make ownership transfer exception-safe: create
std::unique_ptr<solution_and_stream_view_t> tmp(new
solution_and_stream_view_t(...)) in both the MIP branch (where
solution_and_stream_view is used and mip_solution_interface_ptr is set) and the
LP branch (where lp_solution_interface_ptr is set), move any setup (assigning
mip_solution_interface_ptr/lp_solution_interface_ptr and calling printTimestamp)
while owned by tmp, then release tmp into *solution_ptr (i.e. *solution_ptr =
static_cast<cuOptSolution>(tmp.release())); this preserves existing behavior but
ensures no leak if an exception is introduced between new and assignment.

In `@cpp/src/pdlp/gpu_optimization_problem.cu`:
- Around line 114-131: In
gpu_optimization_problem_t<i_t,f_t>::set_quadratic_objective_matrix the function
presently assumes Q_values/Q_indices/Q_offsets are host pointers and copies them
into Q_values_, Q_indices_, Q_offsets_ using std::copy, but the
validate_positive_semi_definite parameter is accepted and unused; update this
function by either adding a clear inline comment stating that the inputs are
host memory and therefore copied with std::copy (or documenting this behavior in
the header), or add a TODO/NOT_IMPLEMENTED comment referencing
validate_positive_semi_definite to indicate planned PSD validation (or implement
validation) so the unused parameter is explicit; reference the method name and
the members Q_values_, Q_indices_, Q_offsets_ when making the change.
- Around line 957-1026: Add defensive precondition checks using cuopt_expects
before each cudaMemcpy to ensure the caller-provided sizes do not exceed the
device buffers: in copy_objective_coefficients_to_host check size <= c_.size();
in copy_constraint_matrix_to_host check num_values <= A_.size(), num_indices <=
A_indices_.size(), num_offsets <= A_offsets_.size(); in copy_row_types_to_host
check size <= row_types_.size(); in copy_constraint_bounds_to_host check size <=
b_.size(); in copy_constraint_lower_bounds_to_host and
copy_constraint_upper_bounds_to_host check size <=
constraint_lower_bounds_.size() / constraint_upper_bounds_.size() respectively;
in copy_variable_lower_bounds_to_host and copy_variable_upper_bounds_to_host
check size <= variable_lower_bounds_.size() / variable_upper_bounds_.size(); and
in copy_variable_types_to_host check size <= variable_types_.size(); after each
cuopt_expects guard keep the existing RAFT_CUDA_TRY(cudaMemcpy(...)) calls.

In `@cpp/src/pdlp/utilities/cython_solve.cu`:
- Around line 204-233: compute_max_thread currently unconditionally calls
cudaMemGetInfo (via RAFT_CUDA_TRY) which will fail in CPU-only environments;
wrap the GPU-specific logic with a CUDA-availability guard (e.g., call
cudaGetDeviceCount or use an existing RAFT/compile-time macro) and skip
cudaMemGetInfo when no device is present. If no GPU is available, compute a safe
fallback for res (for example, std::min(max_total, (int)data_models.size()) or a
value based solely on data_models without querying GPU memory) and return that;
otherwise run the existing memory-based calculation. Ensure you only call
cudaMemGetInfo inside the guarded branch in compute_max_thread and preserve the
cuopt_expects check for the GPU path.

remove caching of host solution in gpu solution structure for
get__xxx_host() calls because it was overly complicated and
in practice only test code benefitted


def validate_variable_bounds(data, settings, solution):
from cuopt.linear_programming.solver.solver_parameters import (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi this is to resolve a circular dependency



def check_solution(data, setting, solution, cost):
from cuopt.linear_programming.solver.solver_parameters import (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}

// ============================================================================
// Remote execution for GPU problems (converts to CPU then calls CPU remote)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, kept this path because someday we may want to support a case for GPU manipulation of a problem and/or solution on a local GPU and remote solve on a more capable GPU. This mode is not available by default, it would need explicit setting of an env var.

mip_solver_settings_t<i_t, f_t> const& settings) const = 0;

// ============================================================================
// C API Support: Copy to Host (Polymorphic)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking we don't need these copy_*_to_host() versions, we could use the getter form in the cuopt_c.cpp and add a std::memcopy. One more line of code and a trivial copy versus 9 interface methods and 18 implementations. Inclined to remove, what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Introduces a non-breaking change P0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants