Refactor NSS and DNSS implementations for improved readability and modularity #8

Open
eclipse0922 wants to merge 2 commits into master from refactor_2

Conversation

@eclipse0922 eclipse0922 commented Feb 8, 2026

User description

  • Rewrote legacy code into clean C++17 style.
  • Added missing method implementations and input validation.
  • Introduced options struct for configurable parameters in NSS and DNSS.
  • Implemented CUDA support for DNSS rotational feature computation.
  • Added new CMake configuration for building with or without CUDA.
  • Updated README to reflect changes and provide build instructions.
  • Added .gitignore to exclude build directories.

PR Type

Enhancement, Bug fix, Documentation


Description

  • Refactored NSS and DNSS implementations into clean C++17 with modular design and options structs.

  • Added input validation, missing methods, and fixed rotational-return math to use radians consistently.

  • Implemented optional CUDA acceleration for DNSS rotational feature computation with CPU fallback.

  • Updated README with algorithm notes, build instructions, and references; added CMake CUDA support.

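The "optional CUDA acceleration with CPU fallback" described above boils down to a try-then-fallback dispatch. A minimal sketch of that shape follows; `tryComputeOnGpu` and `computeOnCpu` are hypothetical stand-ins for the real entry points (the PR wires `DNSSComputeRotationalFeaturesCuda` behind a `use_cuda` option):

```cpp
#include <vector>

// Hypothetical stand-in for the GPU entry point; returning false here
// simulates a failed CUDA path (no device, allocation failure, etc.).
bool tryComputeOnGpu(const std::vector<float>& in, std::vector<float>& out)
{
    (void)in;
    (void)out;
    return false;
}

// Placeholder CPU path; the real code computes rotational features here.
void computeOnCpu(const std::vector<float>& in, std::vector<float>& out)
{
    out.assign(in.begin(), in.end());
    for (float& v : out)
    {
        v *= 2.0f; // stand-in for the actual rotational-feature math
    }
}

// Try the GPU only when requested; on any failure fall back to the CPU,
// so the caller always gets a result.
std::vector<float> computeFeatures(const std::vector<float>& in, bool use_cuda)
{
    std::vector<float> out;
    if (!use_cuda || !tryComputeOnGpu(in, out))
    {
        computeOnCpu(in, out);
    }
    return out;
}
```

The key property is that enabling CUDA can never make the call fail outright: a GPU error degrades to the CPU path instead of an exception or an empty result.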

Changes walkthrough 📝

Relevant files

Enhancement

NSS.cpp (+545/-310): Refactor sampling logic and add helper functions

  • Replaced legacy bucket-based sampling with modular helper functions and options-based configuration.
  • Implemented computeCenteredAndScaledVertices, sphericalBucketIndex, and computeRotationalReturnValue.
  • Added CUDA integration point via DNSSComputeRotationalFeaturesCuda declaration.
  • Rewrote normalSpaceSampling and dualNormalSpaceSampling with lazy removal and priority queues.

NSS.h (+162/-123): Modernize header and add options

  • Replaced legacy header with a clean C++17 interface, removing Eigen and concurrency dependencies.
  • Introduced Options structs for configurable parameters in both NSS and DNSS.
  • Added setUseCuda, getUseCuda, and input validation in setInputCloud.
  • Added a minimal glm::fvec3 fallback implementation for portability.

dnss_cuda.cu (+217/-0): Add CUDA implementation for DNSS

  • Implemented a CUDA kernel for rotational feature computation with a host-side wrapper.
  • Added memory management, error checking, and fallback to the CPU path on failure.
  • Supports batch processing of normalized vertices and normals.

Configuration changes

CMakeLists.txt (+21/-0): Add CMake CUDA build support

  • Added CMake configuration with optional CUDA support via the DNSS_ENABLE_CUDA option.
  • Configured the CUDA language, architectures, and linking with cudart.
  • Enabled separable compilation and device symbol resolution for CUDA targets.

.github/workflows/pr-agent.yml (+2/-2): Update PR agent config

  • Increased PR_REVIEWER__NUM_CODE_SUGGESTIONS from 3 to 5.
  • Removed the PR_DESCRIPTION__PUBLISH_LABELS line.

Documentation

README.md (+50/-10): Update documentation and build guide

  • Updated the README with algorithm notes, build instructions (CPU/CUDA), and references.
  • Clarified fixes: C++17 rewrite, validation, radians, lazy removal, CUDA fallback.
  • Added links to the DNSS paper and repository.

Need help?

  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

    @gemini-code-assist

    Summary of Changes

Hello @eclipse0922, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

    This pull request undertakes a significant modernization and enhancement of the Normal Space Sampling (NSS) and Dual Normal Space Sampling (DNSS) algorithms. The codebase has been extensively refactored to align with C++17 standards, prioritizing clarity and modularity. Key improvements include the introduction of flexible configuration options, robust input validation, and a substantial performance boost for DNSS through optional CUDA acceleration. The underlying sampling logic has been optimized for efficiency, and the project's build process and documentation have been updated to reflect these comprehensive changes.

    Highlights

    • Code Refactoring and Modernization: Refactored both NSS and DNSS implementations to modern C++17, enhancing readability, modularity, and maintainability by replacing legacy code patterns with contemporary C++ features.
    • Configurable Options: Introduced Options structs for both NSS and DNSS, allowing users to configure parameters such as bucket sizes, random seeds, and CUDA enablement, providing greater flexibility and control over the sampling process.
    • CUDA Acceleration for DNSS: Implemented optional CUDA acceleration for the DNSS rotational feature computation, which can significantly improve performance for large datasets by offloading intensive calculations to the GPU. The implementation includes a fallback to CPU if CUDA execution fails.
    • Improved Algorithm Logic and Validation: Added comprehensive input validation to prevent common errors and corrected mathematical inconsistencies (e.g., ensuring consistent use of radians) within the sampling algorithms. The DNSS sampling logic was also refined using an active-mask and priority queue for more efficient point selection, replacing less performant erase operations.
    • Build System and Documentation Updates: Updated the build system with a new CMakeLists.txt to support optional CUDA compilation and added a .gitignore for common build artifacts. The README.md was thoroughly revised to provide clear documentation on the refactoring, new features, build instructions, and algorithmic details.
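The "active-mask and priority queue" point selection mentioned above is a standard lazy-deletion idiom: rather than erasing points from a container (O(n) per erase), points are deactivated in a mask and stale heap entries are skipped on pop. A minimal sketch, independent of the actual DNSS types:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Candidate point with a selection score; higher score pops first.
struct Entry
{
    float score;
    int index;
    bool operator<(const Entry& other) const { return score < other.score; }
};

// Pop entries until we find one whose index is still active.
// Deactivated (lazily "removed") entries are simply discarded.
int popBestActive(std::priority_queue<Entry>& heap,
                  const std::vector<std::uint8_t>& is_active)
{
    while (!heap.empty())
    {
        const Entry top = heap.top();
        heap.pop();
        if (is_active[top.index] != 0u)
        {
            return top.index;
        }
    }
    return -1; // no active candidates left
}
```

Compared with `std::remove_if`/`erase` on every selection step, this keeps each pop at O(log n) amortized and leaves the mask as the single source of truth for which points remain.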
    Changelog
    • .gitignore
      • Added build/ and build-cuda/ entries to exclude generated build directories from version control.
    • CMakeLists.txt
      • Introduced a new CMake configuration for the project, setting the minimum required version to 3.20.
      • Added an option DNSS_ENABLE_CUDA to conditionally enable CUDA support for DNSS.
      • Configured the project to compile with the C++17 standard (cxx_std_17).
      • Integrated CUDA language support, CUDAToolkit finding, and linked CUDA::cudart when DNSS_ENABLE_CUDA is enabled.
      • Set CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS properties for the dnss target when CUDA is active.
    • NSS.cpp
      • Performed extensive refactoring of both NSS and DNSS implementations, moving utility functions (e.g., clampFloat, dotProduct, crossProduct, vectorLength, normalizeSafe, sphericalBucketIndex, computeCenteredAndScaledVertices) into an anonymous namespace for better encapsulation.
      • Rewrote NSS::normalSpaceSampling to utilize the new NSS::Options struct, std::mt19937 for random shuffling, and a more robust bucket selection mechanism.
      • Rewrote DNSS::dualNormalSpaceSampling to incorporate DNSS::Options, support CUDA acceleration, and implement a priority queue-based selection with an active-mask for improved efficiency, replacing previous concurrency::parallel_for and std::remove_if patterns.
      • Added constructors and setOptions/getOptions methods for both NSS and DNSS classes.
      • Implemented input validation for setInputCloud methods to ensure matching vertex and normal counts.
      • Ensured consistent use of radians for all angular computations within the algorithms.
    • NSS.h
      • Removed legacy includes such as <glm/gtx/norm.hpp>, <iostream>, <ppl.h>, <Eigen/Dense>, <Eigen/Geometry>, <Eigen/Core>, and various concurrency primitives.
      • Added a fallback glm::fvec3 definition for environments where GLM headers might not be directly available, ensuring broader compatibility.
      • Defined NSS::Options and DNSS::Options structs to encapsulate configurable parameters for each class, including bucket dimensions, random seeds, and CUDA enablement.
      • Updated class interfaces to use the new Options structs and removed old member variables related to bucket management and hardcoded constants.
      • Updated method signatures to reflect the refactoring and new parameter passing conventions.
    • README.md
      • Updated the title to "DNSS / NSS" to better reflect the content.
      • Added a "What This Version Fixes" section detailing the refactoring, validation, mathematical corrections, efficient removal strategies, and CUDA support.
      • Included an "Algorithm Notes" section explaining the core DNSS process, including point centering/scaling, normal space construction, rotational return computation, and iterative point selection, along with default bucket settings.
      • Provided updated build instructions for both CPU-only and CUDA-enabled compilation using CMake.
      • Added references to the original DNSS paper and related works for context.
    • dnss_cuda.cu
      • Added a new CUDA source file dnss_cuda.cu.
      • Implemented the computeRotationalFeaturesKernel CUDA kernel to perform rotational normal and rotational return computations on the GPU.
      • Included a host-side wrapper function DNSSComputeRotationalFeaturesCuda responsible for managing data transfer between host and device, launching the CUDA kernel, and handling error checking.
      • Defined __device__ utility functions like clampFloat for use within the kernel.
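Among the helpers listed above, `sphericalBucketIndex` presumably maps a unit normal onto a (theta, phi) grid cell, which is the usual way normal-space buckets are built. A hedged sketch of that standard bucketing (the bucket counts here are illustrative; the real code takes them from the Options structs):

```cpp
#include <algorithm>
#include <cmath>

// Map a unit normal (nx, ny, nz) to a flat bucket index on a
// theta_buckets x phi_buckets spherical grid.
// theta in [0, pi] measured from +z; phi in [-pi, pi] around z.
int sphericalBucketIndexSketch(float nx, float ny, float nz,
                               int theta_buckets, int phi_buckets)
{
    const float pi = 3.14159265358979f;
    const float theta = std::acos(std::clamp(nz, -1.0f, 1.0f));
    const float phi = std::atan2(ny, nx);

    int ti = static_cast<int>(theta / pi * theta_buckets);
    int fi = static_cast<int>((phi + pi) / (2.0f * pi) * phi_buckets);
    // Clamp so theta == pi and phi == pi land in the last bucket, not past it.
    ti = std::clamp(ti, 0, theta_buckets - 1);
    fi = std::clamp(fi, 0, phi_buckets - 1);
    return ti * phi_buckets + fi;
}
```

Normal space sampling then counts points per bucket and draws from sparsely populated buckets first, so regions with rare normal directions are preserved.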
    Ignored Files
    • Ignored by pattern: .github/workflows/** (1)
      • .github/workflows/pr-agent.yml

    github-actions bot commented Feb 8, 2026

    PR Description updated to latest commit (4bb5106)

    github-actions bot commented Feb 8, 2026

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Memory leak in CUDA error paths

In the CUDA implementation (dnss_cuda.cu), when memory allocation or data transfer fails after initial allocations, the code attempts to free the allocated memory but does not set the pointers to nullptr before returning. This can lead to double-free issues if the function is called again or if the caller attempts to free the same pointers. Additionally, in some error paths (e.g., lines 153-156), the code calls cudaFree on potentially uninitialized pointers if earlier allocations failed, which is undefined behavior.

    }
    
    glm::fvec3 centroid(0.0f, 0.0f, 0.0f);
    for (const glm::fvec3 &vertex : vertices)
    {
    	centroid += vertex;
    }
    centroid /= static_cast<float>(vertices.size());
    
    float max_radius = 0.0f;
    for (std::size_t i = 0; i < vertices.size(); ++i)

    Inconsistent epsilon handling

The CPU and CUDA code each define their own epsilon constant (kEpsilon = 1e-7f in the CPU code, a separate 1e-7f literal in the CUDA code) and use inconsistent checks for finite values and zero norms. In computeRotationalReturnValue (lines 44-48), the check norm <= kEpsilon returns a default normal (0, 0, 1), which may not be appropriate for all cases and could introduce bias. The CUDA version (lines 52-55) sets normal_norm = 1.0f instead of returning early, which is inconsistent with the CPU implementation and may affect numerical stability.

    	if (!std::isfinite(norm) || norm <= kEpsilon)
    	{
    		return glm::fvec3(0.0f, 0.0f, 1.0f);
    	}
    	return value / norm;
    }
    
    float computeRotationalReturnValue(const glm::fvec3 &normalized_point, const glm::fvec3 &normal, float theta)
    {
    	const float point_norm = vectorLength(normalized_point);
    	if (!std::isfinite(point_norm) || point_norm <= kEpsilon)
    	{
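The quoted snippet above is truncated, but the inconsistency is clear: one path substitutes a default normal, the other forces the norm to 1. A hedged sketch of one way to unify the two paths, using a plain struct as a stand-in for `glm::fvec3`, is to report degeneracy explicitly and let both callers share a single policy:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; }; // stand-in for glm::fvec3

// Returns false for degenerate input instead of silently substituting
// (0, 0, 1) as the CPU path does, or forcing norm = 1.0f as the CUDA
// path does; the caller decides what a degenerate normal means.
bool normalizeSafe(const Vec3& v, Vec3& out, float epsilon = 1e-7f)
{
    const float norm = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (!std::isfinite(norm) || norm <= epsilon)
    {
        return false;
    }
    out = Vec3{v.x / norm, v.y / norm, v.z / norm};
    return true;
}
```

With a single shared epsilon constant and one failure convention, the CPU and CUDA implementations can no longer drift apart on these edge cases.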

    Missing CUDA stream support

    The CUDA implementation uses the default stream (synchronous) for all operations, which may not be optimal for performance. For better performance and to allow concurrent execution with other CUDA operations, the implementation should support custom streams. Additionally, there is no error handling for the kernel launch itself (line 172), only for subsequent CUDA API calls.

    const int threads = 256;
    const int blocks = (point_count + threads - 1) / threads;
    computeRotationalFeaturesKernel<<<blocks, threads>>>(
    	d_vertices,
    	d_normals,
    	theta_for_return_radians,
    	d_rot_normals,
    	d_rot_returns,
    	point_count);
    
    if (!isCudaSuccess(cudaGetLastError()) || !isCudaSuccess(cudaDeviceSynchronize()))
    {
    	cudaFree(d_vertices);
    	cudaFree(d_normals);
    	cudaFree(d_rot_normals);
    	cudaFree(d_rot_returns);
    	return false;
    }

    github-actions bot commented Feb 8, 2026

    PR Code Suggestions ✨

Category: General
    Use std::clamp for cleaner clamping logic

    The clampFloat function should use std::clamp for better readability and correctness
    in C++17. This ensures the value is clamped between min and max without potential
    issues with nested std::min/std::max.

    NSS.cpp [18-21]

     float clampFloat(float value, float min_value, float max_value)
     {
    -	return std::max(min_value, std::min(value, max_value));
    +	return std::clamp(value, min_value, max_value);
     }
    Suggestion importance[1-10]: 7


    Why: Using std::clamp is more idiomatic in C++17 and improves readability. However, the original code is functionally correct, so the improvement is moderate in impact.

Impact: Medium
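For reference, the two forms agree for ordinary arguments (note that `std::clamp` requires `lo <= hi`, which the nested form tolerates). A minimal illustration of the suggested replacement:

```cpp
#include <algorithm>

// Nested min/max form from the original code.
float clampNested(float v, float lo, float hi)
{
    return std::max(lo, std::min(v, hi));
}

// C++17 replacement; precondition: lo <= hi.
float clampStd(float v, float lo, float hi)
{
    return std::clamp(v, lo, hi);
}
```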
    Rename function for clarity and consistency

    The function name computeRotationalReturnValue is misleading since it computes a
    rotational return value based on both the point and normal, not just the normalized
    point. Rename it to computeRotationalReturn for clarity and consistency with the
    algorithm's terminology.

    NSS.cpp [51-89]

    -float computeRotationalReturnValue(const glm::fvec3 &normalized_point, const glm::fvec3 &normal, float theta)
    +float computeRotationalReturn(const glm::fvec3 &normalized_point, const glm::fvec3 &normal, float theta)
     {
     	const float point_norm = vectorLength(normalized_point);
     	if (!std::isfinite(point_norm) || point_norm <= kEpsilon)
     	{
     		return 0.0f;
     	}
    Suggestion importance[1-10]: 6


    Why: Renaming computeRotationalReturnValue to computeRotationalReturn improves consistency with the algorithm's terminology and clarifies the function's purpose. This is a stylistic improvement with no functional impact.

Impact: Low


@gemini-code-assist bot left a comment


    Code Review

    This is an excellent and comprehensive refactoring of the NSS and DNSS implementations. The code is now much more readable, modular, and follows modern C++17 practices.
    Key improvements include:

    • The introduction of Options structs for both NSS and DNSS, which makes configuration much cleaner.
    • A complete rewrite of the core algorithms, replacing complex and inefficient legacy code with clear, robust, and more efficient implementations. For example, using an is_active flag instead of vector::erase is a great performance improvement.
    • The addition of optional CUDA support with a clean CPU fallback is a fantastic feature.
    • The new CMake build system is well-structured.

    I have a few minor suggestions to further improve the code, mostly related to best practices and code simplification. Overall, this is a very high-quality pull request.


    option(DNSS_ENABLE_CUDA "Enable CUDA acceleration for DNSS rotational feature computation" OFF)

    add_library(dnss NSS.cpp NSS.h)


Severity: medium

    It's generally not recommended to include header files in add_library. While it might work with some generators (like adding the file to a Visual Studio project), it's not a portable practice and can have unintended side effects with others. Headers are made available to targets via target_include_directories, which is already correctly used.

    add_library(dnss NSS.cpp)
    

    Comment on lines +594 to 609
    if (sampled_vertices.size() < static_cast<std::size_t>(target_count))
    {
        for (std::size_t point_index = 0; point_index < is_active.size(); ++point_index)
        {
            if (sampled_vertices.size() >= static_cast<std::size_t>(target_count))
            {
                break;
            }
            if (is_active[point_index] == 0u)
            {
                continue;
            }
            selectPoint(static_cast<int>(point_index));
        }
    }
    return pid;
}


Severity: medium

    This final loop appears to be redundant. The main sampling loop preceding this will terminate either when target_count is reached or when all available points have been sampled (leading to best_type == BucketType::None). In the latter scenario, all points would have been sampled, meaning this loop would not select any additional points. This block of code seems safe to remove.

    Comment on lines +153 to +156
    cudaFree(d_vertices);
    cudaFree(d_normals);
    cudaFree(d_rot_normals);
    cudaFree(d_rot_returns);


Severity: medium

    This block of cleanup code is repeated in multiple error-handling paths. To improve maintainability and reduce redundancy, consider extracting it into a helper lambda or, for a more robust solution, using a RAII wrapper (like std::unique_ptr with a custom deleter) for the CUDA memory allocations. This would simplify the error handling logic significantly.
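A hedged sketch of the suggested RAII approach using `std::unique_ptr` with a custom deleter. To keep the sketch buildable without the CUDA toolkit, `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree`; the structure is what matters:

```cpp
#include <cstdlib>
#include <memory>

// Custom deleter; in the real code this would call cudaFree(p).
struct DeviceFree
{
    void operator()(void* p) const { std::free(p); }
};

using DeviceBuffer = std::unique_ptr<float[], DeviceFree>;

// Stand-in for a checked cudaMalloc; returns an owning pointer (null on failure).
DeviceBuffer deviceAlloc(std::size_t count)
{
    return DeviceBuffer(static_cast<float*>(std::malloc(count * sizeof(float))));
}

bool computeWithRaii(std::size_t n)
{
    DeviceBuffer d_vertices = deviceAlloc(n);
    DeviceBuffer d_normals = deviceAlloc(n);
    if (!d_vertices || !d_normals)
    {
        // Early return: destructors free whatever was allocated so far,
        // so no repeated cudaFree block is needed on each error path.
        return false;
    }
    // ... host-to-device transfers and kernel launch would go here ...
    return true; // normal return: buffers freed automatically
}
```

Every exit path, success or failure, releases exactly the buffers that were allocated, which also removes the double-free and uninitialized-pointer hazards flagged in the reviewer guide above.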
