This change will fix configuration issues on HiperGator #1112
base: master
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

The hipergator toolchain configuration is updated with new consolidated module declarations (h-all, h-cpu, h-gpu) featuring upgraded versions: CUDA 12.9.1, NVHPC 25.9, and openmpi/5.0.7. GPU resource specifications are adjusted with reduced CPU allocation and added memory configuration.
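For readers unfamiliar with the modules file layout, here is a rough sketch of what the consolidated stanzas described above could look like. The h-all and h-cpu lines are illustrative guesses based on the versions mentioned later in this review (Python 3.12, GCC 14.2, OpenMPI 5.0.7), not the file's actual contents; the h-gpu lines mirror the quoted diff further down.

```text
# Sketch only: hypothetical layout of the consolidated HiperGator stanzas
h-all python/3.12
h-cpu gcc/14.2 openmpi/5.0.7
h-gpu cuda/12.9.1 nvhpc/25.9 openmpi/5.0.7
h-gpu CC=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpicc
h-gpu CXX=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpicxx
h-gpu FC=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpifort
```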
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
```
% if gpu_enabled:
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=3
#SBATCH --gpu-bind=closest
#SBATCH --mem-per-cpu=50GB
% endif
```
Suggestion: Add an else block to the gpu_enabled check to set --cpus-per-task=7 for CPU-only jobs, restoring the intended behavior. [general, importance: 8]
```diff
 % if gpu_enabled:
 #SBATCH --gpus-per-task=1
 #SBATCH --cpus-per-task=3
-#SBATCH --gpu-bind=closest
 #SBATCH --mem-per-cpu=50GB
+% else:
+#SBATCH --cpus-per-task=7
 % endif
```
Nitpicks 🔍
Pull request overview
This PR updates HiperGator cluster configuration to properly support CPU and GPU workloads. The changes replace the previous CUDA/OpenMPI environment setup with a cleaner module-based approach and adjust SLURM resource allocation parameters.
Changes:
- Complete rewrite of the HiperGator module configuration in toolchain/modules, replacing hardcoded environment variables with module loads and properly separated CPU/GPU configurations
- Updates to SLURM resource requests in hipergator.mako, including GPU-specific CPU and memory settings and a hardcoded MPI launcher path
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| toolchain/modules | Rewrites hipergator (h) module configuration with distinct CPU (gcc/openmpi) and GPU (nvhpc/cuda/openmpi) setups |
| toolchain/templates/hipergator.mako | Removes global cpus-per-task setting, adds GPU-specific resource requests, and hardcodes nvhpc mpirun path |
```diff
 % else:
     (set -x; ${profiler} \
-        mpirun -np ${nodes*tasks_per_node} \
+        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
```
Copilot AI · Jan 15, 2026
The hardcoded absolute path to mpirun creates a tight coupling to a specific NVHPC version (25.9) and installation location. This path is used unconditionally for both GPU and CPU modes, but the CPU configuration in toolchain/modules uses gcc/openmpi which would have a different mpirun path. Consider either: (1) using a conditional path based on gpu_enabled to use the appropriate MPI launcher for each mode, or (2) relying on the PATH environment variable set by the module system (like other cluster templates do) by simply using mpirun.
```diff
-        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
+        mpirun -np ${nodes*tasks_per_node} \
```
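For completeness, option (1) from the comment above could look roughly like the following in the Mako template. This is a sketch only; the inline-expression form mirrors a suggestion that appears later in this review.

```mako
% else:
    (set -x; ${profiler} \
        ## Pick the NVHPC-bundled launcher only for GPU builds; fall back to the
        ## module-provided mpirun on the CPU (gcc/openmpi) stack.
        ${'/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun' if gpu_enabled else 'mpirun'} -np ${nodes*tasks_per_node} \
            --bind-to none \
            "${target.get_install_binpath(case)}")
% endif
```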
```diff
 #SBATCH --output="${name}.out"
 #SBATCH --time=${walltime}
-#SBATCH --cpus-per-task=7
+% if gpu_enabled:
```
Copilot AI · Jan 15, 2026
The cpus-per-task setting is now only applied to GPU jobs (inside the % if gpu_enabled: block), but CPU jobs have no cpus-per-task specification. This asymmetry could lead to suboptimal CPU-only job configurations. Consider whether CPU jobs should also have an explicit cpus-per-task setting, or document why GPU jobs specifically need 3 CPUs per task while CPU jobs use the default.
```diff
 % if gpu_enabled:
+# Note: For GPU jobs, we explicitly request 1 GPU and 3 CPUs per task.
+# CPU-only jobs rely on the cluster's default cpus-per-task setting.
```
```
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=3
#SBATCH --gpu-bind=closest
#SBATCH --mem-per-cpu=50GB
```
Copilot AI · Jan 15, 2026
50GB memory per CPU is extremely high (150GB total for 3 CPUs per task). This could severely limit job scheduling on the cluster. Verify this is the intended memory requirement and not a typo (perhaps 50GB total or 5GB per CPU was intended). Most GPU codes require much less CPU memory unless doing significant host-side preprocessing.
```diff
-#SBATCH --mem-per-cpu=50GB
+#SBATCH --mem-per-cpu=5GB
```
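If the right figure is unclear, usage data from a completed GPU run can settle it. A minimal check with standard Slurm accounting (the job ID below is a placeholder):

```shell
# Compare requested memory against peak resident memory for a finished job
sacct -j 123456 --format=JobID,ReqMem,MaxRSS,Elapsed
```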
```diff
 % else:
     (set -x; ${profiler} \
-        mpirun -np ${nodes*tasks_per_node} \
+        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
             --bind-to none \
             "${target.get_install_binpath(case)}")
 % endif
```
Suggestion: The absolute MPI launcher path contains a duplicated version segment ("25.9" appears twice) which very likely makes the path incorrect and the mpirun binary unavailable at runtime; fix the path to the correct single-version location so the launcher exists on the nodes. [possible bug]
Severity Level: Critical 🚨
- ❌ MPI jobs fail to start on affected nodes.
- ⚠️ Distributed test runs do not execute.
- ⚠️ Affects template-driven MPI launches in CI and local runs.

```diff
 % else:
     (set -x; ${profiler} \
-        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
+        /apps/compilers/nvhpc/25.9/Linux_x86_64/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
             --bind-to none \
             "${target.get_install_binpath(case)}")
 % endif
```
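Before adopting either path, a quick check on a HiperGator node can confirm which layout actually exists. This is a hypothetical session; the paths and output have not been verified:

```shell
# Does the doubled-version path exist?
ls -l /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun
# Or the single-version variant proposed in the suggestion?
ls -l /apps/compilers/nvhpc/25.9/Linux_x86_64/comm_libs/mpi/bin/mpirun
# With the nvhpc/25.9 module loaded, see which launcher the module system puts on PATH
module load nvhpc/25.9 && command -v mpirun
```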
Steps of Reproduction ✅
1. Trigger an MPI run path by rendering toolchain/templates/hipergator.mako with a target where mpi==True. The template enters the else branch shown at lines 50-55 and emits a command containing the absolute mpirun path at line 52 ("/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun").
2. Submit the generated job or run the script on a compute node so the template-expressed command executes. This is the normal execution path for distributed runs using this template (the for-loop at lines 45-60 iterates targets and executes this branch when mpi is enabled).
3. When the node shell attempts to execute the absolute path, the duplicated "25.9" segment makes the path incorrect on nodes where the real NVHPC installation path does not contain that duplicated segment. The shell prints "No such file or directory" and the MPI launch fails immediately.
4. Observe the job failing to start distributed processes; the failure is reproducible by running any MPI-targeted job using this template (mpi==True) because the template emits the incorrect absolute path at toolchain/templates/hipergator.mako:52.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** toolchain/templates/hipergator.mako
**Line:** 50:55
**Comment:**
Possible Bug: The absolute MPI launcher path contains a duplicated version segment ("25.9" appears twice) which very likely makes the path incorrect and the mpirun binary unavailable at runtime; fix the path to the correct single-version location so the launcher exists on the nodes.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@toolchain/modules`:
- Around line 88-93: Update the HiperGator GPU module env vars so they match
other GPU clusters: replace the hardcoded MPI wrapper paths in CC, CXX, and FC
with either the NVHPC compiler commands (CC=nvc, CXX=nvc++, FC=nvfortran) so the
build uses the NVHPC compilers, or—if MPI wrappers are required—use the generic
wrapper commands (mpicc, mpicxx, mpifort) instead of absolute paths; also ensure
NVCOMPILER_COMM_LIBS_HOME remains pointed to the correct NVHPC comm libs
location (symbol NVCOMPILER_COMM_LIBS_HOME) so the MPI libs are found by the
toolchain.
- Around line 88-93: Add the environment variable MFC_CUDA_CC=100 to the h-gpu
stanza so the configuration targets SM100 (CUDA compute capability 10.0); update
the block that defines h-gpu (the lines setting UCX_NET_DEVICES, cuda/12.9.1,
nvhpc/25.9, openmpi/5.0.7 and the CC/CXX/FC/NVCOMPILER_COMM_LIBS_HOME vars) to
include MFC_CUDA_CC=100 alongside those existing environment definitions.
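Taken together, a minimal sketch of the h-gpu stanza with both of the above fixes applied could look like this. It assumes the file keeps its one-setting-per-line format and has not been validated against the toolchain's parser:

```text
h-gpu cuda/12.9.1 nvhpc/25.9 openmpi/5.0.7
h-gpu CC=nvc
h-gpu CXX=nvc++
h-gpu FC=nvfortran
h-gpu MFC_CUDA_CC=100
h-gpu NVCOMPILER_COMM_LIBS_HOME=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/12.9
```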
🧹 Nitpick comments (2)
toolchain/templates/hipergator.mako (1)

51-54: Hardcoded absolute path to mpirun is brittle and creates maintenance burden. The hardcoded path /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun tightly couples this template to a specific NVHPC version. When NVHPC is upgraded, both this file and toolchain/modules must be updated in lockstep. Consider using the system's mpirun from PATH after modules are loaded, or deriving the path from an environment variable set in toolchain/modules.

♻️ Suggested approach using environment variable

In toolchain/modules, add an environment variable for the MPI bin path:

```diff
 h-gpu NVCOMPILER_COMM_LIBS_HOME=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/12.9
+h-gpu MFC_MPIRUN=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun
```

Then in this template:

```diff
-        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
+        $MFC_MPIRUN -np ${nodes*tasks_per_node} \
```

toolchain/modules (1)

90-90: Minor: Trailing whitespace on line 90.

🧹 Remove trailing whitespace

```diff
-h-gpu CC=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpicc 
+h-gpu CC=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpicc
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
toolchain/modules
toolchain/templates/hipergator.mako
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Applied to files:
toolchain/templates/hipergator.mako
toolchain/modules
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight loops with `$:GPU_PARALLEL_FOR(private='[...]', copy='[...]')` macro; add `collapse=n` for safe nested loop merging
Applied to files:
toolchain/templates/hipergator.mako
toolchain/modules
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Do not use OpenACC or OpenMP directives directly; use Fypp macros from `src/common/include/parallel_macros.fpp` instead
Applied to files:
toolchain/templates/hipergator.mako
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Wrap tight GPU loops with !$acc parallel loop gang vector default(present) reduction(...); add collapse(n) when safe; declare loop-local variables with private(...)
Applied to files:
toolchain/templates/hipergator.mako
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Use `private` declaration followed by explicit `public` exports in modules
Applied to files:
toolchain/modules
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Applied to files:
toolchain/modules
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (2)
toolchain/modules (1)
86-87: LGTM on common and CPU module declarations. The module versions (Python 3.12, GCC 14.2, OpenMPI 5.0.7) are reasonable and consistent with the modernization of other cluster configurations in this file.
toolchain/templates/hipergator.mako (1)
11-16: Remove claim about "reduced" CPU allocation; this is a new HiperGator configuration, not a modification. The resource specifications align with HiPerGator B200 partition limits (112 CPUs/node, 2TB RAM/node, 8 GPUs/node): 3 cpus-per-task with 50GB mem-per-cpu (150GB total per GPU task) allocates ~60% of the per-GPU host memory average, leaving reasonable headroom within the 2TB/node constraint.
Likely an incorrect or invalid review comment.
```
h-gpu UCX_NET_DEVICES="mlx5_4:1,mlx5_7:1,mlx5_8:1,mlx5_9:1,mlx5_10:1,mlx5_13:1,mlx5_14:1,mlx5_15:1"
h-gpu cuda/12.9.1 nvhpc/25.9 openmpi/5.0.7
h-gpu CC=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpicc
h-gpu CXX=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpicxx
h-gpu FC=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpifort
h-gpu NVCOMPILER_COMM_LIBS_HOME=/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/12.9
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's look at the toolchain/modules file around the mentioned lines
fd "modules" toolchain/ -type f
```
Repository: MFlowCode/MFC
Length of output: 227

🏁 Script executed:
```shell
# Get the full context around lines 88-93 and see other cluster configurations
cat -n toolchain/modules | head -100
```
Repository: MFlowCode/MFC
Length of output: 3802

🏁 Script executed:
```shell
# Check how CC/CXX/FC are used in the build system
rg "(\$CC|\$CXX|\$FC|CC=|CXX=|FC=)" --type cmake -A 2 | head -50
```
Repository: MFlowCode/MFC
Length of output: 356

🏁 Script executed:
```shell
# Check CMakeLists.txt for how it uses CC/CXX/FC
cat toolchain/modules | grep -A 10 -B 5 "hipergator\|h-gpu"
```
Repository: MFlowCode/MFC
Length of output: 665

🏁 Script executed:
```shell
# Search for HiperGator-specific build configuration or comments
rg -i "hipergator|h-gpu|mpicc|mpifort" . -A 2 -B 2 --max-count 20
```
Repository: MFlowCode/MFC
Length of output: 1340

🏁 Script executed:
```shell
# Check if there are any cmake files that handle compiler selection
fd -e cmake | head -5
```
Repository: MFlowCode/MFC
Length of output: 210

🏁 Script executed:
```shell
# Look at the CMakeLists.txt to understand compiler handling
head -100 CMakeLists.txt
```
Repository: MFlowCode/MFC
Length of output: 4519
HiperGator GPU configuration deviates from all other clusters by explicitly setting MPI wrapper paths instead of base NVHPC compilers.
All other GPU clusters (Bridges2, Ascent, Wombat, Expanse, Phoenix, Delta, Oscar, Nautilus) use CC=nvc CXX=nvc++ FC=nvfortran, while HiperGator sets these to explicit MPI wrapper paths. This is inconsistent and problematic:
- GPU compilation expectation: Learning notes indicate GPU code should compile with nvfortran, not a wrapper
- Build system assumptions: CMakeLists.txt sets the compiler via environment variables and detects the compiler ID; MPI wrapper paths may not be properly identified as NVHPC
- Portability: MPI wrappers bound to a specific installation path are less portable than using wrapper commands in PATH

If MPI wrappers are necessary for HiperGator's build, consider using the wrapper commands (mpicc, mpicxx, mpifort) instead of hardcoded paths, aligning with the pattern used by other clusters; a sketch of that form follows.
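As a point of comparison, the wrapper-command form mentioned above would reduce the stanza to something like the following sketch, which assumes mpicc/mpicxx/mpifort resolve to the intended MPI stack once the listed modules are loaded:

```text
h-gpu CC=mpicc
h-gpu CXX=mpicxx
h-gpu FC=mpifort
```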
🤖 Prompt for AI Agents
In `@toolchain/modules` around lines 88 - 93, Update the HiperGator GPU module env
vars so they match other GPU clusters: replace the hardcoded MPI wrapper paths
in CC, CXX, and FC with either the NVHPC compiler commands (CC=nvc, CXX=nvc++,
FC=nvfortran) so the build uses the NVHPC compilers, or—if MPI wrappers are
required—use the generic wrapper commands (mpicc, mpicxx, mpifort) instead of
absolute paths; also ensure NVCOMPILER_COMM_LIBS_HOME remains pointed to the
correct NVHPC comm libs location (symbol NVCOMPILER_COMM_LIBS_HOME) so the MPI
libs are found by the toolchain.
🧩 Analysis chain
🌐 Web query:
NVIDIA B200 GPU CUDA compute capability
💡 Result:
The NVIDIA B200 (Blackwell SM100) has CUDA compute capability 10.0 (target arches sm100a / sm100f); it requires CUDA toolkit support beginning with CUDA 12.8. [1][2]
🏁 Script executed:
```shell
# Search for MFC_CUDA_CC in the toolchain/modules file
grep -n "MFC_CUDA_CC" toolchain/modules | head -20
```
Repository: MFlowCode/MFC
Length of output: 292
Add MFC_CUDA_CC=100 for B200 GPU compute capability targeting.
All other GPU-enabled clusters specify MFC_CUDA_CC to target specific GPU architectures (e.g., line 41 for Phoenix: MFC_CUDA_CC=70,75,80,89,90, line 53 for Delta: MFC_CUDA_CC=80,86). The B200 GPU has CUDA compute capability 10.0 (SM100 architecture) and requires CUDA 12.8 or later; the h-gpu configuration should include MFC_CUDA_CC=100 for consistent GPU targeting.
🤖 Prompt for AI Agents
In `@toolchain/modules` around lines 88 - 93, Add the environment variable
MFC_CUDA_CC=100 to the h-gpu stanza so the configuration targets SM100 (CUDA
compute capability 10.0); update the block that defines h-gpu (the lines setting
UCX_NET_DEVICES, cuda/12.9.1, nvhpc/25.9, openmpi/5.0.7 and the
CC/CXX/FC/NVCOMPILER_COMM_LIBS_HOME vars) to include MFC_CUDA_CC=100 alongside
those existing environment definitions.
1 issue found across 2 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="toolchain/templates/hipergator.mako">
<violation number="1" location="toolchain/templates/hipergator.mako:52">
P2: Hardcoded NVHPC mpirun is used even in CPU mode, mismatching the loaded OpenMPI stack and risking missing binary or MPI runtime failures for CPU MPI jobs.</violation>
</file>
```diff
 % else:
     (set -x; ${profiler} \
-        mpirun -np ${nodes*tasks_per_node} \
+        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
```
P2: Hardcoded NVHPC mpirun is used even in CPU mode, mismatching the loaded OpenMPI stack and risking missing binary or MPI runtime failures for CPU MPI jobs.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At toolchain/templates/hipergator.mako, line 52:
<comment>Hardcoded NVHPC mpirun is used even in CPU mode, mismatching the loaded OpenMPI stack and risking missing binary or MPI runtime failures for CPU MPI jobs.</comment>
<file context>
```diff
@@ -48,7 +49,7 @@ echo
 % else:
     (set -x; ${profiler} \
-        mpirun -np ${nodes*tasks_per_node} \
+        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
         --bind-to none \
         "${target.get_install_binpath(case)}")
```
</file context>
```diff
-        /apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np ${nodes*tasks_per_node} \
+        ${'/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun' if gpu_enabled else 'mpirun'} -np ${nodes*tasks_per_node} \
```
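For illustration, with hypothetical values nodes=2 and tasks_per_node=4, that inline expression renders to one of the following commands depending on gpu_enabled:

```shell
# gpu_enabled is True (GPU build): NVHPC-bundled launcher
/apps/compilers/nvhpc/25.9/Linux_x86_64/25.9/comm_libs/mpi/bin/mpirun -np 8 ...
# gpu_enabled is False (CPU build): module-provided launcher from PATH
mpirun -np 8 ...
```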
User description
Description
These are changes to the toolchain files (modules and hipergator.mako) that establish the proper environment for running MFC on HiperGator CPUs and GPUs.
Type of change
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration
Ran the entire test suite (-a). Only had issue with test 6F35CD77: 3D -> Bubbles -> bubble_model=2
Variable n°25 (1-indexed) in D/cons.13.00.000050.dat is not within tolerance:
Diagnostics - Maximum absolute error among FAILING variables:
Diagnostics - Maximum relative error among FAILING variables:
Test Configuration:
Ran on two HiperGator partitions: hpg-default and hpg-b200.
Checklist
- Documentation (docs/)
- Example cases in examples/ that demonstrate my new feature performing as expected. They run to completion and demonstrate "interesting physics"
- I ran ./mfc.sh format before committing my code

If your code changes any code source files (anything in src/simulation), to make sure the code is performing as expected on GPU devices, I have:
- Added nvtx ranges so that they can be identified in profiles
- Run ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
- Run ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR

PR Type
Enhancement
Description
Update HiperGator SLURM configuration for GPU and CPU tasks
Migrate to NVHPC 25.9 with CUDA 12.9.1 and OpenMPI 5.0.7
Configure proper MPI paths and UCX network devices for GPU
Adjust CPU allocation and memory settings for optimal performance
Diagram Walkthrough
File Walkthrough
hipergator.mako (toolchain/templates/hipergator.mako): Update SLURM directives and MPI runtime paths
- --cpus-per-task=3 moved from default to GPU-only configuration
- --mem-per-cpu=50GB memory allocation for GPU tasks

modules (toolchain/modules): Upgrade to NVHPC 25.9 and CUDA 12.9.1 stack
CodeAnt-AI Description
Configure HiperGator job scripts and module defaults for NVHPC/CUDA GPU runs
What Changed
Impact
✅ Lower CPU-per-GPU allocation for GPU jobs
✅ More memory reserved per CPU for GPU runs
✅ Consistent NVHPC/CUDA runtime and MPI launcher for GPU workflows