Fix 8 HPC-sensitive bugs: GPU kernels, MPI broadcast, domain decomposition by sbryngelson · Pull Request #1242 · MFlowCode/MFC

sbryngelson · 2026-02-22T21:02:35Z

User description

Summary

Batches 8 bug fixes that touch GPU parallel regions, MPI broadcasts, or compiler-sensitive code paths. These need HPC CI validation across NVHPC, CCE, and AMD compilers.

Not included: #1225 (3D viscous GPU private clauses) and #1181 (MUSCL-THINC right-state + GPU privates) remain as separate PRs due to higher risk — they add new private clause variables to GPU parallel loops, which is the most compiler-sensitive change possible.

GPU kernel fixes

Fix z-boundary backward FD stencil (#1173)

File: src/common/m_finite_differences.fpp
Bug: At the z-domain boundary, the backward FD stencil reads fields(2) (y-velocity) instead of fields(3) (z-velocity), corrupting divergence/gradient calculations for all 3D viscous, hyperelastic, and hypoelastic simulations.
Fix: fields(2)%sf(x, y, z-2) → fields(3)%sf(x, y, z-2)
Impact: Runs inside GPU_PARALLEL_LOOP in m_viscous.fpp, m_hypoelastic.fpp, m_hyperelastic.fpp, and m_derived_variables.fpp. Affects every 3D simulation using these modules.
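
A minimal Python sketch of why the wrong component matters, assuming the standard second-order one-sided backward difference on a uniform grid (the actual coefficients and grid spacing in m_finite_differences.fpp may differ). The names `backward_diff`, `v`, and `w` are illustrative and not taken from MFC.

```python
def backward_diff(f0, f1, f2, dz):
    """Second-order one-sided backward difference: (3*f0 - 4*f1 + f2) / (2*dz)."""
    return (3.0 * f0 - 4.0 * f1 + f2) / (2.0 * dz)

dz = 0.5
z = [k * dz for k in range(5)]   # cell centers along z
v = [7.0 for _ in z]             # stand-in for fields(2), the y-velocity
w = [2.0 * zk for zk in z]       # stand-in for fields(3), the z-velocity; dw/dz = 2

k = len(z) - 1                   # last cell at the z-boundary

# Fixed stencil: all three terms read the z-velocity.
dwdz_fixed = backward_diff(w[k], w[k - 1], w[k - 2], dz)

# Buggy stencil: the third term reads the y-velocity instead.
dwdz_buggy = backward_diff(w[k], w[k - 1], v[k - 2], dz)

print(dwdz_fixed, dwdz_buggy)
```

For this linear profile the fixed stencil recovers the exact slope, while the buggy one mixes in an unrelated y-velocity value at the boundary cell.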

Fix GRCBC subsonic inflow using wrong L index (#1224)

File: src/simulation/m_cbc.fpp
Bug: In the GRCBC characteristic wave amplitude loop do i = 2, momxb, the code writes L(2) = ... instead of L(i) = .... Every iteration overwrites L(2), so L(2) ends up holding the last iteration's value and L(3) through L(momxb) are never assigned.
Fix: L(2) → L(i)
Impact: Inside a GPU_PARALLEL_LOOP (collapse=2 CBC loop with L as private). Affects multi-species GRCBC subsonic inflow boundary conditions.
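
The indexing mistake can be reproduced in a few lines of Python. This is only a sketch: `wave_amplitudes` and `values` are hypothetical, index 0 is padding so that `L[i]` lines up with Fortran's `L(i)`, and the zero fill stands in for entries that would be undefined in a private GPU array.

```python
def wave_amplitudes(momxb, values, hardcoded):
    """Mimic the loop `do i = 2, momxb` that fills the wave amplitudes L."""
    L = [0.0] * (momxb + 1)       # index 0 unused; zeros stand in for unset entries
    for i in range(2, momxb + 1):
        target = 2 if hardcoded else i   # hardcoded=True reproduces L(2) = ...
        L[target] = values[i]
    return L

momxb = 4
values = [None, None, 10.0, 20.0, 30.0]   # hypothetical per-index amplitudes

buggy = wave_amplitudes(momxb, values, hardcoded=True)
fixed = wave_amplitudes(momxb, values, hardcoded=False)

print(buggy)
print(fixed)
```

With the hardcoded index, only slot 2 is written and it keeps the value from the final iteration; with `L(i)`, each slot receives its own amplitude.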

Fix G_K exponential degradation in damage model (#1227)

File: src/common/m_variables_conversion.fpp
Bug: G_K = G_K * max((1-damage), 0) is inside the do i = strxb, strxe stress component loop, applying the damage factor N times (exponential (1-damage)^N) instead of once.
Fix: Move the damage line before the loop in both s_convert_conservative_to_primitive and s_convert_primitive_to_conservative.
Impact: Inside GPU_PARALLEL_LOOP(collapse=3) with G_K as private. Produces exponentially wrong stiffness degradation for multi-component stress tensors.
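
A small Python sketch of the compounding effect, assuming a scalar modulus and an empty loop body in place of the real stress update. The function names are illustrative only.

```python
def degrade_in_loop(G, damage, n_stress):
    """Buggy ordering: the damage factor is applied on every loop iteration."""
    for _ in range(n_stress):
        G = G * max(1.0 - damage, 0.0)
    return G

def degrade_once(G, damage, n_stress):
    """Fixed ordering: the damage factor is applied once, before the loop."""
    G = G * max(1.0 - damage, 0.0)
    for _ in range(n_stress):
        pass  # the stress-component update would use G here
    return G

G0, damage, n_stress = 1.0, 0.5, 3
buggy = degrade_in_loop(G0, damage, n_stress)
fixed = degrade_once(G0, damage, n_stress)
print(buggy, fixed)
```

With three stress components and damage = 0.5, the buggy ordering leaves one eighth of the original modulus where the intended result is one half.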

MPI broadcast fixes

Fix bc_x%ve3 never MPI-broadcast (#1175)

File: src/simulation/m_mpi_proxy.fpp
Bug: A fypp loop broadcasts bc_x%ve2 twice and bc_x%ve3 (z-velocity component) never. All non-root ranks receive uninitialized garbage, which is then pushed to GPU via GPU_UPDATE.
Fix: Change the duplicate 'bc_x%ve2' entry to 'bc_x%ve3' in the fypp broadcast list.
Impact: Every multi-rank 3D simulation using velocity boundary conditions on x-boundaries reads garbage ve3 on non-root ranks.
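
Because the broadcast list is a plain list of names, a duplicate like this can be caught mechanically. The Python sketch below is an assumption about how such a check might look, not something present in MFC; `check_broadcast_list` is hypothetical, and the `bc_x%ve1` entry is assumed from the naming pattern of `bc_x%ve2` and `bc_x%ve3`.

```python
from collections import Counter

def check_broadcast_list(names, expected):
    """Return (duplicates, missing) for a list of variable names to broadcast."""
    counts = Counter(names)
    duplicates = sorted(name for name, c in counts.items() if c > 1)
    missing = sorted(set(expected) - set(names))
    return duplicates, missing

expected = [f"bc_x%ve{i}" for i in (1, 2, 3)]

buggy_list = ["bc_x%ve1", "bc_x%ve2", "bc_x%ve2"]   # ve2 listed twice, ve3 absent
fixed_list = ["bc_x%ve1", "bc_x%ve2", "bc_x%ve3"]

print(check_broadcast_list(buggy_list, expected))
print(check_broadcast_list(fixed_list, expected))
```

A check of this kind flags both symptoms at once: the repeated name and the member that is never broadcast.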

Fix fluid_rho broadcast using MPI_LOGICAL instead of mpi_p (#1176)

File: src/pre_process/m_mpi_proxy.fpp
Bug: MPI_BCAST(fluid_rho(1), num_fluids_max, MPI_LOGICAL, ...) broadcasts a real(wp) array with the wrong MPI datatype. On 64-bit wp, MPI_LOGICAL (typically 4 bytes) only transfers half the bytes, silently corrupting density values on non-root ranks.
Fix: MPI_LOGICAL → mpi_p
Impact: MPI-implementation-dependent. May appear to work with some stacks (if byte layout aligns) and produce garbage with others. Affects pre-process perturbation IC density initialization.
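
One way the "half the bytes" outcome can arise is sketched below in Python with `struct`, assuming the broadcast behaves like a raw copy of count × element-size bytes. Real MPI libraries may behave differently, which is why the impact is implementation-dependent. `bcast_bytes` and the sample values are hypothetical.

```python
import struct

def bcast_bytes(root_values, recv_values, count, elem_size):
    """Copy count*elem_size bytes from the root buffer over the receiver's buffer."""
    src = struct.pack(f"{len(root_values)}d", *root_values)
    dst = bytearray(struct.pack(f"{len(recv_values)}d", *recv_values))
    n = count * elem_size
    dst[:n] = src[:n]
    return list(struct.unpack(f"{len(recv_values)}d", bytes(dst)))

root = [1000.0, 1.2, 0.5, 3.0]      # densities on the root rank (8-byte reals)
stale = [-1.0, -1.0, -1.0, -1.0]    # whatever the non-root rank held before

wrong = bcast_bytes(root, stale, count=4, elem_size=4)   # 4-byte type: 16 of 32 bytes
right = bcast_bytes(root, stale, count=4, elem_size=8)   # 8-byte type: all 32 bytes

print(wrong)
print(right)
```

Under this assumption the first half of the array arrives intact and the second half keeps the receiver's stale contents, so the corruption is silent.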

MPI / compiler-sensitive fixes

Fix loc_violations used uninitialized in MPI_Allreduce (#1186)

File: src/pre_process/m_data_output.fpp
Bug: loc_violations is passed to s_mpi_allreduce_sum without initialization. Whether this matters depends on the compiler: GCC may zero-initialize stack variables in debug mode, NVHPC and CCE may not.
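Fix: Keep loc_violations as a local variable and set it to 0 with an executable assignment on entry. A declaration-time initializer would give it the implicit SAVE attribute, so it would not reset on later calls.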
Impact: Pre-process grid validation. Compiler-dependent behavior makes HPC testing essential.
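
The effect on the reduction can be mimicked in Python, with each rank's leftover memory modeled as an arbitrary integer. This is a sketch only: `global_violations` is hypothetical and the sum over ranks stands in for s_mpi_allreduce_sum.

```python
def global_violations(rank_found, stale_memory, initialize):
    """Sum per-rank violation counts, optionally zeroing each local counter first."""
    total = 0
    for found, stale in zip(rank_found, stale_memory):
        # An uninitialized local holds whatever happened to be in memory.
        loc_violations = 0 if initialize else stale
        if found > 0:                 # only violating ranks assign the counter
            loc_violations = found
        total += loc_violations
    return total

rank_found = [2, 0, 0]                # violations actually detected on each rank
stale_memory = [99, -7, 12345]        # arbitrary leftovers on each rank

print(global_violations(rank_found, stale_memory, initialize=False))
print(global_violations(rank_found, stale_memory, initialize=True))
```

Ranks that find no violations never assign the counter, so without the explicit zero their leftovers flow straight into the global sum.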

Fix domain decomposition overwriting muscl_order (#1229)

File: src/common/m_mpi_common.fpp
Bug: An else branch in s_mpi_decompose_computational_domain unconditionally sets recon_order = weno_order for all non-IGR cases, even when recon_type == MUSCL_TYPE where muscl_order was already correctly assigned.
Fix: Remove the else branch (2 lines deleted).
Impact: Wrong reconstruction order causes incorrect ghost-cell counts in domain decomposition for MUSCL cases. Needs multi-rank MPI validation.
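
The control flow described above can be sketched in Python. The constants, the `igr_order` parameter, and the IGR branch body are assumptions made for illustration; only the else branch and its removal come from the description.

```python
WENO_TYPE, MUSCL_TYPE = 1, 2          # placeholder values, not MFC's constants

def recon_order(recon_type, weno_order, muscl_order, igr, igr_order, keep_else_branch):
    """Pick the reconstruction order used to size ghost-cell layers."""
    order = weno_order if recon_type == WENO_TYPE else muscl_order
    if igr:
        order = igr_order             # assumed behavior of the IGR branch
    elif keep_else_branch:
        order = weno_order            # the removed branch: clobbers muscl_order
    return order

buggy = recon_order(MUSCL_TYPE, weno_order=5, muscl_order=2,
                    igr=False, igr_order=3, keep_else_branch=True)
fixed = recon_order(MUSCL_TYPE, weno_order=5, muscl_order=2,
                    igr=False, igr_order=3, keep_else_branch=False)
print(buggy, fixed)
```

For WENO cases the two versions agree, which is consistent with the bug surfacing only in MUSCL runs.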

Fix NaN check reading wrong index in MPI unpack (#1231)

File: src/common/m_mpi_common.fpp
Bug: ieee_is_nan(q_comm(i)%sf(j, k, l)) should be q_comm(i)%sf(j + unpack_offset, k, l) — the check reads a stale element instead of the just-unpacked one.
Fix: Add unpack_offset to the appropriate index in all 3 directions (x, y, z).
Impact: Only compiled under #if defined(__INTEL_COMPILER). Dead code on NVHPC/CCE/GCC, but on Intel builds the diagnostic would miss real NaNs in ghost cells.
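
A one-dimensional Python sketch of the stale-index check, assuming the unpack writes to `j + unpack_offset` as described. `unpack_and_check` and the buffer contents are hypothetical.

```python
import math

def unpack_and_check(field, recv, unpack_offset, use_offset):
    """Unpack recv into field at j + unpack_offset; report whether the check sees a NaN."""
    found_nan = False
    for j, value in enumerate(recv):
        field[j + unpack_offset] = value
        idx = j + unpack_offset if use_offset else j   # use_offset=False is the old check
        if math.isnan(field[idx]):
            found_nan = True
    return found_nan

recv = [1.0, float("nan")]            # received halo data containing a NaN
offset = 4

buggy = unpack_and_check([0.0] * 6, recv, offset, use_offset=False)
fixed = unpack_and_check([0.0] * 6, recv, offset, use_offset=True)
print(buggy, fixed)
```

The old check inspects interior cells that were not touched by the unpack, so a NaN in the received ghost cells goes unnoticed.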

Supersedes

Closes #1173, closes #1175, closes #1176, closes #1186, closes #1224, closes #1227, closes #1229, closes #1231

Test plan

All GitHub CI checks pass
HPC (NVHPC): 3D viscous case, GRCBC multi-species case, damage model case
HPC (CCE/AMD): Multi-rank 3D with velocity BCs (validates bc_x%ve3 broadcast)
HPC (any): MUSCL multi-rank case (validates domain decomposition fix)
Verify no regressions in existing test suite

🤖 Generated with Claude Code

CodeAnt-AI Description

Fix multiple HPC-sensitive bugs that caused corrupted 3D results and incorrect MPI communication

What Changed

Corrected the z-boundary finite-difference term so the z-velocity is used, preventing corrupted divergence/gradient in 3D viscous and elastic simulations
Restored correct MPI broadcasts and reductions so quantities are consistent across ranks: x-boundary third velocity is broadcast, fluid densities are broadcast with the correct numeric type, and local violation counters are zeroed before global reduction
Fixed out-of-bounds/incorrect-index NaN checks in MPI receives so received data is validated correctly
Adjusted damage-modifier order so damage properly reduces shear/elastic moduli when damage is active

Impact

✅ Fewer incorrect 3D divergence/gradient results
✅ Consistent boundary velocity and density values across MPI ranks
✅ Fewer spurious downsample warnings and reduction-related errors

The z-direction upper-boundary backward difference at iz_s%end uses fields(2) (y-component) instead of fields(3) (z-component) in the third term, corrupting the divergence in all 3D simulations using this finite difference routine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The bc_x velocity-end broadcast list has bc_x%ve2 duplicated where bc_x%ve3 should be. bc_y and bc_z rows are correct. Non-root ranks get uninitialized bc_x%ve3 in multi-rank 3D runs with x-boundary velocity BCs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fluid_rho is a real(wp) array but is broadcast with MPI_LOGICAL type, silently corrupting reference densities via bit reinterpretation on non-root ranks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

loc_violations is never set to 0 before the conditional that may or may not assign it. Non-violating ranks sum garbage in the reduction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Initializing a local variable in its declaration gives it the SAVE attribute in Fortran, meaning it would not reset to zero on subsequent calls. Move the initialization to an executable assignment so the variable is properly zeroed each time the subroutine is entered. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

In the GRCBC subsonic inflow loop (do i = 2, momxb), L(2) was hardcoded instead of L(i), causing only the second wave amplitude to be updated rather than each wave amplitude in the loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The damage factor was applied inside the stress component loop, causing G_K (and G) to be multiplied by the damage factor on every iteration. With N stress components, the effective shear modulus was scaled by (1 - damage)^N instead of (1 - damage). Move the damage application before the loop so it is applied exactly once per cell. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The IGR conditional block unconditionally reset recon_order to weno_order in its else branch, overwriting the muscl_order that was correctly set by the recon_type check above. Remove the else branch so the original recon_order is preserved when IGR is inactive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The NaN diagnostic check used q_comm(i)%sf(j, k, l) but the value was unpacked into q_comm(i)%sf(j + unpack_offset, k, l). This meant the check was reading a stale or unrelated cell instead of the just-received value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-22T21:02:45Z

Warning

Rate limit exceeded

@sbryngelson has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 14 seconds before requesting another review.

cubic-dev-ai

No issues found across 7 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

codeant-ai · 2026-02-22T21:05:09Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

Potential OOB (L array)
The new GRCBC inflow loop assigns to `L(i)` for i=2..momxb and later accesses L at indices like `momxb+1`, `momxb+2`, and `advxe`. `L` is declared conditionally as `dimension(sys_size)` or `dimension(20)` (for AMD). Verify that in all build/config combinations `L` has sufficient length (>= max(advxe, momxb+2)) to avoid out-of-bounds writes/reads when the GRCBC branch is enabled. Also ensure GPU/private usage of `L` matches the declared size (private arrays in GPU regions must be properly sized).

MPI count mismatch
The broadcast of `fluid_rho` uses `num_fluids_max` as the count. If the actual number of fluids is `num_fluids` (broadcast earlier), broadcasting the full max length may send or expect uninitialised padding. Verify intended behaviour and consider using `num_fluids` to limit the count to active entries.

Boundary stencil safety
The z-boundary backward finite-difference stencil was corrected to reference `fields(3)` (z-velocity). Verify that the index `z-2` is always valid for the code path that runs this branch (no out-of-bounds reads). Confirm the boundary indexes and halo/padded ranges used by the GPU kernel guarantee `z-2` is in-bounds in all configurations (including small local subdomains).

MPI broadcast list consistency
The MPI broadcast list was corrected to include distinct `bc_x%vb`/`bc_x%ve` members (previously there was a duplicate). Confirm that the newly added members exist on the derived types, that their Fortran types match the MPI datatype `mpi_p` used in the call, and that the ordering/number of MPI_BCAST calls matches the receiving side expectations. A mismatch in types/counts or a missing member can lead to silent data corruption across ranks.

MPI datatype correctness
The call uses `mpi_p` as the MPI datatype token for `fluid_rho`. Confirm `mpi_p` matches the Fortran kind `real(wp)` (e.g. MPI_DOUBLE_PRECISION or an appropriate alias). A mismatch between `mpi_p` and the variable kind can lead to incorrect data transfer across MPI implementations or compilers.

Copilot

Pull request overview

This PR batches several correctness fixes in GPU-executed kernels and MPI communication paths (broadcasts + halo exchange diagnostics), plus a domain-decomposition bug affecting reconstruction order—aimed at eliminating silent corruption and compiler/MPI-stack-dependent behavior in HPC runs.

Changes:

Fix multiple GPU-path physics/kernel correctness issues (FD stencil component, GRCBC wave amplitudes indexing, damage-model shear modulus degradation).
Fix MPI communication correctness (missing broadcast field, wrong MPI datatype, Intel-only NaN check indexing) and a domain decomposition reconstruction-order overwrite.
Fix compiler-sensitive undefined behavior in pre_process by initializing a reduction input.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/common/m_finite_differences.fpp` | Corrects z-boundary backward FD stencil to use `fields(3)` (z-velocity) consistently. |
| `src/simulation/m_cbc.fpp` | Fixes GRCBC subsonic inflow loop to write `L(i)` instead of overwriting `L(2)`. |
| `src/common/m_variables_conversion.fpp` | Applies continuous damage factor to `G_K`/`G` once (outside the stress-component loop) to avoid exponential compounding. |
| `src/simulation/m_mpi_proxy.fpp` | Fixes user-input broadcast list to include `bc_x%ve3` (was duplicating `bc_x%ve2`). |
| `src/pre_process/m_mpi_proxy.fpp` | Broadcasts `fluid_rho` with the correct MPI real datatype (`mpi_p`) instead of `MPI_LOGICAL`. |
| `src/pre_process/m_data_output.fpp` | Initializes `loc_violations` before MPI reduction to avoid undefined values on some compilers. |
| `src/common/m_mpi_common.fpp` | Fixes Intel-only NaN check to read the just-unpacked element and prevents non-IGR runs from clobbering MUSCL recon order. |

github-actions · 2026-02-22T23:26:23Z

Claude Code Review

No issues found. Checked for bugs and CLAUDE.md compliance.

sbryngelson and others added 9 commits February 22, 2026 16:01

Fix fluid_rho broadcast using MPI_LOGICAL instead of mpi_p

d9532d3

Fix loc_violations used uninitialized in MPI_Allreduce

45e9c1b

codeant-ai bot added the size:S label (this PR changes 10-29 lines, ignoring generated files) on Feb 22, 2026

Merge branch 'master' into fix/hpc-bugfixes-batch

d2348e7

sbryngelson wants to merge 10 commits into MFlowCode:master from sbryngelson:fix/hpc-bugfixes-batch
