Conversation

@boomanaiden154
Contributor

This will mark the CI as green if the premerge advisor is able to
explain all of the failures. We have seen many false negatives (failures
that should have been explained but were not), but no false positives
(failures explained that should not have been) so far.

Created using spr 1.3.7
@llvmbot llvmbot added the infrastructure Bugs about LLVM infrastructure label Dec 16, 2025
@llvmbot
Member

llvmbot commented Dec 16, 2025

@llvm/pr-subscribers-infrastructure

Author: Aiden Grossman (boomanaiden154)

Changes

This will mark the CI as green if the premerge advisor is able to
explain all of the failures. We have seen many false negatives (failures
that should have been explained but were not), but no false positives
(failures explained that should not have been) so far.


Full diff: https://github.com/llvm/llvm-project/pull/172394.diff

3 Files Affected:

  • (modified) .ci/generate_test_report_lib.py (+27-3)
  • (modified) .ci/generate_test_report_lib_test.py (+52-24)
  • (modified) .ci/premerge_advisor_explain.py (+13-14)
diff --git a/.ci/generate_test_report_lib.py b/.ci/generate_test_report_lib.py
index 5edde254eb73d..734ce3a4f4316 100644
--- a/.ci/generate_test_report_lib.py
+++ b/.ci/generate_test_report_lib.py
@@ -158,6 +158,16 @@ def get_failures(junit_objects) -> dict[str, list[tuple[str, str]]]:
     return failures
 
 
+def are_all_failures_explained(
+    failures: list[tuple[str, str]], failure_explanations: dict[str, FailureExplanation]
+) -> bool:
+    for failure in failures:
+        failed_action, _ = failure
+        if failed_action not in failure_explanations:
+            return False
+    return True
+
+
 # Set size_limit to limit the byte size of the report. The default is 1MB as this
 # is the most that can be put into an annotation. If the generated report exceeds
 # this limit and failures are listed, it will be generated again without failures
@@ -172,7 +182,7 @@ def generate_report(
     size_limit=1024 * 1024,
     list_failures=True,
     failure_explanations_list: list[FailureExplanation] = [],
-):
+) -> tuple[str, bool]:
     failures = get_failures(junit_objects)
     tests_run = 0
     tests_skipped = 0
@@ -183,6 +193,12 @@ def generate_report(
         if not failure_explanation["explained"]:
             continue
         failure_explanations[failure_explanation["name"]] = failure_explanation
+    all_failures_explained = True
+    if failures:
+        for _, failures_list in failures.items():
+            all_failures_explained &= are_all_failures_explained(
+                failures_list, failure_explanations
+            )
 
     for results in junit_objects:
         for testsuite in results:
@@ -202,7 +218,11 @@ def generate_report(
             )
         else:
             ninja_failures = find_failure_in_ninja_logs(ninja_logs)
+            all_failures_explained &= are_all_failures_explained(
+                ninja_failures, failure_explanations
+            )
             if not ninja_failures:
+                all_failures_explained = False
                 report.extend(
                     [
                         "The build failed before running any tests. Detailed "
@@ -229,7 +249,7 @@ def generate_report(
                         UNRELATED_FAILURES_STR,
                     ]
                 )
-        return "\n".join(report)
+        return ("\n".join(report), all_failures_explained)
 
     tests_passed = tests_run - tests_skipped - tests_failed
 
@@ -263,7 +283,11 @@ def plural(num_tests):
         # No tests failed but the build was in a failed state. Bring this to the user's
         # attention.
         ninja_failures = find_failure_in_ninja_logs(ninja_logs)
+        all_failures_explained &= are_all_failures_explained(
+            ninja_failures, failure_explanations
+        )
         if not ninja_failures:
+            all_failures_explained = False
             report.extend(
                 [
                     "",
@@ -302,7 +326,7 @@ def plural(num_tests):
             list_failures=False,
         )
 
-    return report
+    return (report, all_failures_explained)
 
 
 def load_info_from_files(build_log_files):
diff --git a/.ci/generate_test_report_lib_test.py b/.ci/generate_test_report_lib_test.py
index 06279d672f3c3..bd72a7b2314d5 100644
--- a/.ci/generate_test_report_lib_test.py
+++ b/.ci/generate_test_report_lib_test.py
@@ -191,19 +191,23 @@ def test_ninja_log_mismatched_failed(self):
     def test_title_only(self):
         self.assertEqual(
             generate_test_report_lib.generate_report("Foo", 0, [], []),
-            dedent(
-                """\
+            (
+                dedent(
+                    """\
                 # Foo
 
                 :white_check_mark: The build succeeded and no tests ran. This is expected in some build configurations."""
+                ),
+                True,
             ),
         )
 
     def test_title_only_failure(self):
         self.assertEqual(
             generate_test_report_lib.generate_report("Foo", 1, [], []),
-            dedent(
-                """\
+            (
+                dedent(
+                    """\
             # Foo
 
             The build failed before running any tests. Detailed information about the build failure could not be automatically obtained.
@@ -211,6 +215,8 @@ def test_title_only_failure(self):
             Download the build's log file to see the details.
 
             If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
+                ),
+                False,
             ),
         )
 
@@ -233,8 +239,9 @@ def test_title_only_failure_ninja_log(self):
                     ]
                 ],
             ),
-            dedent(
-                """\
+            (
+                dedent(
+                    """\
             # Foo
 
             The build failed before running any tests. Click on a failure below to see the details.
@@ -250,6 +257,8 @@ def test_title_only_failure_ninja_log(self):
             </details>
             
             If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
+                ),
+                False,
             ),
         )
 
@@ -272,8 +281,9 @@ def test_no_tests_in_testsuite(self):
                 ],
                 [],
             ),
-            dedent(
-                """\
+            (
+                dedent(
+                    """\
                 # Foo
 
                 The build failed before running any tests. Detailed information about the build failure could not be automatically obtained.
@@ -281,6 +291,8 @@ def test_no_tests_in_testsuite(self):
                 Download the build's log file to see the details.
 
                 If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
+                ),
+                False,
             ),
         )
 
@@ -312,7 +324,8 @@ def test_no_failures(self):
               * 1 test passed
               
               :white_check_mark: The build succeeded and all tests passed."""
-                )
+                ),
+                True,
             ),
         )
 
@@ -348,7 +361,8 @@ def test_no_failures_build_failed(self):
               Download the build's log file to see the details.
               
               If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -403,7 +417,8 @@ def test_no_failures_build_failed_ninja_log(self):
                     </details>
 
                     If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -496,7 +511,8 @@ def test_no_failures_multiple_build_failed_ninja_log(self):
                     </details>
 
                     If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -558,7 +574,8 @@ def test_report_single_file_single_testsuite(self):
           </details>
           
           If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -625,7 +642,7 @@ def test_report_single_file_multiple_testsuites(self):
                 ],
                 [],
             ),
-            self.MULTI_SUITE_OUTPUT,
+            (self.MULTI_SUITE_OUTPUT, False),
         )
 
     def test_report_multiple_files_multiple_testsuites(self):
@@ -667,7 +684,7 @@ def test_report_multiple_files_multiple_testsuites(self):
                 ],
                 [],
             ),
-            self.MULTI_SUITE_OUTPUT,
+            (self.MULTI_SUITE_OUTPUT, False),
         )
 
     def test_report_dont_list_failures(self):
@@ -703,7 +720,8 @@ def test_report_dont_list_failures(self):
           Failed tests and their output was too large to report. Download the build's log file to see the details.
           
           If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -740,7 +758,8 @@ def test_report_dont_list_failures_link_to_log(self):
           Failed tests and their output was too large to report. Download the build's log file to see the details.
           
           If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -780,7 +799,8 @@ def test_report_size_limit(self):
           Failed tests and their output was too large to report. Download the build's log file to see the details.
           
           If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -810,8 +830,9 @@ def test_report_ninja_explanation(self):
                     }
                 ],
             ),
-            dedent(
-                """\
+            (
+                dedent(
+                    """\
             # Foo
 
             The build failed before running any tests. Click on a failure below to see the details.
@@ -828,6 +849,8 @@ def test_report_ninja_explanation(self):
             </details>
             
             If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
+                ),
+                True,
             ),
         )
 
@@ -881,7 +904,8 @@ def test_report_test_failure_explanation(self):
           </details>
 
           If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                True,
             ),
         )
 
@@ -934,7 +958,8 @@ def test_report_test_failure_have_explanation_explained_false(self):
           </details>
 
           If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
-                )
+                ),
+                False,
             ),
         )
 
@@ -972,8 +997,9 @@ def test_generate_report_end_to_end(self):
                 generate_test_report_lib.generate_report_from_files(
                     "Foo", 1, [junit_xml_file, ninja_log_file]
                 ),
-                dedent(
-                    """\
+                (
+                    dedent(
+                        """\
                     # Foo
 
                     * 1 test passed
@@ -991,5 +1017,7 @@ def test_generate_report_end_to_end(self):
                     </details>
 
                     If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the `infrastructure` label."""
+                    ),
+                    False,
                 ),
             )
diff --git a/.ci/premerge_advisor_explain.py b/.ci/premerge_advisor_explain.py
index 00df1b94c4578..3da3d579de221 100644
--- a/.ci/premerge_advisor_explain.py
+++ b/.ci/premerge_advisor_explain.py
@@ -54,7 +54,7 @@ def main(
     github_token: str,
     pr_number: int,
     return_code: int,
-):
+) -> bool:
     """The main entrypoint for the script.
 
     This function parses failures from files, requests information from the
@@ -113,19 +113,14 @@ def main(
             advisor_explanations = advisor_response.json()
         else:
             print(advisor_response.reason)
-    comments.append(
-        get_comment(
-            github_token,
-            pr_number,
-            generate_test_report_lib.generate_report(
-                generate_test_report_lib.compute_platform_title(),
-                return_code,
-                junit_objects,
-                ninja_logs,
-                failure_explanations_list=advisor_explanations,
-            ),
-        )
+    report, failures_explained = generate_test_report_lib.generate_report(
+        generate_test_report_lib.compute_platform_title(),
+        return_code,
+        junit_objects,
+        ninja_logs,
+        failure_explanations_list=advisor_explanations,
     )
+    comments.append(get_comment(github_token, pr_number, report))
     if return_code == 0 and "id" not in comments[0]:
         # If the job succeeds and there is not an existing comment, we
         # should not write one to reduce noise.
@@ -134,6 +129,7 @@ def main(
     with open(comments_file_name, "w") as comment_file_handle:
         json.dump(comments, comment_file_handle)
     print(f"Wrote comments to {comments_file_name}")
+    return failures_explained
 
 
 if __name__ == "__main__":
@@ -152,7 +148,7 @@ def main(
     if platform.machine() == "arm64" or platform.machine() == "aarch64":
         sys.exit(0)
 
-    main(
+    failures_explained = main(
         args.commit_sha,
         args.build_log_files,
         args.github_token,
@@ -160,4 +156,7 @@ def main(
         args.return_code,
     )
 
+    if failures_explained:
+        sys.exit(0)
+
     sys.exit(args.return_code)

Created using spr 1.3.7
def are_all_failures_explained(
    failures: list[tuple[str, str]], failure_explanations: dict[str, FailureExplanation]
) -> bool:
    for failure in failures:

Collaborator

You can do the unpacking here on the same line:

for failed_action, _ in failures:
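
A minimal sketch of the helper with that unpacking folded into the loop header (behavior unchanged from the version in the diff above; FailureExplanation is the TypedDict already defined in generate_test_report_lib.py):

def are_all_failures_explained(
    failures: list[tuple[str, str]],
    failure_explanations: dict[str, FailureExplanation],
) -> bool:
    # Unpack the (failed_action, output) tuple directly in the for statement.
    for failed_action, _ in failures:
        if failed_action not in failure_explanations:
            return False
    return True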

Comment on lines +286 to +288
        all_failures_explained &= are_all_failures_explained(
            ninja_failures, failure_explanations
        )

Collaborator

This could go in the else block of if not ninja_failures. Then each arm of the if has one line setting all_failures_explained.
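
A rough sketch of the shape this suggestion has in mind, adapted from the second ninja-log hunk in the diff; the report text passed to report.extend and the surrounding names (report, ninja_logs, failure_explanations) are assumed from generate_report and elided here:

ninja_failures = find_failure_in_ninja_logs(ninja_logs)
if not ninja_failures:
    # Nothing parseable to explain, so the run cannot be marked green.
    all_failures_explained = False
    report.extend(
        [
            # ... existing report lines, elided here ...
        ]
    )
else:
    # The other arm now also sets all_failures_explained exactly once.
    all_failures_explained &= are_all_failures_explained(
        ninja_failures, failure_explanations
    )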

        failed_action, _ = failure
        if failed_action not in failure_explanations:
            return False
    return True

Collaborator

I see that the type of an explanation is:

class FailureExplanation(TypedDict):
    name: str
    explained: bool
    reason: Optional[str]

So I'd have expected to see checks that:

  • the failure is a key in failure_explanations (which you've done)
  • the key explained is set to true
  • the reason key is set to some non-empty string (though I'm not sure what explained being True with an empty reason means; maybe this is fine)

If I haven't misinterpreted the type here, maybe you don't ever expect to be given a FailureExplanation where explained is false?

If so, I think this is a good opportunity to enforce that with an extra check that explained is true.
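
If the author takes this up, a hedged sketch of a stricter helper might look like the following; the reason handling is an assumption on my part, and note that generate_report in the diff already skips entries whose explained field is false before building failure_explanations, so these checks would be defensive:

def are_all_failures_explained(
    failures: list[tuple[str, str]],
    failure_explanations: dict[str, FailureExplanation],
) -> bool:
    # Require a matching explanation that is marked explained and has a reason.
    for failed_action, _ in failures:
        explanation = failure_explanations.get(failed_action)
        if explanation is None or not explanation["explained"]:
            return False
        if not explanation.get("reason"):
            # Treat an empty or missing reason as unexplained (assumption).
            return False
    return True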
