From 7641f2fd3a9355c55842223b6d82e0dc73f020fe Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 22 Jan 2026 15:20:47 -0500
Subject: [PATCH 1/7] Draft section on AI contributions

---
 docs/source/developers/overview.rst | 35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index 7e38dcb8ebc..21d987221a6 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -146,6 +146,41 @@ will merge the pull request. This is done with a
 description, a link back to the pull request, and attribution to the contributor
 and any co-authors.
 
+.. _ai-generated-code:
+
+AI-generated code
++++++++++++++++++
+
+We recognise that AI coding assistants are now a regular part of many
+developers' workflows and can improve productivity. Thoughtful use of these
+tools can be beneficial, but AI-generated PRs can sometimes lead to
+undesirable additional maintainer burden. Human-generated mistakes tend to
+be easier to spot and reason about, and code review often feels like a
+collaborative learning experience that benefits both submitter and
+reviewer. When a PR appears to have been generated without much engagement
+from the submitter, it can feel like work that the maintainer might as well
+have done themselves.
+
+We are not opposed to the use of AI tools in generating PRs, but recommend
+the following:
+
+* Only take on a PR if you are able to debug and own the changes yourself
+* Make sure that the PR title and body match the style and length of others
+  in this repo
+* Follow coding conventions used in the rest of the codebase
+* Be upfront about AI usage and summarise what was AI-generated
+* If there are parts you don't fully understand, add inline comments
+  explaining what steps you took to verify correctness, and reference any
+  sources that guided your changes (e.g. "took a similar approach to #123456")
+
+PR authors are also responsible for disclosing any copyrighted materials in
+submitted contributions. See the `ASF's guidance on AI-generated code
+`_ for further
+information on licensing considerations.
+
+PRs that appear to be fully generated by AI with little to no engagement
+from the author may be closed without further review.
+
 .. Section on Experimental repositories:
 
 .. include:: experimental_repos.rst

From e10fb807a21f709544bdc7453bedb2186ca520ab Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 22 Jan 2026 15:22:29 -0500
Subject: [PATCH 2/7] Add suggestions from GW

---
 docs/source/developers/overview.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index 21d987221a6..53f5989ff4c 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -165,6 +165,8 @@ We are not opposed to the use of AI tools in generating PRs, but recommend
 the following:
 
 * Only take on a PR if you are able to debug and own the changes yourself
+* Review all lines of generated code before creating the PR to understand
+  every detail, just as if you had written it yourself
 * Make sure that the PR title and body match the style and length of others
   in this repo
 * Follow coding conventions used in the rest of the codebase
@@ -172,6 +174,10 @@ the following:
 * If there are parts you don't fully understand, add inline comments
   explaining what steps you took to verify correctness, and reference any
   sources that guided your changes (e.g. "took a similar approach to #123456")
+* AI tools are notorious for generating overly verbose comments, unnecessary
+  test cases, and fixing test failures using incorrect approaches - make sure
+  to check for and fix these issues
+* Break down large PRs into smaller ones to make review easier
 
 PR authors are also responsible for disclosing any copyrighted materials in
 submitted contributions. See the `ASF's guidance on AI-generated code

From 813ef6c5887de4cf2fc18667fca9a6c6266058dd Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 22 Jan 2026 15:24:01 -0500
Subject: [PATCH 3/7] Add RTC ideas

---
 docs/source/developers/overview.rst | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index 53f5989ff4c..8aa6b534c91 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -154,12 +154,15 @@ AI-generated code
 We recognise that AI coding assistants are now a regular part of many
 developers' workflows and can improve productivity. Thoughtful use of these
 tools can be beneficial, but AI-generated PRs can sometimes lead to
-undesirable additional maintainer burden. Human-generated mistakes tend to
-be easier to spot and reason about, and code review often feels like a
-collaborative learning experience that benefits both submitter and
-reviewer. When a PR appears to have been generated without much engagement
-from the submitter, it can feel like work that the maintainer might as well
-have done themselves.
+undesirable additional maintainer burden. PRs that appear to be fully
+generated by AI with little to no engagement from the author may be closed
+without further review.
+
+Human-generated mistakes tend to be easier to spot and reason about, and
+code review often feels like a collaborative learning experience that
+benefits both submitter and reviewer. When a PR appears to have been
+generated without much engagement from the submitter, it can feel like work
+that the maintainer might as well have done themselves.
 
 We are not opposed to the use of AI tools in generating PRs, but recommend
 the following:
@@ -184,9 +187,6 @@ submitted contributions. See the `ASF's guidance on AI-generated code
 `_ for further
 information on licensing considerations.
 
-PRs that appear to be fully generated by AI with little to no engagement
-from the author may be closed without further review.
-
 .. Section on Experimental repositories:
 
 .. include:: experimental_repos.rst

From 3869b14d48eb17a616bf71f0cf444d3692fe5caa Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Thu, 22 Jan 2026 15:34:04 -0500
Subject: [PATCH 4/7] Make guidance more concise and add reference to new contributors guide

---
 docs/source/developers/guide/index.rst |  4 +++-
 docs/source/developers/overview.rst    | 24 ++++++++++--------------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/docs/source/developers/guide/index.rst b/docs/source/developers/guide/index.rst
index 0ed27a0ddc5..c8d3103ca78 100644
--- a/docs/source/developers/guide/index.rst
+++ b/docs/source/developers/guide/index.rst
@@ -141,7 +141,9 @@ of adding a basic feature.
 
 #. **Push the branch on your fork and create a Pull Request**
 
-   See detailed instructions on :ref:`create_pr`
+   See detailed instructions on :ref:`create_pr`. If you have used AI tools
+   to help generate your contribution, please also read our guidance on
+   :ref:`ai-generated-code`.
 
 
 If you are ready you can start with building Arrow or choose to follow
diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index 8aa6b534c91..ec948b18825 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -161,25 +161,21 @@ without further review.
 Human-generated mistakes tend to be easier to spot and reason about, and
 code review often feels like a collaborative learning experience that
 benefits both submitter and reviewer. When a PR appears to have been
-generated without much engagement from the submitter, it can feel like work
-that the maintainer might as well have done themselves.
+generated without much engagement from the submitter, reviewers with access
+to AI tools could more efficiently generate the code directly.
 
 We are not opposed to the use of AI tools in generating PRs, but recommend
 the following:
 
-* Only take on a PR if you are able to debug and own the changes yourself
-* Review all lines of generated code before creating the PR to understand
-  every detail, just as if you had written it yourself
-* Make sure that the PR title and body match the style and length of others
-  in this repo
-* Follow coding conventions used in the rest of the codebase
+* Only take on a PR if you are able to debug and own the changes yourself -
+  review all generated code to understand every detail
+* Match the style and conventions used in the rest of the codebase, including
+  PR titles and descriptions
 * Be upfront about AI usage and summarise what was AI-generated
-* If there are parts you don't fully understand, add inline comments
-  explaining what steps you took to verify correctness, and reference any
-  sources that guided your changes (e.g. "took a similar approach to #123456")
-* AI tools are notorious for generating overly verbose comments, unnecessary
-  test cases, and fixing test failures using incorrect approaches - make sure
-  to check for and fix these issues
+* If there are parts you don't fully understand, leave comments on your own PR
+  explaining what steps you took to verify correctness
+* Watch for AI's tendency to generate overly verbose comments, unnecessary
+  test cases, and incorrect fixes
 * Break down large PRs into smaller ones to make review easier
 
 PR authors are also responsible for disclosing any copyrighted materials in

From 8978acf016513e5ec804b5da7551d6ae3fbc1a5e Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 23 Jan 2026 07:52:28 -0500
Subject: [PATCH 5/7] Update docs/source/developers/overview.rst

Co-authored-by: Andrew Lamb
---
 docs/source/developers/overview.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index ec948b18825..b057c70343e 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -159,7 +159,7 @@ generated by AI with little to no engagement from the author may be closed
 without further review.
 
 Human-generated mistakes tend to be easier to spot and reason about, and
-code review often feels like a collaborative learning experience that
+code review is intended to be a collaborative learning experience that
 benefits both submitter and reviewer. When a PR appears to have been
 generated without much engagement from the submitter, reviewers with access
 to AI tools could more efficiently generate the code directly.

From ad0bc89174a91451dc2ff3c788e6314320bf856e Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 23 Jan 2026 07:53:11 -0500
Subject: [PATCH 6/7] Update docs/source/developers/overview.rst

---
 docs/source/developers/overview.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index b057c70343e..3140618f207 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -167,7 +167,7 @@ to AI tools could more efficiently generate the code directly.
 We are not opposed to the use of AI tools in generating PRs, but recommend
 the following:
 
-* Only take on a PR if you are able to debug and own the changes yourself -
+* Only submit a PR if you are able to debug and own the changes yourself -
   review all generated code to understand every detail
 * Match the style and conventions used in the rest of the codebase, including
   PR titles and descriptions

From 55a02bc5a1f7699ce5a0da4ebb12da913d34e8f0 Mon Sep 17 00:00:00 2001
From: Nic Crane
Date: Fri, 23 Jan 2026 11:43:01 -0500
Subject: [PATCH 7/7] Update docs/source/developers/overview.rst

Co-authored-by: Rok Mihevc
---
 docs/source/developers/overview.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/source/developers/overview.rst b/docs/source/developers/overview.rst
index 3140618f207..a6445aaccde 100644
--- a/docs/source/developers/overview.rst
+++ b/docs/source/developers/overview.rst
@@ -162,7 +162,9 @@ Human-generated mistakes tend to be easier to spot and reason about, and
 code review is intended to be a collaborative learning experience that
 benefits both submitter and reviewer. When a PR appears to have been
 generated without much engagement from the submitter, reviewers with access
-to AI tools could more efficiently generate the code directly.
+to AI tools could more efficiently generate the code directly, and since
+the submitter is not likely to learn from the review process, their time is
+more productively spent researching and reporting on the issue.
 
 We are not opposed to the use of AI tools in generating PRs, but recommend
 the following: