A few years ago the question was whether to let AI write code at all. That debate is over. The question now, the one engineering teams are actually arguing about, is how to review the flood of pull requests that AI is producing. A top thread on r/ClaudeAI this spring, "Reviewing AI-generated pull requests in 2026," drew thousands of upvotes precisely because every team is hitting the same wall: the code arrives faster than anyone can responsibly review it.
The danger is not that AI writes bad code. It is that AI writes code that looks good. It compiles, it matches your style, it reads cleanly, and it lulls the reviewer into approving on vibes. The bug is three layers down in logic that is plausible and wrong.
TL;DR: AI coding tools now produce a large share of the pull requests engineering teams review, and the reviewer, not the tool, owns whatever merges. The fix is a short policy plus a review checklist: require disclosure that AI wrote the change, keep a named human accountable for every merge, demand tests the author actually understands, and review AI PRs harder for silent failures, plausible-but-wrong logic, hallucinated APIs, and security gaps that look fine at a glance.
Why AI pull requests need their own review rules
Normal code review assumes a human wrote the code, which means the author understands it and can defend each decision. That assumption quietly breaks when the author pasted the change from an assistant.
When an AI writes the change, three things shift. The author may not fully understand what they are submitting, so "the author will catch their own mistakes" no longer holds. The volume goes up, because generating a PR is now faster than reviewing one, which inverts the bottleneck and tempts teams to rubber-stamp. And the failure modes change: AI does not make the typos and obvious mistakes a junior makes; it makes confident, well-formatted errors that survive a casual read.
This is the same dynamic that pushed arXiv to ban papers with unchecked LLM-generated errors like hallucinated references in May 2026: AI output is fluent enough to pass a surface check and wrong often enough to cause real damage. Code is worse than papers here, because a plausible-looking bug ships to production and runs.
The answer is not to slow down or ban AI assistance. It is to make the review process match the new author. That takes two artifacts: a short policy that fixes accountability, and a checklist that tells reviewers what to actually look at.
The policy: who owns an AI-generated merge
The foundation of the whole thing is one sentence: the human who approves a pull request owns the outcome, no matter who or what wrote the code.
This matters because AI cannot be accountable. It is not a professional actor, it does not carry liability, and "the model wrote it" is not a defense that survives an incident review. If accountability is not pinned to a named person, it evaporates, and a team with no accountable approver will merge things nobody actually vouched for.
The second pillar is disclosure. The PR author states whether the change was substantially AI-generated. This is not about blame; it is about routing attention. A reviewer who knows AI wrote a change reviews it for the failure modes AI produces. A checkbox in your PR template does the job.
The third pillar is the explainability rule: the author must be able to explain the code they are submitting. If they cannot describe what a function does and why, the PR is not ready, regardless of whether it works. This single rule prevents the worst pattern: submitting generated code the author has not read.
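The disclosure checkbox can also be enforced mechanically. The sketch below shows one way a CI step might read a PR description and report whether the box was ticked. The label text and the function name `ai_disclosure` are assumptions made for illustration, not part of any real PR tooling; adapt them to whatever your template actually says.

```python
import re
from typing import Optional

# Hypothetical checkbox wording; replace with the text in your own PR template.
DISCLOSURE_LABEL = "This change is substantially AI-generated"

# Matches a markdown task-list item such as "- [x] <label>" or "- [ ] <label>".
_CHECKBOX = re.compile(
    r"^\s*[-*]\s*\[(?P<mark>[ xX])\]\s*" + re.escape(DISCLOSURE_LABEL),
    re.MULTILINE,
)


def ai_disclosure(pr_body: str) -> Optional[bool]:
    """Return True if the box is ticked, False if it is present but empty,
    and None if the disclosure field is missing from the description."""
    match = _CHECKBOX.search(pr_body)
    if match is None:
        return None
    return match.group("mark").lower() == "x"
```

A check built on this would likely fail the pipeline on `None` (the author deleted the field) and route the PR to the AI review checklist on `True`. It only confirms that a statement was made; it cannot tell whether the statement is honest.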
The 9-point AI pull request review checklist
Run this on any PR flagged as AI-generated. The first three are non-negotiable; the rest are where AI code actually fails.
- Accountability is clear. A named human is the approver and understands they own the merge.
- AI authorship is disclosed. The PR states the change was AI-generated, so reviewers calibrate accordingly.
- The author can explain it. Ask one "why" question about a non-obvious line. If the answer is "the AI did it," send it back.
- APIs and libraries actually exist. AI hallucinates plausible function names, parameters, and packages. Confirm every unfamiliar call against real documentation, and check that any new dependency is real and maintained.
- The logic is correct, not just plausible. Trace the actual behavior on a real input, including the edge cases. AI code is strongest at looking right and weakest at being right on the boundaries.
- Tests verify behavior, not just pass. AI often writes tests that assert what the code does rather than what it should do. Check that a test would fail if the logic were wrong, not just that it is green.
- Security checks are effective, not decorative. AI will add an auth check or input validation that is present but bypassable. Review anything touching authentication, authorization, data access, or external calls as if it were unreviewed.
- No secrets, no licensed code copied verbatim. Scan for hardcoded keys and for large blocks that may be reproduced from training data without a compatible license.
- Scope matches the task. AI frequently does more than asked, refactoring unrelated code or "improving" things nobody requested. Extra changes are extra risk; require them to be justified or removed.
Where AI-generated code fails review most
| Failure mode | Why it slips through | What to do |
|---|---|---|
| Hallucinated API or package | Name looks real, code compiles against a stub | Verify against official docs; check the dependency exists |
| Plausible-but-wrong logic | Reads fluently, fails on edge cases | Trace a real input by hand, test boundaries |
| Decorative security check | An auth or validation line is present but ineffective | Review security-touching code as if unreviewed |
| Tests that only pass | Assertions match the code, not the requirement | Confirm a test fails when logic is broken |
| Scope creep | "Helpful" refactors bundled into the PR | Require unrelated changes to be split out |
| Confident comment, wrong code | The comment describes intended behavior, code does something else | Trust the code, not the comment |
The pattern across all six is the same: AI optimizes for looking correct. Your review has to optimize for being correct, which means spending your attention on substance and treating fluency as a reason for more scrutiny, not less.
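Part of the first row can be automated for Python code. The following is a minimal sketch, assuming the check runs in an environment where the project's declared dependencies are already installed; the function name `unresolved_imports` is made up for this example. It lists imported top-level modules that cannot be located.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that cannot be
    found in the current Python environment."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which refer to project code.
            names.add(node.module.split(".")[0])
    return sorted(n for n in names if importlib.util.find_spec(n) is None)
```

This only shows that a module name resolves locally. It does not confirm that a function or parameter on that module exists, or that a newly added package is the one the author intended or is maintained, so the documentation check in the table still has to be done by a person.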
Keeping AI review fast without rubber-stamping
The objection every engineering lead raises is real: if AI triples the volume of pull requests, a deeper review process is exactly what you do not have time for. The answer is not to review less. It is to make the cheap PRs cheap and spend the saved time on the risky ones.
Four tactics keep the throughput up.
- Require small PRs. An AI can generate a 1,000-line change as easily as a 50-line one, but only the small one can be reviewed properly, so cap PR size and make the author split big changes.
- Make the author self-review first. The person submitting runs the 9-point checklist before requesting review, which catches the obvious failures without a second person.
- Reserve deep review for risk. A change to a README or a test fixture does not need the same scrutiny as a change to authentication, so triage by what the code touches, not by who wrote it.
- Let AI review tools do the first pass. They are good at flagging style issues, missing tests, and obvious bugs, which frees the human reviewer to focus on logic and security.
The goal is to spend your limited human attention where AI fails hardest: correctness on edge cases and security. Everything else can be batched, automated, or pushed back to the author. A team that triages this way reviews more code more carefully than one that tries to read every line of every AI PR at the same depth and quietly gives up.
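Triage by size and by what the code touches could be sketched roughly as follows. The line cap, the glob patterns, and the tier names are all placeholder assumptions chosen for the example; a real team would tune them to its own repository layout.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical values for illustration; adjust to your repository.
MAX_CHANGED_LINES = 400
HIGH_RISK_GLOBS = ["*auth*", "*permission*", "*/migrations/*", "*secrets*"]
LOW_RISK_GLOBS = ["*.md", "*/fixtures/*", "docs/*"]


@dataclass
class PullRequest:
    changed_files: list[str]
    changed_lines: int


def review_tier(pr: PullRequest) -> str:
    """Pick a review depth from PR size and the paths it touches."""
    if pr.changed_lines > MAX_CHANGED_LINES:
        return "split"  # too large to review properly; send back to the author
    if any(fnmatch(f, g) for f in pr.changed_files for g in HIGH_RISK_GLOBS):
        return "deep"  # security-sensitive path: full checklist, senior reviewer
    if pr.changed_files and all(
        any(fnmatch(f, g) for g in LOW_RISK_GLOBS) for f in pr.changed_files
    ):
        return "light"  # docs or fixtures only
    return "standard"


print(review_tier(PullRequest(["src/auth/login.py", "README.md"], 40)))  # deep
```

Path matching is a coarse proxy for risk: a security-relevant change can live in a file with an innocent name, so the tier should be a floor that a reviewer can raise, never a ceiling.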
Copy-paste AI code review policy
Adapt the bracketed fields. One page is enough for a team under 50 engineers.
[Company Name] AI-Generated Code Review Policy
Accountability. The engineer who approves and merges a pull request owns its outcome, regardless of whether the code was written by a person or an AI tool.
Disclosure. Authors must indicate when a pull request is substantially AI-generated, using the [checkbox / field] in the PR template.
Explainability. An author must be able to explain any code they submit. Code the author cannot explain is not ready to merge.
Review standard. AI-generated pull requests are reviewed against the 9-point checklist, with mandatory scrutiny of any change touching authentication, authorization, data access, or external services.
Tests. Every AI-generated change includes tests that verify intended behavior, not tests that merely pass. The reviewer confirms a test would fail if the logic were wrong.
AI review tools. Automated AI review may be used as a first pass but does not satisfy the human accountability requirement. A named human remains the approver.
Keep it next to your AI acceptable use policy. If your team is still choosing how to govern the coding tools themselves, the AI coding tools governance policy covers tool selection, data handling, and access, and the cost control guide covers the budget side.
Checklist (copy/paste)
- Policy states the approving human owns every merge
- PR template has an "AI-generated" disclosure field
- Authors must be able to explain submitted code
- 9-point review checklist adopted for AI PRs
- Security-touching AI code gets mandatory deeper review
- Tests are checked for verification, not just passing
- Dependency and API existence verified on unfamiliar calls
- AI review tools used as first pass only, not as the approver
Where this fits in your governance
Reviewing AI-generated code is the quality-control half of AI coding governance. The tools producing these PRs are the same ones in your AI coding tools policy and your AI tool register, and the security failures the checklist catches are the same class of risk covered in the TypeScript AI agent security playbook. The policy here is small on purpose. Its whole job is to make sure that however much code AI writes, a human still stands behind every line that ships.
