RDEL #92: How do AI code reviews impact engineering teams?
Automated reviews improved code quality, but didn’t reduce human effort or accelerate delivery.
Welcome back to Research-Driven Engineering Leadership. Each week, we pose an interesting topic in engineering leadership, and apply the latest research in the field to drive to an answer.
Code review plays a central role in engineering teams—not just as a final check for bugs, but as a key mechanism for knowledge sharing, design validation, and quality control. As AI tools take on more responsibility across the software lifecycle, automated code review promises to reduce toil, catch bugs faster, and standardize quality. The question is, does it? This week we ask: how do AI code reviews impact engineering teams and the software development lifecycle?
The context
Code review is one of the few engineering rituals that touches nearly every change before it ships. It serves not just as a quality gate, but as a practice that reinforces shared standards, facilitates knowledge transfer, and builds team alignment.
But as teams scale and complexity grows, maintaining a high-quality, timely review process becomes increasingly challenging. Review queues can grow long, feedback can be inconsistent, and the cost of missed issues—or delayed approvals—adds up. To address these challenges, many teams are turning to automated code review tools powered by large language models (LLMs).
These tools promise to supplement human reviews with fast, consistent feedback—flagging bugs, enforcing conventions, and even surfacing missed edge cases. But the key question remains: do they actually improve the development workflow, or simply shift effort from one part of the process to another?
The research
Researchers conducted a study inside the software division of Beko, a multinational electronics company. Over 10 months, 238 developers used an AI-based code review tool powered by GPT-4 (via Qodo PR-Agent). The study analyzed 4,335 pull requests across three projects, comparing automated and manual review patterns and pairing that data with survey responses and observed developer behavior.
Key findings:
73.8% of automated review comments were implemented by developers.
This suggests that most comments were seen as useful and actionable. As one developer noted, “It makes finding code defects more easy. Developers can see their mistakes fast.”
Pull request closure time increased from 5h 52m to 8h 20m on average.
While the tool improved quality, it also introduced new review tasks. Projects showed varied results—some faster, some slower—indicating that tool effectiveness may depend on team habits and workflows.
Automated reviews did not significantly reduce human review comments.
Human reviewer comments dropped only slightly (from 0.31 to 0.28 per PR), suggesting that automation is additive, not a replacement.
Developers perceived a minor improvement in code quality.
In surveys, 68.8% of respondents said the tool improved code quality, particularly by catching bugs, typos, and missing tests.
26.2% of automated comments were labeled as “Won’t Fix” or “Closed.”
These false positives created extra overhead, with some developers noting the tool occasionally suggested irrelevant or misleading changes.
The application
This study shows that LLM-based code review tools can add value, but are still imperfect. Developers acted on nearly three-quarters of AI-generated comments, and many reported modest improvements in code quality. But the tool did not speed up delivery: average closure time rose, results varied by project, and it sometimes introduced low-value or irrelevant feedback. These results suggest that automated reviews can be a valuable complement to human review—but they’re not yet a replacement. As the underlying models and interfaces continue to mature, these tools will likely become more accurate, more context-aware, and better aligned with team workflows.
To apply these findings:
Set expectations with your team.
Position automated reviews as early input—not authoritative decisions. Encourage developers to triage AI suggestions like they would lint errors: useful signals, not mandates.
Instrument your workflow.
Track metrics like comment resolution rate, PR turnaround time, and commit patterns to assess how AI reviews are influencing quality and velocity over time (a starting-point sketch follows this list).
Plan for iteration, not perfection.
These tools will improve quickly. Invest time in team feedback, prompt tuning, and filtering strategies now—so you’re ready to take full advantage as the technology matures.
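To make "instrument your workflow" concrete, here is a minimal Python sketch. It assumes you have already exported pull requests from your Git host into simple records; the record shapes, field names, and the load_prs helper are hypothetical placeholders rather than any particular API. It computes the two signals the study leaned on most: how often AI comments actually get implemented, and average PR turnaround time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean

# Hypothetical record shapes -- adapt to whatever your Git host's API actually returns.
@dataclass
class ReviewComment:
    author_is_bot: bool   # True if the comment came from the AI reviewer
    resolution: str       # e.g. "implemented", "wont_fix", "closed"

@dataclass
class PullRequest:
    opened_at: datetime
    closed_at: datetime
    comments: list[ReviewComment] = field(default_factory=list)

def comment_resolution_rate(prs: list[PullRequest]) -> float:
    """Share of AI-generated comments that developers implemented."""
    ai_comments = [c for pr in prs for c in pr.comments if c.author_is_bot]
    if not ai_comments:
        return 0.0
    implemented = sum(1 for c in ai_comments if c.resolution == "implemented")
    return implemented / len(ai_comments)

def avg_turnaround_hours(prs: list[PullRequest]) -> float:
    """Average time from PR opened to PR closed, in hours."""
    if not prs:
        return 0.0
    return mean((pr.closed_at - pr.opened_at).total_seconds() / 3600 for pr in prs)

# Example usage -- load_prs is a placeholder you'd implement against your own tooling:
# before, after = load_prs("2024-Q4"), load_prs("2025-Q1")
# print(comment_resolution_rate(after))
# print(avg_turnaround_hours(before), avg_turnaround_hours(after))
```

Comparing these numbers for a window before and after enabling automated review (the study observed 10 months) tells you whether the tool is adding signal or just adding comments.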
Used thoughtfully, automated code review can improve quality, reinforce standards, and reduce cognitive load. But the benefits depend not just on the tool’s capabilities, but also on how you integrate it into the human workflow.
—
Happy Research Tuesday!
Lizzie