RDEL #145: How do coding agents compare to copilots for developer productivity?
Agents unlocked tasks copilots couldn't solve and lowered cognitive load, yet 55% of developers felt they understood agent outputs less well than copilot outputs.
Welcome back to Research-Driven Engineering Leadership. Each week, we pose an interesting topic in engineering leadership and apply the latest research in the field to drive to an answer.
Many engineering leaders are now weighing whether to roll out coding agents alongside the copilots their teams already use. The pitch is appealing: copilots code with developers, while agents code for them. This week we ask: how does increasing AI automation in coding tools actually change developer experience?
The context
Engineering teams now have two distinct paradigms of AI coding tools to choose between. Copilots like GitHub Copilot assist developers incrementally, offering autocomplete suggestions and chat-based help while the developer stays in command. Coding agents take a natural-language task description and autonomously plan multi-step work, edit files, run code, and iterate. The shift in role is significant, akin to moving from pair programming to delegating a task to a fellow engineer.
Despite the rapid adoption of agentic tools, most evaluations rely on static benchmarks without real developers in the loop, and most user studies focus exclusively on copilots. That leaves engineering leaders largely guessing at how the two paradigms compare in practice, and whether the productivity gains from agents come at a cost to developer experience or code understanding.
The research
Researchers at Carnegie Mellon University and All Hands AI conducted a controlled user study comparing coding copilots and agents, recruiting 20 experienced GitHub Copilot users who were new to agents. Each participant solved realistic tasks using both GitHub Copilot and OpenHands, with order and task type randomized.
Here’s what they learned:
Agents significantly improved task completion. Participants completed 60% of tasks correctly with OpenHands compared to only 25% with GitHub Copilot, a 35-percentage-point lift (p=0.02). Some tasks, like the data analysis problems requiring web scraping, were only ever completed in the agent condition.
User effort fell by roughly half. Among developers who completed a task correctly, active user time averaged 12.5 minutes with the agent versus 25.1 minutes with the copilot (p=0.01). Notably, total wall-clock time including the agent’s autonomous work was comparable at ~28 minutes.
Cognitive load was significantly lower with agents. 75% of participants agreed they experienced less cognitive load using agents (p=0.0006). One participant described the experience as “like autopilot,” with another noting “it feels like I can tell it what to do and let it run on its own.”
Agents unlocked new tasks that copilots couldn’t. 70% strongly agreed the agent enabled them to accomplish work they couldn’t have with the copilot (p=0.0013), particularly in unfamiliar codebases. One participant noted: “without prior knowledge on the code structure of matplotlib, I was able to understand and do the task at the same time.”
Developers understood agent outputs less well. 55% of participants disagreed that they had a better understanding of agent outputs than of copilot outputs. Agent changes “seemed kind of hidden” and felt “hard to control,” and 60% said they would continue to prefer GitHub Copilot for everyday work.
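The effort figures above separate active time from wall-clock time, and the gap between them is easy to work through. The short Python sketch below uses only the numbers reported in this summary. The function and variable names are illustrative, and it assumes the "~28 minutes" figure can be treated as exactly 28.0 for the agent condition.

```python
# Figures from the study summary (minutes, correctly completed tasks only).
AGENT_ACTIVE_MIN = 12.5
COPILOT_ACTIVE_MIN = 25.1
# Assumption: "~28 minutes" of total wall-clock time is rounded to 28.0 here.
AGENT_WALL_CLOCK_MIN = 28.0


def effort_breakdown(agent_active, copilot_active, agent_wall_clock):
    """Compare active developer time with total elapsed time for an agent run."""
    # Fraction of active time saved relative to the copilot condition.
    active_reduction = 1 - agent_active / copilot_active
    # Minutes the developer spends waiting while the agent works on its own.
    idle_min = agent_wall_clock - agent_active
    # Share of the whole run in which the developer is not actively engaged.
    idle_share = idle_min / agent_wall_clock
    return {
        "active_reduction": active_reduction,
        "idle_min": idle_min,
        "idle_share": idle_share,
    }


breakdown = effort_breakdown(
    AGENT_ACTIVE_MIN, COPILOT_ACTIVE_MIN, AGENT_WALL_CLOCK_MIN
)
for name, value in breakdown.items():
    print(f"{name}: {value:.3f}")
```

Under these inputs, active time falls by about 50%, while roughly 15.5 of the 28 minutes (a little over half the run) are spent waiting on the agent. That waiting time is why halved effort does not automatically mean halved elapsed time.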
The application
Coding agents deliver real, measurable productivity gains: more tasks completed, less developer time spent, and lower cognitive load. But those gains come paired with a new problem engineering leaders need to plan for: developers lose visibility into what the AI actually did. Adoption strategy matters more than the tool choice itself.
Pair agents with explicit review checkpoints, not just trust. Agents will autonomously edit multiple files, install packages, and even push to GitHub without explicit permission. Set team norms that require developers to inspect file diffs and explain agent changes in PR descriptions before merging. The “50% less effort” finding only pays off if that recovered time goes to understanding, not rubber-stamping.
Match the tool to the task type, not the developer’s preference. Agents excelled in unfamiliar codebases, at environment setup, and on tasks requiring web scraping or complex multi-file edits. Copilots remained competitive for focused, well-scoped changes in code the developer already knows. Coach engineers to default to agents for exploration and setup-heavy work, and to stay with copilots for tightly scoped edits in their own modules.
Pilot multi-tasking workflows before banking on speed gains. Agent-driven work creates idle time while the AI runs, and participants noticed it (“the generation time is slower,” “I’m just kind of sitting there”). Have your team experiment with running parallel agents, reviewing prior code, or doing focused thinking during agent runs before assuming halved user effort will translate into halved total cycle time.
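One way to make the review-checkpoint norm above concrete is a small pre-merge check confirming that the PR description explains every file the agent touched. The Python sketch below is a minimal illustration, not a prescribed workflow. The "## Agent changes" heading, the function names, and the idea of passing in the changed file paths as a list are all hypothetical conventions a team would need to define for itself.

```python
# Hypothetical team convention: agent edits are explained under this heading.
REQUIRED_HEADING = "## Agent changes"


def agent_section(pr_description):
    """Return the text under the required heading, or None if it is absent."""
    collecting = False
    section_lines = []
    for line in pr_description.splitlines():
        if line.strip() == REQUIRED_HEADING:
            collecting = True
            continue
        # Stop at the next second-level heading after the section starts.
        if collecting and line.startswith("## "):
            break
        if collecting:
            section_lines.append(line)
    return "\n".join(section_lines) if collecting else None


def unexplained_files(pr_description, changed_files):
    """List changed files that the agent-changes section never mentions."""
    section = agent_section(pr_description)
    if section is None:
        # No section at all: every changed file still needs an explanation.
        return list(changed_files)
    return [path for path in changed_files if path not in section]


example_description = (
    "Summary\n\n"
    "## Agent changes\n"
    "- src/app.py: added retry logic around the fetch call\n\n"
    "## Testing\n"
    "Ran the unit tests locally.\n"
)
print(unexplained_files(example_description, ["src/app.py", "requirements.txt"]))
```

A check like this only confirms that an explanation exists, not that it is accurate, so it supplements a human reading the diff rather than replacing one.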
Whatever paradigm your team is exploring, I hope your week is full of experimentation. Happy Research Tuesday!
Lizzie



The 60% to 25% completion gap is striking, but the 60% still-prefer-Copilot number is the one I keep returning to. Lower cognitive load (75%, p=0.0006) paired with weaker comprehension (55% disagreed they understood agent output) tells a Faustian story: developers traded understanding for relief and now know it. From writing theaifounder.substack.com I see teams adopting OpenHands-class agents without a single review-process change, which guarantees the comprehension debt compounds. What signal would tell you a team has crossed from 'agent-curious' to 'agent-dependent in a way that hurts on-call'?