RDEL #146: Which popular beliefs about GenAI and software engineering hold up to research?
Researchers unpack eight common GenAI myths and what the evidence actually says.
Welcome back to Research-Driven Engineering Leadership. Each week, we pose an interesting topic in engineering leadership and apply the latest research in the field to drive to an answer.
Almost every engineering leader has been asked some version of “what’s your AI productivity?” in the last twelve months. The expected answer is a single, impressive figure, but the actual evidence on GenAI in software engineering is much messier than the headlines suggest. This week we ask: which popular beliefs about AI and developer productivity does the research actually support?
The context
The industry conversation about GenAI in software has moved faster than the research that should ground it. Procurement decisions, headcount plans, and quarterly OKRs are being shaped by vendor demos, anecdotes, and a handful of controlled studies on toy problems. Engineering leaders are then asked to translate all of that into something true about their own teams.
The trouble is that many of the most popular claims rest on assumptions that don’t hold up. If developers don’t actually spend most of their day coding, how much can a coding assistant really change? If your dashboard tracks “AI-generated lines of code,” what is it really measuring? Researchers studying real teams have started to push back on the loudest narratives.
The research
In a recent ACM Queue article, prominent productivity researchers (including co-authors of the SPACE framework) examine eight of the most persistent myths about GenAI in software engineering. They draw on recent large-scale studies, including a 2025 Microsoft study of more than 450 engineers, developer interviews, and a synthesis of recent field research.
Myth 1: Developers spend most of their time writing code.
A 2025 Microsoft study found developers spend only about 14% of their time writing code, with earlier research showing 18% on a “good” day and 11% on a “bad” one. Most of the day goes to meetings, design, code review, and collaboration.
Myth 2: Writing code is the bottleneck.
If coding is roughly 15% of the day, even an AI assist that doubles coding speed only halves that slice, freeing up about 7.5% of total working time. The overall productivity gain therefore stays well under 15%. Accelerating code creation often just moves pressure downstream to review, testing, and integration, leaving the “outer loop” of development largely unchanged.
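The arithmetic behind this myth is an Amdahl's-law style estimate, and a short sketch makes the ceiling easy to see. The function names below are illustrative and not from the paper. The model also assumes, optimistically, that time outside coding is completely unaffected by the tool.

```python
def time_saved_fraction(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total working time freed when only coding gets faster.

    coding_share: portion of the day spent writing code (0.0 to 1.0).
    coding_speedup: how many times faster coding becomes (2.0 = twice as fast).
    Assumption: all non-coding work takes exactly as long as before.
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return coding_share * (1.0 - 1.0 / coding_speedup)


def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's-law estimate of the end-to-end speedup for the whole workday."""
    remaining_time = 1.0 - time_saved_fraction(coding_share, coding_speedup)
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Coding shares taken from the figures quoted above; the speedups are
    # hypothetical inputs chosen for illustration.
    for share in (0.11, 0.14, 0.18):
        for speedup in (1.5, 2.0, 10.0):
            saved = time_saved_fraction(share, speedup)
            total = overall_speedup(share, speedup)
            print(f"share={share:.0%} speedup={speedup:>4}x "
                  f"time saved={saved:.1%} overall={total:.2f}x")
```

With a 15% coding share and a 2x coding speedup, this gives roughly 7.5% of time saved, or about a 1.08x overall speedup. Even an arbitrarily large coding speedup can never free more than the 15% of the day that coding occupied to begin with.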
Myth 3: Lines of code is the best measure of AI impact.
A 2014 study found LoC “fails to meet the specified validity tests and, therefore, has limited utility,” yet organizations, Microsoft publicly among them, now track AI-generated LoC. Volume metrics encourage gaming, erode trust, and reward more code over better software.
Myth 4: AI helps all tasks and engineers equally.
Microsoft’s 2024 productivity research found familiar tasks see larger Copilot gains than unfamiliar ones, while a 2025 study of experienced open source developers found AI tools actually increased implementation time by 18%. Even rewriting a prompt to be semantically equivalent changed the generated code in 46% of cases and changed correctness in 28%.
Myth 5: AI will turn individuals into 10x developers.
The widely cited 55% productivity gain comes from controlled studies on isolated tasks and rarely survives contact with team-based real-world work. Much of the variation in developer performance is attributable to the task itself, not to the individual using the tool.
Myth 6: It’s up to each developer to make AI work.
The paper invokes Cal Newport: historic productivity jumps came from systematic redesigns like the assembly line, not from handing individuals better tools. GenAI is being deployed the opposite way, with millions in licenses and minimal organizational guidance, leaving each engineer to optimize their own workflow on the side.
Myth 7: High-performing AI tools will be adopted automatically.
While 80% of developers use AI tools, only 29% trust the accuracy of the output, and many report spending more time debugging AI output than writing code themselves. Research also documents a “competence penalty,” where developers (especially women and older engineers) receive harsher evaluations for AI-assisted work even when the output is identical.
Myth 8: Enterprises can innovate at startup speed with GenAI.
Startups build on open-source frameworks well represented in LLM training data, while enterprise stacks rely on proprietary tools, legacy code, and backward-compatibility requirements the models have never seen. Layered on top are compliance, security, and customer expectations that startups simply don’t face.
The application
The thread tying these myths together is that GenAI’s value comes from how it’s deployed, measured, and supported, not from access to the tool. Leaders who treat AI rollout as a procurement decision will get procurement-decision results.
Here’s how leaders can apply these findings:
Audit the 86% before optimizing the 14%. Before adding another usage metric or buying more seats, map where your team’s time actually goes for a week. The biggest delivery improvements almost always live in code review queues, design discussions, environment setup, and meetings, not in keystrokes saved inside the editor.
Point AI at one outer-loop bottleneck and measure the delivery metric, not the tool. Pick a slow phase such as code review turnaround, onboarding ramp time, or flaky-test triage, set a baseline, deploy AI deliberately against it, and track the outcome you actually care about (cycle time, change failure rate, time to first PR).
Close the trust gap before chasing the productivity gain. Run a short survey on where AI output has cost the team time, where engineers re-verify by hand, and where they avoid the tool entirely. Use the answers to set explicit team conventions: which task types are AI-eligible, what review bar applies to AI-assisted code, and how managers will weigh AI-assisted work fairly in performance reviews to avoid the competence penalty the paper documents.
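For the second recommendation above (set a baseline, then track the delivery metric), a minimal sketch of the comparison might look like the following. The metric (code review turnaround in hours), the function name, and the sample numbers are all made up for illustration. A real analysis would also need enough data points and some control for what else changed during the period.

```python
from statistics import median


def relative_change_in_median(baseline: list[float], after: list[float]) -> float:
    """Relative change in the median of a delivery metric.

    A negative result means the metric went down (for example, faster reviews).
    Medians are used so that a few very slow outliers do not dominate.
    """
    if not baseline or not after:
        raise ValueError("both samples must contain at least one value")
    base_median = median(baseline)
    if base_median == 0:
        raise ValueError("baseline median is zero; relative change is undefined")
    return (median(after) - base_median) / base_median


if __name__ == "__main__":
    # Hypothetical review turnaround times in hours, not real measurements.
    before_rollout = [30, 42, 26, 55, 38]
    after_rollout = [28, 35, 24, 50, 31]
    change = relative_change_in_median(before_rollout, after_rollout)
    print(f"Median review turnaround changed by {change:+.1%}")
```

The point of the sketch is that the number being tracked is the delivery outcome itself, not tool usage or AI-generated lines of code.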
Hope your week brings useful data and thoughtful AI analysis. Happy Research Tuesday!
Lizzie


