RDEL #46: How does access to conversational AI impact software engineering trust and productivity?
This week we review how conversational AI tools affect the performance, efficiency, and trust of software engineers as they complete a programming exam.
Welcome back to Research-Driven Engineering Leadership. Each week, we pose an interesting topic in engineering leadership, and apply the latest research in the field to drive to an answer.
Conversational AI has become a common tool in the software development process. This week we ask: how does access to conversational AI tooling impact software engineering trust and productivity?
The context
The rapid advancement of AI technologies has significantly enhanced the capabilities of conversational agents. Tools like Google’s Bard and OpenAI’s ChatGPT are increasingly integrated into professional workflows to aid in tasks such as coding, content creation, and research. These tools promise to streamline processes, reduce cognitive load, and increase overall efficiency. However, the effectiveness of these tools in real-world applications, especially their impact on productivity and trust, remains a subject of active research.
In the software development domain, productivity is often measured by the ability to write, debug, and optimize code efficiently. Trust in tools and technologies is equally important, as developers rely on these resources for accurate and efficient problem-solving. This study examines these aspects by evaluating the performance of software engineers using AI assistance during a coding task.
The research
Researchers at Google studied this topic by observing 76 software engineers as they completed a programming exam with and without access to Google's conversational AI, Bard. The mixed-methods study combined qualitative and quantitative analysis to understand the effects on productivity, efficiency, satisfaction, and trust.
Researchers discovered a number of findings on software engineering performance, efficiency, and trust:
Performance: Access to Bard significantly improved performance for novice engineers on open-ended "solve" questions, but there was no significant effect on the performance of expert engineers.
Efficiency: While participants using Bard spent more time on tasks, they perceived themselves as more efficient compared to those without access to AI assistance. This suggests a discrepancy between actual and perceived efficiency.
Trust: Trust in Bard was initially neutral but decreased after task completion as participants adjusted their trust based on the AI's performance. Experts were more likely than novices to distrust Bard.
Researchers also observed instances of automation complacency, where participants relied heavily on Bard's outputs without thorough validation. This behavior created potential productivity pitfalls, highlighting the importance of critically assessing AI-generated solutions.
The application
The research found that while access to conversational AI like Bard can significantly improve the performance of novice software engineers on certain tasks, it also leads to longer task duration and potential over-reliance on AI outputs. Trust in the AI decreased among participants, particularly experts, after using the tool, highlighting the need for critical assessment skills.
From this, engineering managers can take away a number of tips for using conversational AI on their teams:
Training and Onboarding: Use AI tools to support novice developers by providing instant feedback and code suggestions, accelerating their learning and reducing onboarding time.
Task Allocation: Assign solve-type or exploratory tasks to novices with AI support, while reserving more complex, high-stakes tasks for experts, who can validate AI outputs more effectively.
Efficiency Tools: Enhance productivity by integrating AI tools for routine tasks, but ensure that developers maintain a balance between AI reliance and manual validation to avoid automation complacency.
—
Happy Research Monday,
Lizzie
Thanks for such a thorough write-up of an interesting paper on a timely topic! I've been really enjoying your newsletter. There are a couple of aspects of this research that would make me hesitate to generalize too much from it. First, this is just looking at Bard. It's possible there are better or worse agents for these kinds of questions. But primarily, the drill-down from randomly selected invites to participants is massive: from 1,400 invites to participate, to 220 responses, to 76 participants leaves a lot of room for all kinds of selection effects. Not saying the conclusions here or in the paper are wrong, of course, but it's hard to judge from this evidence.