RDEL #9: Is there a gender disparity in the code review process?
A closer look at whether gender plays a role in the code review process.
Happy Monday! Each week, we pose an interesting topic in engineering leadership, and apply the latest research in the field to drive to an answer.
This week, we look at how code reviews are distributed across a team: how do code reviews break down by gender, and is there a disparity in who reviews code?
The context:
Code reviews are central to the collaboration process on a software team. On most teams, code authors ask a group of teammates to review their code, getting their feedback and approval (the classic “LGTM”) before merging the code into the main branch. Code reviews are not just important for approving features; they are also important tools for sharing knowledge across the team and mitigating turnover-induced knowledge loss.
Given the existing gender demographics on software engineering teams (~20% women in IC-level roles), significant research has gone into inequities on software teams. The results show that inequities prevail in a number of software development processes. For example, in a study of open source reviews on GitHub, researchers found that when women are perceived as women, their acceptance rate is lower than men's. However, when gender was not perceptible in a review, women's acceptance rates were higher than men's.
This week’s study looks at the corporate environment, using a corpus of code reviews at Google to determine whether gender disparities exist in the code review process.
The research:
Emerson Murphy-Hill and a team of researchers at Google and CMU studied gender inequities in the code review process using code reviews completed at Google in early 2019. Using a series of regression analyses, they were able to normalize the data against role, tenure, level, and job code.
The results found that:
Women performed 17% fewer reviews than men
Women complete 16.3% fewer code reviews when reviewers are manually assigned
Women and men are equally likely to complete an assigned review (only 0.5% difference)
Women submit 8.7% fewer (but 4.8% larger) changelists
“Review suggestion tools”, which automate reviewer selection, can carry algorithmic bias because they favor reviewers who have previously been on the same changelists
The application:
Inequities in the code review process compound over time, so it's crucial for teams to identify and close the gap early to distribute knowledge more evenly across the team.
Some ways that managers can quickly identify and improve code review selection include:
Consider a round-robin review process. This algorithm favors the person with the least recent review request, which can surface someone who isn't as frequently selected in the manual process (a minimal sketch follows this list). (Note: GitHub also has a load-balancing review algorithm, which tries to distribute reviews fairly across teammates based on their 30-day review load.)
Keep track of the pull request distribution on a team. More often than not, these inequities are due to unconscious bias. The best way to mitigate this is to start measuring the distribution of pull requests, and to use both systems and perceptions to make review decisions more conscious and fair.
Systems: Measure how PRs get distributed on the team. Consider the minimum, median, and maximum number of PRs reviewed on a team per week; on a team with a healthy balance, the median and maximum are not far from one another (see the second sketch after this list). (Note: it's less important to know the individuals behind each number, and more important to know how the data points change each week.)
Perceptions: Ask teammates to share how they’ve each felt about their review load or how they select reviewers. The team will recognize opportunities to share reviews more evenly through open conversation.
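To make the round-robin idea concrete, here's a minimal sketch in Python. This is an illustration, not GitHub's actual implementation: the teammate names, timestamps, and the pick_reviewer helper are all hypothetical, and a real version would pull request history from your code review tool.

```python
from datetime import datetime, timezone

# Hypothetical round-robin reviewer selection: pick the teammate whose
# most recent review request is oldest. Names and dates are made up.
last_requested = {
    "alice": datetime(2019, 3, 1, tzinfo=timezone.utc),
    "bob": datetime(2019, 3, 4, tzinfo=timezone.utc),
    "carol": datetime(2019, 2, 20, tzinfo=timezone.utc),
}

def pick_reviewer(last_requested, author):
    """Return the eligible teammate least recently asked to review."""
    candidates = {r: t for r, t in last_requested.items() if r != author}
    reviewer = min(candidates, key=candidates.get)
    # Record the new request so the rotation advances next time.
    last_requested[reviewer] = datetime.now(timezone.utc)
    return reviewer

print(pick_reviewer(last_requested, author="alice"))  # -> "carol"
```

Because the selector always advances to whoever has waited longest, a teammate who is rarely chosen by hand naturally floats to the top of the rotation.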
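And here's a similarly minimal sketch of the Systems check: computing the weekly minimum, median, and maximum review counts. The per-reviewer counts and the 2x skew threshold are made-up examples; in practice the counts would come from your code review tool's API.

```python
from statistics import median

# Hypothetical counts of PRs reviewed this week, per teammate.
# In practice these would come from your code review tool's API.
reviews_this_week = {"alice": 9, "bob": 7, "carol": 2, "dana": 6}

counts = sorted(reviews_this_week.values())
lo, mid, hi = counts[0], median(counts), counts[-1]
print(f"min={lo}  median={mid}  max={hi}")

# Arbitrary example threshold: flag weeks where the max review load is
# far above the median, suggesting reviews are pooling on one person.
if hi > 2 * mid:
    print("Review load looks skewed this week - worth a team conversation.")
```

Tracking these three numbers week over week matters more than any single snapshot: a widening gap between median and max is the signal to look closer.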
We hope these tools bring about interesting discussions and more evenly distributed code reviews on your team. Have a great rest of the week! 🎉
Lizzie
From the Quotient team