0:00:05.360,0:00:10.160
Today I'd like to tell you what we found about code review effectiveness by doing empirical

0:00:10.160,0:00:17.120
software engineering research. My hope is that it can inspire your practice. So first let me

0:00:17.120,0:00:21.760
tell you what I consider as code review. Here I'm talking about the most widespread form of

0:00:21.760,0:00:27.920
code review where basically a developer works on a software system and makes some changes that

0:00:27.920,0:00:35.040
then it sends out to one or more reviewers. These reviewers inspect the change, make some comments,

0:00:35.040,0:00:40.800
and send them back to the author and this creates the code review cycle that continues

0:00:40.800,0:00:48.640
until everybody is satisfied with the change and change is integrated in the code base. So what

0:00:48.640,0:00:55.120
we focus here today is how can we make this part - the actual review of the code - more effective

0:00:55.120,0:01:02.000
so that developers find for example more errors.
Well I don't know if I'm preaching to the choir

0:01:02.000,0:01:08.480
but code review is used so much that any improvement can have important effects- after

0:01:08.480,0:01:15.840
all we are talking about finding bugs before they reach production. Right, so first thing, I'd like

0:01:15.840,0:01:21.520
to ask you to have a look at these exemplary code review tools. They are all very similar,

0:01:22.160,0:01:28.400
right, they all look very similar to each other, but please allow me to direct your attention to

0:01:28.400,0:01:35.680
one aspect. Please look at the - at the list of files that are under review. What's their order,

0:01:35.680,0:01:42.400
right? Or well more precisely alphabetical and as developers we understand why they reached

0:01:42.400,0:01:48.880
that specific implementation. But the question is could this choice have an effect on code review?

0:01:50.000,0:01:55.840
By instrumenting a code review tool at a company we found out that the reviewer's behavior is

0:01:55.840,0:02:01.680
influenced by this. So in more than 50 percent of the reviews that we analyzed, the reviewer started

0:02:01.680,0:02:07.440
with the file presented first, and in almost 40 percent of the navigation the review - the

0:02:07.440,0:02:13.200
reviewer went to the next file in order. Well this is not a problem, right, because package

0:02:13.200,0:02:20.240
names are kind of random so they are randomly ordered. Well, except when they are not. So

0:02:20.960,0:02:29.040
test files are almost always after the production files they test. Could this be a problem? If we

0:02:29.040,0:02:36.080
couple this with the fact that developers consider test files are less important, well maybe. In fact

0:02:36.080,0:02:42.480
we looked at code review data and we found that tests are way less commented. But this could be

0:02:42.480,0:02:48.160
the results of the bias against tests. But what would happen if the tool used a different order?

0:02:49.600,0:02:55.920
So we did an experiment to test this. We set up an online code review tool and asked developers

0:02:55.920,0:03:01.280
to review some code. So we had two files, one production and one test, where we injected some

0:03:01.280,0:03:08.640
bugs. Then we decided these developers forming two groups. One group had production presented first

0:03:09.760,0:03:18.080
and one group had the test file presented first. Then we looked at their ability of finding bugs.

0:03:18.080,0:03:21.680
Concerning the bugs in production there was no difference between the groups,

0:03:22.640,0:03:34.720
but the test first group was 250 percent more likely to find the test bug. Just by

0:03:34.720,0:03:38.640
sweeping the order - switching the order - in which they were looking at these files.

0:03:40.240,0:03:45.520
This is an interesting result if you care about test files, and we should, right, because they

0:03:45.520,0:03:52.160
help us find bugs in production. But it could be that what we see is that this order goes against

0:03:52.160,0:04:00.400
the prejudice we have against test files. So it's just a way to counteract this initial bias. But

0:04:01.200,0:04:09.360
this effect may or may not exist for production code. Maybe not. So we looked into that as well.

0:04:10.160,0:04:15.920
The first thing we did was analyzing review comments for two hundred thousand requests

0:04:15.920,0:04:21.920
from very popular projects in GitHub and we found something very remarkable. Look at this:

0:04:21.920,0:04:28.640
if we took the number of comments that are put in a review by file position - let's say you

0:04:28.640,0:04:33.920
pick all the pull requests with five files and you sum up all the comments they receive across

0:04:33.920,0:04:40.400
all these pull requests with five files - this is what you see. The files in the first positions

0:04:40.400,0:04:46.480
receive significantly more comments than the files in the last positions. And of course we checked

0:04:46.480,0:04:51.840
this manually as well. Maybe you put a comment at the beginning and say fix these in all the

0:04:51.840,0:04:57.360
other files as well, but these happens very, very rarely, so these are genuine comments that don't

0:04:57.360,0:05:04.480
have much to do with one another and this is how they look like. And even more remarkable is that

0:05:04.480,0:05:10.720
this was true across all different cases. As you can see in these graphs for two, five, four, per

0:05:10.720,0:05:15.440
request with two files, with seven files, ten files, you already see this trend.

0:05:17.120,0:05:20.960
Well but these are comments, right, most are not about bugs. We know that because

0:05:20.960,0:05:26.160
we do code reviews. So maybe it's just not an important effect. To see, so what we did

0:05:27.120,0:05:33.840
was another experiment where we really wanted to see if there was such an effect and if that was

0:05:33.840,0:05:40.000
important for our effectiveness. So we use a similar line setting as before, we asked

0:05:40.000,0:05:46.240
people to review some code and we prepared some special codes, so we created five production files

0:05:46.800,0:05:52.560
and injected bugs in two of them, the first and the last. The bugs are different:

0:05:53.120,0:05:57.600
so one is a missing break, which is kind of similar to a syntax error, right,

0:05:57.600,0:06:02.080
so not a lot of understanding needed to see if it's missing, if it's really

0:06:02.080,0:06:08.880
a problem. And one instead requires a more careful reading of the documentation and you have to

0:06:08.880,0:06:14.160
compare it against the actual implementation. We call it corner case, this is boundary bug,

0:06:14.160,0:06:21.520
very common, yet it requires more understanding.
So we assigned the files ordered differently

0:06:21.520,0:06:26.800
to developers: one had this order and the other group had the order reversed.

0:06:28.880,0:06:34.400
And we compared how they fared in finding those bugs. When I talk about order it means that

0:06:34.400,0:06:40.240
it's a normal review but we order the files one after the other in different ways, so the first

0:06:40.240,0:06:47.200
group sees file A first and the second group sees file B first. So we compare the results:

0:06:47.760,0:06:53.440
for the missing break the simple syntax-like error there was no difference in the two groups - they

0:06:53.440,0:06:59.360
had the same likelihood of finding the bugs regardless of whether it was first or last in the

0:06:59.360,0:07:06.000
file list. So instead, for the corner case, the group who received it in the first presented fire

0:07:06.640,0:07:15.760
was 175 percent more likely to find the bug compared to the other group. This was the bug

0:07:15.760,0:07:24.240
that required more attention and understanding.
Okay so let me take a step back here to see this.

0:07:25.040,0:07:31.440
So we looked at a way to improve effectiveness in code review. We consider the tools that are

0:07:31.440,0:07:37.680
used in real world and pay attention to how the files were ordered. We found first, interesting

0:07:37.680,0:07:44.320
data about tests, and running an experiment, we saw that having a test later has an effect

0:07:44.320,0:07:49.680
that you can't counteract. Moving them first without major side effects and a lot of benefits.

0:07:50.800,0:07:56.560
Then we moved to production files and found a remarkable pattern in the number of comments,

0:07:57.520,0:08:03.600
and based on this, we run an experiment, and here as well the effect of file position - the way in

0:08:03.600,0:08:09.200
which you see the files in a code review - is important for your code review effectiveness.

0:08:11.440,0:08:17.120
So I think that we can take something for practice from these findings. So for example,

0:08:18.240,0:08:25.200
reviewers - sorry - reviewers should be aware of this effect and decide where to start the

0:08:25.200,0:08:30.000
review in a principled way, looking at it from a high-level perspective and understanding where to

0:08:30.000,0:08:37.040
start. Authors can guide the reviewers toward the most challenging parts of their change. After all,

0:08:37.040,0:08:44.400
they changed their part, their change, very well, so they can describe it in the description message

0:08:44.400,0:08:51.760
or add review comments to guide the reviewers. And tool builders should really try to empower users

0:08:51.760,0:09:00.320
to choose how to order their changes, and also offer other features that reflect on the fact that

0:09:01.040,0:09:08.480
the default settings that we have in our tools, in what we do, have a very strong power,

0:09:08.480,0:09:14.560
and this is after all about the power of the default settings in code review effectiveness.

0:09:15.440,0:09:21.680
So thank you for your attention. I would like to thank all my students and collaborators at

0:09:21.680,0:09:26.400
my research group who made this work possible. This was work done through the years,

0:09:26.400,0:09:31.200
we do a lot of work along these lines, we try to understand developers, their work, and finding

0:09:31.200,0:09:36.160
ways in which it can be improved, principled ways and possibly with simple solutions like

0:09:36.160,0:09:46.000
these ones. So if you're interested in hearing more please get in touch. Thank you very much.