0:00:00.300,0:00:06.120
Hi yeah thanks for that introduction, so as mentioned I'm Allison Sullivan and

0:00:06.120,0:00:11.040
I'll jump right into kind of explaining what "Proofreading the Proofreader: the Benefits of

0:00:11.040,0:00:17.280
Unit Tests for Software Models" is all about. With a starting observation that

0:00:17.280,0:00:20.040
I hope everyone can agree with, which is that it's just really

0:00:20.040,0:00:26.880
hard to write programs correctly, especially with the scale of programs in industry.

0:00:26.880,0:00:31.740
I know having taught algorithms that it's hard for people to write even just those

0:00:31.740,0:00:37.080
simple efficient algorithm design solutions, then you go into the real world and you have

0:00:37.080,0:00:41.520
millions of lines of code and it's interacting with

0:00:41.520,0:00:46.620
all of these different other systems and so it's just really hard to - especially

0:00:46.620,0:00:51.720
in one go, in the first path - paths - actually write that program correctly.

0:00:52.560,0:01:01.320
A really old solution to this is the idea of using software models to improve correctness,

0:01:01.320,0:01:06.600
which would be using the mathematical logic you might have been introduced to

0:01:06.600,0:01:12.660
a long time ago inl a discrete math class to actually specify specifically what you

0:01:12.660,0:01:15.600
expect your program to do and we typically write

0:01:15.600,0:01:22.980
these models of the design with the idea that if we can create one cohesive view of the design

0:01:22.980,0:01:28.980
it can be easier for all the developers working on the software to actually write it correctly.

0:01:29.820,0:01:35.460
And in recent years there was a recent breakthrough in the software modeling community

0:01:35.460,0:01:40.500
to start doing what we call scenario-finding toolsets.

0:01:40.500,0:01:44.820
So this is the idea where you actually execute the software model

0:01:44.820,0:01:51.960
and as output you get an entire collection of all of the different valid states your

0:01:51.960,0:01:55.800
program can enter into. And so then you can take

0:01:55.800,0:02:01.140
these valid states which I have an example of, just a simple singly linked list written here,

0:02:01.140,0:02:06.060
and the discovered scenarios are found automatically by analyzing the model,

0:02:06.060,0:02:09.900
and they're presented to the user, and ideally she would want to iterate

0:02:09.900,0:02:16.320
over them and make sure that every single one is a valid singly linked list with no cycles.

0:02:16.920,0:02:23.520
Now the problem with this is, well, it's really hard to write software

0:02:23.520,0:02:26.760
models themselves correctly. Like, yes they provide this

0:02:26.760,0:02:32.100
great oracle and those scenarios can be turned into tests over our implementation

0:02:32.100,0:02:40.320
but if the starting model is incorrect then, you know, we've just punted the problem to the model

0:02:40.320,0:02:43.920
instead of having the problem in the implementation.

0:02:43.920,0:02:51.480
And so my research looked into the idea of addressing model correctness the same way

0:02:51.480,0:02:56.760
that we do for simple blocks of code, which was the idea of establishing

0:02:56.760,0:03:03.300
a unit test for a software model. So for our singly linked list example,

0:03:03.300,0:03:09.000
an idea of a unit test that we came up with was to basically allow the

0:03:09.000,0:03:16.080
user to specify one of those expected scenarios that should be found when the model is executed

0:03:16.080,0:03:20.460
and to say, you know, I expect a list with one node

0:03:20.460,0:03:28.020
and no links anywhere to be found as a valid list, and also support the idea of saying, you know,

0:03:28.020,0:03:32.580
if I then add a cycle to that list - in this example here it would be

0:03:32.580,0:03:36.600
just a self loop on the node - I expect this to be prevented.

0:03:36.600,0:03:43.200
So it's a pretty simple idea of a unit test but it brings in a way to, kind of,

0:03:43.200,0:03:47.880
quickly spot check your model because when you're executing

0:03:47.880,0:03:51.360
and generating these scenarios, it's true that you're getting the

0:03:51.360,0:03:55.200
snapshots of valid states your program can enter and that's really nice,

0:03:55.200,0:04:00.900
but in practice since this is a bounded but exhaustive search

0:04:00.900,0:04:07.380
it's very common to find hundreds of scenarios. And so when you're writing your model

0:04:07.380,0:04:11.820
it might be slightly incorrect, and that incorrectness could mean,

0:04:11.820,0:04:19.620
out of the 150 scenarios, two are incorrect. Or maybe you just slightly wrote something

0:04:19.620,0:04:24.960
wrong and you actually produce 149 scenarios instead of 150

0:04:24.960,0:04:30.300
and you have to realize as you're enumerating, oh, I don't think I ever saw this one type

0:04:30.300,0:04:35.280
of list and I expected to see it. So the unit tests give the developer

0:04:35.280,0:04:40.500
a way to, kind of, have a lightweight spot check of their software model

0:04:40.500,0:04:44.280
and it's a practice that makes it feel a lot more familiar,

0:04:44.280,0:04:50.100
a bit in the way - closer to how you would develop code and do a quick unit test,

0:04:50.100,0:04:53.520
or, you know, especially on a complicated little method,

0:04:53.520,0:04:59.220
making sure you covered your corner cases. But the other benefit that we ended

0:04:59.220,0:05:03.600
up finding with the unit test is that once we had the definition of a unit test,

0:05:03.600,0:05:08.700
there were a whole bunch of, like, automated software testing practices

0:05:08.700,0:05:16.380
we could now look to support for a software model. And what we discovered was that the execution

0:05:16.380,0:05:20.700
environment and the fact that we're dealing with these formal logics

0:05:20.700,0:05:26.040
actually made a lot of those testing practices easier to automate

0:05:26.040,0:05:33.240
and you got even more benefit because some of the hard problems for a language like Java or C++

0:05:33.240,0:05:38.400
are actually pretty simple to solve on a model. So to illustrate,

0:05:38.940,0:05:42.000
I'm going to step over an example for mutation testing.

0:05:42.000,0:05:46.740
So mutation testing is one of the research areas in software testing

0:05:46.740,0:05:50.160
and the idea is, if I have an original program,

0:05:50.160,0:05:57.300
I can actually intentionally mutate the program and introduce changes to it

0:05:57.300,0:06:05.340
and by doing that I want to make sure that I have a test that is capable of detecting this change.

0:06:05.340,0:06:09.900
So in this original there's two different mutants on the screen here.

0:06:09.900,0:06:15.060
So for the first mutant, if I have a test where x is 0 and Y is 1,

0:06:15.060,0:06:20.580
I'll observe a difference in output - the original will produce zero as a result and

0:06:20.580,0:06:24.720
the mutant will produce one. However for mutant two,

0:06:24.720,0:06:28.560
no input will distinguish it from the original program:

0:06:28.560,0:06:33.480
it happened to be a logically equivalent change to the program.

0:06:33.480,0:06:40.500
We call that a equivalent mutant and it's a really hard problem to solve in these imperative

0:06:40.500,0:06:43.260
programming languages, you know,

0:06:43.260,0:06:48.480
right now you have to do just kind of a manual analysis of something we call propagation

0:06:48.480,0:06:53.040
to see if it's ever possible to actually influence the output,

0:06:53.040,0:06:57.660
you do a whole bunch of analysis through the control flow to reach the conclusion,

0:06:57.660,0:07:00.180
oh, this is an equivalent mutant.

0:07:00.180,0:07:05.820
So the existence of those equivalent mutants really impacts the applicability of doing

0:07:05.820,0:07:12.180
an automated mutation testing environment even though it produces really strong tests,

0:07:12.180,0:07:14.640
it just has that downside.

0:07:15.480,0:07:17.760
However, in a software model,

0:07:17.760,0:07:25.380
if we create a mutation, the modeling language itself allows us to ask,

0:07:25.380,0:07:31.320
is there ever a situation in which these two formulas differ from each other?

0:07:31.320,0:07:36.300
So we can actually just ask the language to check for equivalence

0:07:36.300,0:07:43.920
and if they're not equivalent - if they do differ - then we do have the support in these languages

0:07:43.920,0:07:49.800
to actually generate that scenario that shows their difference in behavior,

0:07:49.800,0:07:54.840
so we can automatically create this test that kills the mutant.

0:07:54.840,0:07:59.880
And so we are actually able to solve these really hard problems - to do

0:07:59.880,0:08:05.340
mutation testing in other languages - really, really simply in the modeling language,

0:08:05.340,0:08:11.040
just leveraging the nature of its execution. We've gone on to do things like fault localization

0:08:11.040,0:08:17.490
- trying to automatically figure out what part of the model is incorrect given some failing test -

0:08:17.490,0:08:24.180
and automated repair - to then go on and correct that faulty location so that all tests pass -

0:08:24.180,0:08:29.160
and we've encountered similar benefits where some of these hard problems are just easier.

0:08:29.160,0:08:31.260
For instance, for repair,

0:08:31.260,0:08:35.880
if you think about a Java statement, there's so many different valid forms that

0:08:35.880,0:08:41.520
Java statement can take - the grammar is so open - but in a modeling language the grammar

0:08:41.520,0:08:45.180
is much more constrained, so our search space is smaller

0:08:45.180,0:08:49.680
and it's easier to then do something in the realm of an exhaustive search

0:08:49.680,0:08:53.880
that will really be able to explore a lot of repairs

0:08:53.880,0:09:01.560
without having the problem of running for 24 plus hours just to try and create an automated patch.

0:09:02.160,0:09:07.860
So that's kind of the key idea of my research direction in general,

0:09:07.860,0:09:13.440
is this idea that software modeling can have a lot of benefits,

0:09:13.440,0:09:19.560
but it has a lingering reputation that it's hard to scale and it's hard to use.

0:09:19.560,0:09:25.380
Research has heavily focused on addressing the scalability issue with real dividends -

0:09:25.380,0:09:32.100
there's a lot of work coming out, getting cited, including Amazon citing some work

0:09:32.100,0:09:36.300
with TLA+ and SMT solvers about actually deploying

0:09:36.300,0:09:41.580
these formal methods in the real world - but the ease of use problem still remains.

0:09:41.580,0:09:45.780
And so my research group's focus has been the idea that unit testing,

0:09:45.780,0:09:50.880
and everything that that enables, can help ease the adoption of software models.

0:09:50.880,0:09:56.100
And it turns out that these techniques can be done in a really nice automated way

0:09:56.100,0:10:02.700
because of the environment that we're working in. And also, just a takeaway that I had is,

0:10:02.700,0:10:08.640
if you happen to be working with a non-traditional language that might not have built-in unit tests,

0:10:08.640,0:10:14.040
I would consider investing in trying to come up with a structured unit testing environment.

0:10:14.040,0:10:21.240
You might be surprised by all the further benefits and unique automation opportunities that exist

0:10:21.240,0:10:30.600
given your unique execution environment. Fantastic, I particularly love this,

0:10:30.600,0:10:34.260
it's very near and dear to my heart, this idea of unit testing and the value

0:10:34.260,0:10:38.100
and how we can think about expanding that. I think it's especially interesting,

0:10:38.880,0:10:41.640
and I'm gonna give - I'm gonna ask my question to give a chance for another

0:10:41.640,0:10:47.580
questions to come into the the Slack over here - but when we're talking about unit testing in the

0:10:47.580,0:10:48.840
context of software models, right,

0:10:48.840,0:10:51.360
we're talking about an abstraction of a piece of software,

0:10:51.360,0:10:55.140
which I think makes sense to be thinking about validation at that point.

0:10:55.140,0:10:59.040
Have you thought about, or have you done any work that gives any insights into,

0:10:59.040,0:11:03.060
the value beyond that, right, like if I'm doing unit testing

0:11:03.060,0:11:05.760
at my software model level, does that buy me something

0:11:05.760,0:11:09.420
when it comes time for implementation, when it comes time for maintenance,

0:11:09.420,0:11:13.260
when it comes time for - for some of the tasks that come later down the pipeline,

0:11:13.260,0:11:16.980
any insights you can share or thoughts on that? Yeah,

0:11:16.980,0:11:20.340
so there is definitely a whole body of work dedicated towards

0:11:20.340,0:11:26.340
taking those scenarios that get produced and trying to figure out how to automate those

0:11:26.340,0:11:29.880
and to test over the implementation, but as mentioned,

0:11:29.880,0:11:36.600
the fact that it's a bounded exhaustive search - oftentimes that runs quickly into scalability

0:11:36.600,0:11:40.320
issues and maybe even a lot of redundant paths through your tests

0:11:40.320,0:11:43.800
and some of these scenarios are very similar but slightly different,

0:11:43.800,0:11:50.640
so there's a whole body of work dedicated to creating a more interesting subset of scenarios.

0:11:50.640,0:11:56.520
And so we've looked into trying to map in the idea of coverage metrics to a model

0:11:56.520,0:12:01.140
and then you generate tests for that coverage and then those scenarios,

0:12:01.140,0:12:05.160
you would then translate into tests over the implementation

0:12:05.160,0:12:11.100
and it would be guided by model coverage and then the goal is hoping that that leads to

0:12:11.100,0:12:16.320
interesting coverage within the implementation. We haven't actually studied how well that

0:12:16.320,0:12:21.720
last part connects in, but I think that that is definitely the next step,

0:12:21.720,0:12:25.020
is figuring out how to connect it better to the implementation.

0:12:26.520,0:12:28.800
Fantastic, and so we do have

0:12:28.800,0:12:31.860
one more question I want to make sure we get squeezed in before we get to the next talk,

0:12:32.700,0:12:35.220
in the chat, we have a question from Greg that says,

0:12:35.220,0:12:39.060
what's the best intro to this kind of modeling for a working programmer

0:12:39.060,0:12:45.120
who never did or doesn't remember - most of us - a discrete math or formal methods course?

0:12:45.840,0:12:49.680
Yeah, so the language I use in

0:12:49.680,0:12:58.140
the slides is called ALLOY a-l-l-o-y and it has been kind of a focus in

0:12:58.140,0:13:02.340
undergraduate courses that are looking to start introducing formal methods

0:13:02.340,0:13:07.320
because it does, as output, automatically produce graphical scenarios.

0:13:07.320,0:13:12.060
Some of the other scenario finders will produce text outputs of the scenarios

0:13:12.060,0:13:17.400
and so that's not as approachable. And there's recent work to create - that

0:13:17.400,0:13:24.300
has created this website called Alloy for Fun, and so it is a website you can go to,

0:13:24.300,0:13:30.780
it has Alloy installed and running in its backend and you just type in - it gives you prompts

0:13:30.780,0:13:32.460
to fill in - and then it

0:13:32.460,0:13:38.340
has a back-end oracle it's checking you against and so it'll show you examples of scenarios that

0:13:38.340,0:13:43.320
you allowed that the oracle didn't or vice versa to help correct you,

0:13:43.320,0:13:49.080
so Alloy is something that has been used a bit more more in the educational perspective

0:13:49.080,0:13:51.600
so I would say resources like that would be beneficial.