0:00:00.300,0:00:06.120 Hi yeah thanks for that introduction, so as mentioned I'm Allison Sullivan and 0:00:06.120,0:00:11.040 I'll jump right into kind of explaining what "Proofreading the Proofreader: the Benefits of 0:00:11.040,0:00:17.280 Unit Tests for Software Models" is all about. With a starting observation that 0:00:17.280,0:00:20.040 I hope everyone can agree with, which is that it's just really 0:00:20.040,0:00:26.880 hard to write programs correctly, especially with the scale of programs in industry. 0:00:26.880,0:00:31.740 I know having taught algorithms that it's hard for people to write even just those 0:00:31.740,0:00:37.080 simple efficient algorithm design solutions, then you go into the real world and you have 0:00:37.080,0:00:41.520 millions of lines of code and it's interacting with 0:00:41.520,0:00:46.620 all of these different other systems and so it's just really hard to - especially 0:00:46.620,0:00:51.720 in one go, in the first path - paths - actually write that program correctly. 0:00:52.560,0:01:01.320 A really old solution to this is the idea of using software models to improve correctness, 0:01:01.320,0:01:06.600 which would be using the mathematical logic you might have been introduced to 0:01:06.600,0:01:12.660 a long time ago inl a discrete math class to actually specify specifically what you 0:01:12.660,0:01:15.600 expect your program to do and we typically write 0:01:15.600,0:01:22.980 these models of the design with the idea that if we can create one cohesive view of the design 0:01:22.980,0:01:28.980 it can be easier for all the developers working on the software to actually write it correctly. 0:01:29.820,0:01:35.460 And in recent years there was a recent breakthrough in the software modeling community 0:01:35.460,0:01:40.500 to start doing what we call scenario-finding toolsets. 0:01:40.500,0:01:44.820 So this is the idea where you actually execute the software model 0:01:44.820,0:01:51.960 and as output you get an entire collection of all of the different valid states your 0:01:51.960,0:01:55.800 program can enter into. And so then you can take 0:01:55.800,0:02:01.140 these valid states which I have an example of, just a simple singly linked list written here, 0:02:01.140,0:02:06.060 and the discovered scenarios are found automatically by analyzing the model, 0:02:06.060,0:02:09.900 and they're presented to the user, and ideally she would want to iterate 0:02:09.900,0:02:16.320 over them and make sure that every single one is a valid singly linked list with no cycles. 0:02:16.920,0:02:23.520 Now the problem with this is, well, it's really hard to write software 0:02:23.520,0:02:26.760 models themselves correctly. Like, yes they provide this 0:02:26.760,0:02:32.100 great oracle and those scenarios can be turned into tests over our implementation 0:02:32.100,0:02:40.320 but if the starting model is incorrect then, you know, we've just punted the problem to the model 0:02:40.320,0:02:43.920 instead of having the problem in the implementation. 0:02:43.920,0:02:51.480 And so my research looked into the idea of addressing model correctness the same way 0:02:51.480,0:02:56.760 that we do for simple blocks of code, which was the idea of establishing 0:02:56.760,0:03:03.300 a unit test for a software model. So for our singly linked list example, 0:03:03.300,0:03:09.000 an idea of a unit test that we came up with was to basically allow the 0:03:09.000,0:03:16.080 user to specify one of those expected scenarios that should be found when the model is executed 0:03:16.080,0:03:20.460 and to say, you know, I expect a list with one node 0:03:20.460,0:03:28.020 and no links anywhere to be found as a valid list, and also support the idea of saying, you know, 0:03:28.020,0:03:32.580 if I then add a cycle to that list - in this example here it would be 0:03:32.580,0:03:36.600 just a self loop on the node - I expect this to be prevented. 0:03:36.600,0:03:43.200 So it's a pretty simple idea of a unit test but it brings in a way to, kind of, 0:03:43.200,0:03:47.880 quickly spot check your model because when you're executing 0:03:47.880,0:03:51.360 and generating these scenarios, it's true that you're getting the 0:03:51.360,0:03:55.200 snapshots of valid states your program can enter and that's really nice, 0:03:55.200,0:04:00.900 but in practice since this is a bounded but exhaustive search 0:04:00.900,0:04:07.380 it's very common to find hundreds of scenarios. And so when you're writing your model 0:04:07.380,0:04:11.820 it might be slightly incorrect, and that incorrectness could mean, 0:04:11.820,0:04:19.620 out of the 150 scenarios, two are incorrect. Or maybe you just slightly wrote something 0:04:19.620,0:04:24.960 wrong and you actually produce 149 scenarios instead of 150 0:04:24.960,0:04:30.300 and you have to realize as you're enumerating, oh, I don't think I ever saw this one type 0:04:30.300,0:04:35.280 of list and I expected to see it. So the unit tests give the developer 0:04:35.280,0:04:40.500 a way to, kind of, have a lightweight spot check of their software model 0:04:40.500,0:04:44.280 and it's a practice that makes it feel a lot more familiar, 0:04:44.280,0:04:50.100 a bit in the way - closer to how you would develop code and do a quick unit test, 0:04:50.100,0:04:53.520 or, you know, especially on a complicated little method, 0:04:53.520,0:04:59.220 making sure you covered your corner cases. But the other benefit that we ended 0:04:59.220,0:05:03.600 up finding with the unit test is that once we had the definition of a unit test, 0:05:03.600,0:05:08.700 there were a whole bunch of, like, automated software testing practices 0:05:08.700,0:05:16.380 we could now look to support for a software model. And what we discovered was that the execution 0:05:16.380,0:05:20.700 environment and the fact that we're dealing with these formal logics 0:05:20.700,0:05:26.040 actually made a lot of those testing practices easier to automate 0:05:26.040,0:05:33.240 and you got even more benefit because some of the hard problems for a language like Java or C++ 0:05:33.240,0:05:38.400 are actually pretty simple to solve on a model. So to illustrate, 0:05:38.940,0:05:42.000 I'm going to step over an example for mutation testing. 0:05:42.000,0:05:46.740 So mutation testing is one of the research areas in software testing 0:05:46.740,0:05:50.160 and the idea is, if I have an original program, 0:05:50.160,0:05:57.300 I can actually intentionally mutate the program and introduce changes to it 0:05:57.300,0:06:05.340 and by doing that I want to make sure that I have a test that is capable of detecting this change. 0:06:05.340,0:06:09.900 So in this original there's two different mutants on the screen here. 0:06:09.900,0:06:15.060 So for the first mutant, if I have a test where x is 0 and Y is 1, 0:06:15.060,0:06:20.580 I'll observe a difference in output - the original will produce zero as a result and 0:06:20.580,0:06:24.720 the mutant will produce one. However for mutant two, 0:06:24.720,0:06:28.560 no input will distinguish it from the original program: 0:06:28.560,0:06:33.480 it happened to be a logically equivalent change to the program. 0:06:33.480,0:06:40.500 We call that a equivalent mutant and it's a really hard problem to solve in these imperative 0:06:40.500,0:06:43.260 programming languages, you know, 0:06:43.260,0:06:48.480 right now you have to do just kind of a manual analysis of something we call propagation 0:06:48.480,0:06:53.040 to see if it's ever possible to actually influence the output, 0:06:53.040,0:06:57.660 you do a whole bunch of analysis through the control flow to reach the conclusion, 0:06:57.660,0:07:00.180 oh, this is an equivalent mutant. 0:07:00.180,0:07:05.820 So the existence of those equivalent mutants really impacts the applicability of doing 0:07:05.820,0:07:12.180 an automated mutation testing environment even though it produces really strong tests, 0:07:12.180,0:07:14.640 it just has that downside. 0:07:15.480,0:07:17.760 However, in a software model, 0:07:17.760,0:07:25.380 if we create a mutation, the modeling language itself allows us to ask, 0:07:25.380,0:07:31.320 is there ever a situation in which these two formulas differ from each other? 0:07:31.320,0:07:36.300 So we can actually just ask the language to check for equivalence 0:07:36.300,0:07:43.920 and if they're not equivalent - if they do differ - then we do have the support in these languages 0:07:43.920,0:07:49.800 to actually generate that scenario that shows their difference in behavior, 0:07:49.800,0:07:54.840 so we can automatically create this test that kills the mutant. 0:07:54.840,0:07:59.880 And so we are actually able to solve these really hard problems - to do 0:07:59.880,0:08:05.340 mutation testing in other languages - really, really simply in the modeling language, 0:08:05.340,0:08:11.040 just leveraging the nature of its execution. We've gone on to do things like fault localization 0:08:11.040,0:08:17.490 - trying to automatically figure out what part of the model is incorrect given some failing test - 0:08:17.490,0:08:24.180 and automated repair - to then go on and correct that faulty location so that all tests pass - 0:08:24.180,0:08:29.160 and we've encountered similar benefits where some of these hard problems are just easier. 0:08:29.160,0:08:31.260 For instance, for repair, 0:08:31.260,0:08:35.880 if you think about a Java statement, there's so many different valid forms that 0:08:35.880,0:08:41.520 Java statement can take - the grammar is so open - but in a modeling language the grammar 0:08:41.520,0:08:45.180 is much more constrained, so our search space is smaller 0:08:45.180,0:08:49.680 and it's easier to then do something in the realm of an exhaustive search 0:08:49.680,0:08:53.880 that will really be able to explore a lot of repairs 0:08:53.880,0:09:01.560 without having the problem of running for 24 plus hours just to try and create an automated patch. 0:09:02.160,0:09:07.860 So that's kind of the key idea of my research direction in general, 0:09:07.860,0:09:13.440 is this idea that software modeling can have a lot of benefits, 0:09:13.440,0:09:19.560 but it has a lingering reputation that it's hard to scale and it's hard to use. 0:09:19.560,0:09:25.380 Research has heavily focused on addressing the scalability issue with real dividends - 0:09:25.380,0:09:32.100 there's a lot of work coming out, getting cited, including Amazon citing some work 0:09:32.100,0:09:36.300 with TLA+ and SMT solvers about actually deploying 0:09:36.300,0:09:41.580 these formal methods in the real world - but the ease of use problem still remains. 0:09:41.580,0:09:45.780 And so my research group's focus has been the idea that unit testing, 0:09:45.780,0:09:50.880 and everything that that enables, can help ease the adoption of software models. 0:09:50.880,0:09:56.100 And it turns out that these techniques can be done in a really nice automated way 0:09:56.100,0:10:02.700 because of the environment that we're working in. And also, just a takeaway that I had is, 0:10:02.700,0:10:08.640 if you happen to be working with a non-traditional language that might not have built-in unit tests, 0:10:08.640,0:10:14.040 I would consider investing in trying to come up with a structured unit testing environment. 0:10:14.040,0:10:21.240 You might be surprised by all the further benefits and unique automation opportunities that exist 0:10:21.240,0:10:30.600 given your unique execution environment. Fantastic, I particularly love this, 0:10:30.600,0:10:34.260 it's very near and dear to my heart, this idea of unit testing and the value 0:10:34.260,0:10:38.100 and how we can think about expanding that. I think it's especially interesting, 0:10:38.880,0:10:41.640 and I'm gonna give - I'm gonna ask my question to give a chance for another 0:10:41.640,0:10:47.580 questions to come into the the Slack over here - but when we're talking about unit testing in the 0:10:47.580,0:10:48.840 context of software models, right, 0:10:48.840,0:10:51.360 we're talking about an abstraction of a piece of software, 0:10:51.360,0:10:55.140 which I think makes sense to be thinking about validation at that point. 0:10:55.140,0:10:59.040 Have you thought about, or have you done any work that gives any insights into, 0:10:59.040,0:11:03.060 the value beyond that, right, like if I'm doing unit testing 0:11:03.060,0:11:05.760 at my software model level, does that buy me something 0:11:05.760,0:11:09.420 when it comes time for implementation, when it comes time for maintenance, 0:11:09.420,0:11:13.260 when it comes time for - for some of the tasks that come later down the pipeline, 0:11:13.260,0:11:16.980 any insights you can share or thoughts on that? Yeah, 0:11:16.980,0:11:20.340 so there is definitely a whole body of work dedicated towards 0:11:20.340,0:11:26.340 taking those scenarios that get produced and trying to figure out how to automate those 0:11:26.340,0:11:29.880 and to test over the implementation, but as mentioned, 0:11:29.880,0:11:36.600 the fact that it's a bounded exhaustive search - oftentimes that runs quickly into scalability 0:11:36.600,0:11:40.320 issues and maybe even a lot of redundant paths through your tests 0:11:40.320,0:11:43.800 and some of these scenarios are very similar but slightly different, 0:11:43.800,0:11:50.640 so there's a whole body of work dedicated to creating a more interesting subset of scenarios. 0:11:50.640,0:11:56.520 And so we've looked into trying to map in the idea of coverage metrics to a model 0:11:56.520,0:12:01.140 and then you generate tests for that coverage and then those scenarios, 0:12:01.140,0:12:05.160 you would then translate into tests over the implementation 0:12:05.160,0:12:11.100 and it would be guided by model coverage and then the goal is hoping that that leads to 0:12:11.100,0:12:16.320 interesting coverage within the implementation. We haven't actually studied how well that 0:12:16.320,0:12:21.720 last part connects in, but I think that that is definitely the next step, 0:12:21.720,0:12:25.020 is figuring out how to connect it better to the implementation. 0:12:26.520,0:12:28.800 Fantastic, and so we do have 0:12:28.800,0:12:31.860 one more question I want to make sure we get squeezed in before we get to the next talk, 0:12:32.700,0:12:35.220 in the chat, we have a question from Greg that says, 0:12:35.220,0:12:39.060 what's the best intro to this kind of modeling for a working programmer 0:12:39.060,0:12:45.120 who never did or doesn't remember - most of us - a discrete math or formal methods course? 0:12:45.840,0:12:49.680 Yeah, so the language I use in 0:12:49.680,0:12:58.140 the slides is called ALLOY a-l-l-o-y and it has been kind of a focus in 0:12:58.140,0:13:02.340 undergraduate courses that are looking to start introducing formal methods 0:13:02.340,0:13:07.320 because it does, as output, automatically produce graphical scenarios. 0:13:07.320,0:13:12.060 Some of the other scenario finders will produce text outputs of the scenarios 0:13:12.060,0:13:17.400 and so that's not as approachable. And there's recent work to create - that 0:13:17.400,0:13:24.300 has created this website called Alloy for Fun, and so it is a website you can go to, 0:13:24.300,0:13:30.780 it has Alloy installed and running in its backend and you just type in - it gives you prompts 0:13:30.780,0:13:32.460 to fill in - and then it 0:13:32.460,0:13:38.340 has a back-end oracle it's checking you against and so it'll show you examples of scenarios that 0:13:38.340,0:13:43.320 you allowed that the oracle didn't or vice versa to help correct you, 0:13:43.320,0:13:49.080 so Alloy is something that has been used a bit more more in the educational perspective 0:13:49.080,0:13:51.600 so I would say resources like that would be beneficial.