0:00:06.640,0:00:12.400 So, in this talk I'm going to tell  you the story behind SQLancer, 0:00:12.400,0:00:16.640 which is a tool that we designed and developed to  automatically test database management systems. 0:00:17.520,0:00:22.400 Now I assume that many of you are not directly  working on developing database management systems, 0:00:22.400,0:00:27.040 which is why I will focus on general  insights that you might be able to 0:00:27.040,0:00:31.840 apply on testing your project. So just to ensure that we're all 0:00:31.840,0:00:37.760 on the same page - in our work we wanted to test  database management systems that expect queries 0:00:37.760,0:00:43.760 or statements written in the structured query  language or for short SQL. As perhaps most of 0:00:43.760,0:00:49.040 you know, we can for example use this language to  create tables, to insert data into them, and then 0:00:49.040,0:00:54.880 again fetch data using a select statement. Now in the previous talks we already heard 0:00:54.880,0:01:00.960 a bit about testing, so how can we write  test cases that could test such systems? 0:01:03.120,0:01:08.480 Well, one approach would be to just write them in  SQL. We could for example specify the test case, 0:01:08.480,0:01:14.000 such as here taken from the MySQL test to  it, and then in a separate file specify 0:01:14.000,0:01:20.480 the expected output. And while there are some  frameworks that make this process a bit easier, 0:01:20.480,0:01:25.680 I would argue that it's still always challenging  and time consuming to write manual test cases, 0:01:25.680,0:01:30.400 especially considering for such complex  systems where we need a huge test suites. 0:01:31.600,0:01:37.200 So in our work we ask ourselves, can we automate  the test process? And this brings us to two 0:01:37.200,0:01:42.880 challenges. First of all, if we want to have an  automated testing approach, we need a test case, 0:01:42.880,0:01:47.280 and in our case this means that we need to  generate a database and a query that we can then 0:01:47.280,0:01:53.520 validate. So looking into what's already out  there, we can find tools such as for example 0:01:53.520,0:02:00.080 SQLsmith. SQLsmith is an effective and widely  used brand new query generation tool that mutates 0:02:00.080,0:02:06.000 and generates complex SQL statements that might  cause for example a crash in the database system 0:02:06.000,0:02:12.400 that developers can then fix. So I would say that  this is already a solved challenge, so we can use 0:02:12.400,0:02:16.320 either this random generation approach or  another one to generate the test cases. 0:02:18.160,0:02:23.760 Now what was still an open challenge is the  so-called test oracle problem, namely we want 0:02:23.760,0:02:30.240 to validate the query's result. So what's a test  oracle? A test oracle is basically the mechanism 0:02:30.240,0:02:35.360 that determines whether a test has passed or  failed, and for regression testing for writing 0:02:35.360,0:02:41.120 manual test cases, we typically as the developer  or developers are - or provided the test oracle. 0:02:42.240,0:02:46.480 But this is often difficult, even for manually  written test cases, and I want to illustrate 0:02:46.480,0:02:51.360 this based on a concrete example. You can  see here a test case that actually allowed 0:02:51.360,0:02:58.000 us to find and report a bug in MySQL. You can  see here that we have two tables t0 and t1, 0:02:58.000,0:03:03.040 each of which holds a floating point value,  namely a positive zero and a negative zero. 0:03:03.600,0:03:08.240 And then we instruct MySQL to fetch the cross  product of all records from these two tables 0:03:08.240,0:03:12.320 where the column evaluates to the same value. Now what's the expected output here? 0:03:15.520,0:03:20.160 I would argue that it's disputable whether the  predicate here should evaluate to true or false 0:03:20.880,0:03:25.360 because we could for example look at the binary  representation of these two floating point numbers 0:03:25.360,0:03:29.520 to realize that the sign bit differs. And  based on this it would make sense that the 0:03:29.520,0:03:35.600 equality operator evaluates to false and MySQL  for this case would return an empty result. 0:03:37.520,0:03:43.360 But most programming languages such as Java, C,  and so on, they actually define the semantics that 0:03:43.360,0:03:50.480 this equality operator should evaluate to true  for - for these values. And in this case MySQL 0:03:50.480,0:03:57.360 would fetch a row. So at this point in a talk you  have to trust me, actually MySQL is expected to 0:03:57.360,0:04:04.000 fetch the record. But when we tested the latest  version of it, it still failed to do so. We 0:04:04.000,0:04:09.280 reported this as a bug to the developers who then  fixed it for the next release of MySQL. But the 0:04:09.280,0:04:15.920 point here is actually that we could find this  bug without having an idea on whether a predicate 0:04:15.920,0:04:20.320 should evaluate to true or not and whether  any results or any records should be fetched. 0:04:21.280,0:04:27.920 We basically had this test oracle that told  us that the result returned was correct. 0:04:27.920,0:04:31.760 Now we actually developed a couple of test  oracles, the one I'm going to present today 0:04:31.760,0:04:37.840 is termed logic partitioning or short TLB. And if  I had to explain the approach in half a sentence I 0:04:37.840,0:04:43.040 would basically say, the idea is to test the  database management system against itself. 0:04:44.640,0:04:50.000 The idea is quite generous I want to explain  it based on an analogy. So if we ever meet in 0:04:50.000,0:04:53.920 person it will it's very likely that this will  be in the coffee kitchen because I like to drink 0:04:53.920,0:05:01.120 lots of coffee. And let's also assume that there  is a bowl with fruits containing both tangerines 0:05:01.120,0:05:07.520 and also clementines. Now I'm actually very bad  at telling these different citrus fruits apart, 0:05:07.520,0:05:12.160 they all look the same to me, so this is what  I tell you in order to start some small talk. 0:05:12.960,0:05:17.680 And you might actually respond that you can  clearly tell the difference. So I challenge you to 0:05:17.680,0:05:23.440 show me, but how can I test your understanding of  tangerines versus clementines without even having 0:05:23.440,0:05:30.480 this understanding myself? Well the strategy  that I would apply would be the following.First 0:05:30.480,0:05:33.760 I would ask you to please bring me all  the clementines and you would go ahead 0:05:33.760,0:05:38.480 and fetch perhaps two of the fruits. I  would then put them back into the bowl, 0:05:38.480,0:05:42.800 shuffle them around, and then ask you to  bring me all foods that are not clementines. 0:05:43.840,0:05:48.320 You would bring the apple which I could recognize,  along with the other orange-looking fruits. 0:05:49.520,0:05:53.360 And I believe that already some of you see where  this is going because you brought me two fruits 0:05:54.240,0:05:59.440 first and then you brought me four fruits, so  overall six fruits, but there are only five fruits 0:05:59.440,0:06:05.840 in the bowl, meaning that you likely classified a  fruit as being both a tangerine and a clementine. 0:06:07.040,0:06:11.120 So the high level insight here is that every  object in the universe and also in the bowl 0:06:11.120,0:06:15.840 is either a clementine or not a climate and we  use this insight to basically partition all the 0:06:16.560,0:06:22.320 objects in the bowl. Of course you could always  say - you could always consistently classify 0:06:22.320,0:06:27.440 tangerines as clementines and the other way  around, meaning that we cannot detect all 0:06:27.440,0:06:33.840 the bugs, but neither can test - test cases, so  this is a limitation that we have to live with. 0:06:35.200,0:06:42.480 How do we now apply this to SQL? Now we have any  kind of given predicate P and a row R rather than 0:06:42.480,0:06:47.840 a fruit, but exactly one of the following  must hold the predicate evaluates to true, 0:06:48.400,0:06:51.120 not the predicate or is the true  meaning that it evaluates to false, 0:06:51.760,0:06:56.560 and then since in SQL we also have to deal with  null values, we have to deal with the case where 0:06:56.560,0:07:03.040 P might evaluate to null. And based on this we can  now take any kind of data in our table - any kind 0:07:03.040,0:07:06.720 of intermediate result - and partition it into  three parts: then we do those for the predicate 0:07:06.720,0:07:10.920 where it's true, those where it evaluates to  false, and those where it evaluates to null. 0:07:11.760,0:07:16.160 How did this now allow us to detect  the bug from the motivating example? 0:07:16.160,0:07:22.080 Well, we first generated this query that  basically corresponds to counting the number of 0:07:23.040,0:07:26.880 fruits in the bowl. So we simply fetch the cross  product of all values from these two tables. 0:07:27.680,0:07:34.720 And there MySQL returned a single record. Then  we generated this more complex three queries 0:07:34.720,0:07:39.120 based on this random predicate P. You can see  here that we have the non-negated version, the 0:07:39.120,0:07:44.640 negated version, and this null version. Overall we  expect that this should compute the same result. 0:07:45.760,0:07:51.680 However MySQL returned an empty result set here,  which allowed us to find and report the bug. 0:07:53.520,0:07:56.880 The technique is actually quite versatile  in the sense that we can test multiple 0:07:56.880,0:08:00.240 different kind of SQL features,  here only focused on where clauses. 0:08:02.000,0:08:06.960 So we basically had this random generation  approach to tackle this test case generation 0:08:06.960,0:08:14.240 problem, and now we propose ternary logic  partitioning to tackle this test oracle problem. 0:08:15.680,0:08:20.800 Can such a simple technique be effective is  the next natural question? well I used SQL 0:08:20.800,0:08:28.160 answer TLB being one of these approaches that we  proposed to find over 450 unique and previously 0:08:28.160,0:08:32.400 unknown bugs in Baidu's database management  systems that you have all heard of or know. 0:08:34.560,0:08:38.320 And what should you take away from  this talk? Well, while the specific 0:08:38.320,0:08:43.440 technique data partitioning primarily works for  data-oriented systems, it is based actually on 0:08:43.440,0:08:52.560 a more generic - more general kind of technique.  Namely if we look at this from an abstract level, 0:08:52.560,0:08:57.360 we basically can realize that we had a test  case that we executed to obtain a result 0:08:58.240,0:09:03.120 and based on this we derived a new test case  for which we could then provide a test oracle. 0:09:04.080,0:09:11.760 And this does metamorphic testing. Now what you  can do is, you could try of course to come up with 0:09:11.760,0:09:16.960 a metamorphic testing technique for yourself  but an alternative would also be to check if 0:09:16.960,0:09:22.560 there are any already existing approaches,  for example by looking on Google Scholar. 0:09:25.360,0:09:28.000 And with that, I'll summarize. And the takeaways, 0:09:28.720,0:09:32.880 so manually writing test cases is time  intensive and requires detailed domain 0:09:32.880,0:09:36.160 knowledge and you might actually  face a similar problem in your work. 0:09:37.040,0:09:43.120 So what we propose is to couple random test case  generation with metamorphic testing which turned 0:09:43.120,0:09:49.840 out to effectively complement manually written  test cases. With that, thank you for listening.