0:00:09.360,0:00:13.320 So good morning everyone, I'm Joanna, I'm an assistant professor in the department of 0:00:13.320,0:00:16.620 computer science and engineering at Notre Dame. I actually just joined last year, that's why 0:00:16.620,0:00:21.480 Brittany is confused with my affiliation. And today I'd like to talk a little bit about code smells 0:00:21.480,0:00:27.180 and automatically generated code. So I remember back in 2015, 2016 I was a Master's student 0:00:27.180,0:00:32.400 and I saw this tweet online which basically says, hey, please, machine, please make a website with my 0:00:32.400,0:00:38.160 favorite fonts. Amazing, amazing, big performance, and of course, please, no bugs, right, that's great, 0:00:38.160,0:00:43.620 this would be an amazing programming language. But of course that was a joke back then, but 0:00:43.620,0:00:48.900 the reality is, is it really a joke nowadays? I don't think so. Because last year you probably 0:00:48.900,0:00:53.040 saw the release of GitHub Copilot, in which now you can just write your function signature, 0:00:53.040,0:00:57.420 you know, just write your code comments, and then you say what you want your code to be. 0:00:57.420,0:01:02.100 And GitHub Copilot will generate a few recommendations for you. And that's awesome, 0:01:02.100,0:01:06.480 it's really great, it really helps you a lot and is going to improve your 0:01:06.480,0:01:12.660 productivity. And previous work has been focusing a lot on the functionality of that 0:01:12.660,0:01:17.760 generated code, meaning that the code will do what it's supposed to be doing, the functionality, 0:01:17.760,0:01:24.600 but how about the quality of the generated code? The code may be correct, but is it also free of code 0:01:24.600,0:01:31.560 smells, is it also free of security flaws? That's really not clear.
So right off the bat, my 0:01:31.560,0:01:36.720 PhD students and I have been looking at code smells in this automatically 0:01:36.720,0:01:42.120 generated code. And code smells, as you probably already know, are basically symptoms that 0:01:42.120,0:01:47.040 may indicate that a system has flaws. And these code smells can generate maintainability 0:01:47.040,0:01:53.520 problems, technical debt, and also security issues over time, which we refer to as security smells. 0:01:54.600,0:02:01.320 So in light of that gap, what we have done is, well, first of all, take a look at 0:02:01.320,0:02:07.260 the training sets that are used to train these machine learning models that generate code. So 0:02:07.260,0:02:13.260 for that reason we looked at three different data sets that are commonly used by these machine 0:02:13.260,0:02:19.920 learning models, and then we selected the Python samples from those data sets, and we used two static 0:02:19.920,0:02:25.980 analysis tools, Pylint and Bandit. And the whole idea here is to see, do these training 0:02:25.980,0:02:31.200 sets contain code smells or not? And the answer to this question is actually yes, they do 0:02:31.200,0:02:35.580 contain code smells, and in fact, as you can see in this table right here, in 0:02:35.580,0:02:43.860 CodeXGLUE, which is one very large data set, 97% of the Python samples had code smells that were 0:02:43.860,0:02:49.380 reported by those two static analyzers. And if you look in terms of security smells, these 0:02:49.380,0:02:54.780 code snippets have security smells as well; for example, Code Clippy has 10% of 0:02:54.780,0:03:02.280 its Python samples with security smells. So that's quite alarming, but at the end of the day these are 0:03:02.280,0:03:08.040 just the training sets, it doesn't necessarily mean that the generated code will have those issues.
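The measurement described here, scanning every Python sample and reporting the share that has at least one smell, can be sketched roughly as follows. This is only an illustration, not the study's pipeline: the two checks are toy stand-ins for what Pylint and Bandit report, the 100-character threshold is an assumption, and the two samples are made up.

```python
import ast

MAX_LINE_LENGTH = 100  # assumed threshold for this sketch


def toy_smells(source: str) -> list[str]:
    """Return labels for the toy smells found in one code sample."""
    found = []
    # Toy code smell: any line longer than the assumed threshold.
    if any(len(line) > MAX_LINE_LENGTH for line in source.splitlines()):
        found.append("line-too-long")
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return found + ["syntax-error"]
    # Toy security smell: a direct call to the built-in eval().
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            found.append("eval-used")
            break
    return found


def share_with_smells(samples: list[str]) -> float:
    """Fraction of samples with at least one toy smell."""
    if not samples:
        return 0.0
    flagged = sum(1 for sample in samples if toy_smells(sample))
    return flagged / len(samples)


# Hypothetical samples, for illustration only.
samples = [
    "def add(a, b):\n    return a + b\n",
    "def run(expr):\n    return eval(expr)\n",
]
print(f"{share_with_smells(samples):.0%} of samples flagged")
```

In practice the real tools would be run over the samples instead of these hand-written checks; the sketch only shows the shape of the per-sample scan and the aggregate percentage.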
0:03:08.040,0:03:14.640 So in light of that we also investigated whether there are code smells in the generated code. 0:03:14.640,0:03:22.080 So we followed a systematic process: we used GitHub Copilot, we had a list of prompts that we 0:03:22.080,0:03:29.100 gave to GitHub Copilot, and we also gave the same prompts, the same comments, to Code Clippy, which is an 0:03:29.100,0:03:34.560 open source version of GitHub Copilot, and then we also ran the static analyzers Pylint and 0:03:34.560,0:03:42.000 Bandit to see if the generated code has code smells or not. So after we did that, what we 0:03:42.000,0:03:47.700 noticed is that we did indeed find code smells and security smells in the automatically generated 0:03:47.700,0:03:53.400 code as well. So for example in terms of code smells we found undefined variables as the most problematic 0:03:53.400,0:04:00.840 code smell, lines too long, duplicated code as well, unused arguments, and more importantly we 0:04:00.840,0:04:06.540 also found security problems that can be quite severe and can make your system insecure, such 0:04:06.540,0:04:12.360 as the use of the eval function - that means that you can execute code and an attacker might exploit 0:04:12.360,0:04:18.540 that to remotely execute code - and also the use of weak hash functions such as md5. 0:04:19.620,0:04:24.780 So here is an example of what we found. Here I have a function in Python that 0:04:24.780,0:04:30.720 says, show users, and I have a code comment that says, hey, I would like to query my database and 0:04:30.720,0:04:35.580 get user information given a username. That's pretty cool, and that's the code that 0:04:35.580,0:04:42.300 GitHub Copilot generated. As you can see in line 10, we have here a cursor designation of that SQL 0:04:42.300,0:04:49.260 command which seems to be correct and indeed it is correct, but this is prone to SQL injection.
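The slide's code is not part of this transcript, so the following is a hypothetical reconstruction of the general pattern rather than the actual Copilot output. It assumes an in-memory SQLite database and invented names (`make_db`, `show_user_unsafe`, `show_user_safe`, the `users` table) to show how a query built by string concatenation behaves differently from a parameterized one.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def show_user_unsafe(conn, username):
    # Smell: user input is concatenated directly into the SQL text.
    query = (
        "SELECT username, email FROM users WHERE username = '"
        + username + "'"
    )
    return conn.execute(query).fetchall()


def show_user_safe(conn, username):
    # The driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT username, email FROM users WHERE username = ?",
        (username,),
    ).fetchall()


conn = make_db()
payload = "' OR '1'='1"
print(len(show_user_unsafe(conn, payload)))  # every row comes back
print(len(show_user_safe(conn, payload)))    # no row matches
```

Both functions return the same result for an ordinary username, which is why the concatenated version looks correct in a quick test; only a crafted input exposes the difference.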
0:04:49.260,0:04:53.760 If you have any sort of security knowledge you already know that through username you could inject some 0:04:53.760,0:04:59.460 SQL code in there, and you're able to, for example, drop the database completely. So this is going to 0:04:59.460,0:05:04.980 introduce SQL injection, right. Another issue we're having here is that the credentials in this code 0:05:04.980,0:05:12.720 snippet, at line 6 for example, are also hard coded, which is another security smell. So 0:05:12.720,0:05:17.880 what can you do about it, now that you know that? It is great to use these AI based tools, but 0:05:17.880,0:05:24.240 with great power comes great responsibility, right, so what can you do about it? I think the 0:05:24.240,0:05:27.840 whole point here, and that's the case I would like to make to you as a developer, is 0:05:28.680,0:05:33.720 please use these AI based tools, they are awesome, they will help you with your productivity, but 0:05:33.720,0:05:38.760 also take with a grain of salt whatever they are generating for you. Make good use 0:05:38.760,0:05:43.020 of linters - Pylint, Bandit, or whatever else is available for the language that 0:05:43.020,0:05:50.880 you're using, because at the end of the day, why should you care? And the reason is, well, exactly, 0:05:53.340,0:05:57.780 yeah, you're right, so you already know what I want to say. Because at the end of the 0:05:57.780,0:06:02.040 day that code that has been pushed to the repo, when you do "git blame" it's not going to say that 0:06:02.040,0:06:05.940 the blame is on GitHub Copilot, it's going to say that the blame is on you - you are the person that 0:06:05.940,0:06:10.560 pushed that code, it was your responsibility to make sure that it was free of vulnerabilities 0:06:10.560,0:06:16.020 and free of code smells.
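For the hard-coded credentials smell mentioned above, one common way to clean up a generated snippet before pushing it is to read the values from the environment. This is a minimal sketch under assumptions: the variable names `DB_USER` and `DB_PASSWORD` and the helper `load_db_credentials` are invented for illustration and do not come from the talk.

```python
import os


def load_db_credentials(environ=None):
    """Read database credentials from the environment, not from source code."""
    env = os.environ if environ is None else environ
    # Fail early with a clear message instead of connecting with blanks.
    missing = [key for key in ("DB_USER", "DB_PASSWORD") if key not in env]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return env["DB_USER"], env["DB_PASSWORD"]


# Demo with an explicit mapping; real code would call load_db_credentials()
# with no argument and rely on the process environment.
user, password = load_db_credentials(
    {"DB_USER": "app", "DB_PASSWORD": "example-only"}
)
print(user)
```

Keeping the lookup in one small function also gives a single place to swap in a secrets manager later.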
Okay, so with that being said, I would like to thank you for your attention, 0:06:16.020,0:06:20.220 and if you want to know more please scan the QR code and you can read the paper. Thank you so much.