0:00:02.520,0:00:06.900 I'm Sherlock Licorish, I am coming from New Zealand, 0:00:06.900,0:00:13.380 and today I will actually answer the question, can genetic improvement enhance 0:00:13.380,0:00:17.160 online code snippets? So the basis for the talk, I 0:00:17.160,0:00:25.320 suppose, is in some regard related to the problems raised earlier on assessment around the use of code. 0:00:26.760,0:00:30.120 In that context it was large language models, but 0:00:30.120,0:00:34.800 equally we sort of reuse code really readily in today's world, 0:00:35.520,0:00:40.920 again touching on what Raula was just saying as well, with some inherent trust. 0:00:40.920,0:00:51.480 So we sort of look to develop improvement opportunities for the community, potentially. 0:00:52.320,0:00:56.940 So the situation is this. We've got heaps of code online; beyond the 0:00:58.200,0:01:05.160 Q&A sites, we've now got quite a bit of automatic generation of code happening, 0:01:05.160,0:01:10.860 and as a result of that of course we're reusing it, and that is also fully established 0:01:10.860,0:01:15.360 in the academic community, where we can see reuse not only 0:01:16.680,0:01:22.080 across open source projects and so on but equally even amongst the developers on 0:01:22.080,0:01:25.740 these sites, the contributors to these sites. So how do we actually, 0:01:26.520,0:01:31.500 sort of, take this as a good opportunity to sort of help the community somehow? 0:01:32.340,0:01:39.960 So the challenge is, snippets can be incomplete and of course bug-prone, and equally 0:01:41.520,0:01:46.620 we've observed errors not only on Stack Overflow but on many of 0:01:46.620,0:01:51.720 the platforms, quite significant errors. These observations extend to students' work, so 0:01:51.720,0:01:57.780 oftentimes students would copy code from online, unaware of the bugs in the code, 0:01:57.780,0:02:04.020 and I suppose present those in assignments and so on and other examples they submit.
0:02:04.980,0:02:10.980 But errors are also prevalent in the open source community and amongst proprietary developers, 0:02:10.980,0:02:16.560 and end users themselves report errors. A popular error highlighted there is the 0:02:16.560,0:02:23.460 Nissan Connect EV, where an end user actually was able to see code copied from Stack Overflow, 0:02:23.460,0:02:28.440 with comments from that code actually visible in the Connect system. 0:02:28.440,0:02:33.420 So snippets are reused and they're available everywhere. 0:02:34.020,0:02:36.840 So this is a challenge for the community, 0:02:36.840,0:02:40.080 and the academic community is particularly aware of it, 0:02:40.620,0:02:46.440 and so we've actually been researching these portals to see what 0:02:46.440,0:02:50.580 issues are prevalent there and how we might, I suppose, 0:02:50.580,0:02:57.300 support the community with solutions. So one specific study actually looked at security 0:02:58.260,0:03:04.800 and found the code copied from online to be reasonably insecure relative to 0:03:04.800,0:03:09.600 the code that is actually in the system, that isn't copied from online. 0:03:10.320,0:03:18.420 And equally, code copied from online was, sort of, assessed for the effort that is needed 0:03:18.420,0:03:20.520 to make it usable, and at times, 0:03:21.180,0:03:25.320 in the effort to, sort of, get code and get a solution, 0:03:25.320,0:03:31.200 developers can take more time trying to perfect that code to make it palatable. 0:03:32.820,0:03:37.440 But equally, code has been assessed for cohesion and coupling and all that, 0:03:37.440,0:03:43.200 where it has been shown to be at times not as good as one would expect it to be.
0:03:43.200,0:03:49.140 So the community is quite aware that there are issues with code online, 0:03:49.740,0:03:56.580 but we still use these snippets, and I think it would be hard for 0:03:56.580,0:04:01.560 anyone to claim themselves to be without sin in relation to the reuse of snippets from online. 0:04:01.560,0:04:07.260 So we sort of use these snippets anyway. The question is, of course, not "if" - we 0:04:07.260,0:04:12.840 know it will happen, that we will use it. So the current state of play is such 0:04:12.840,0:04:19.440 that recent research shows that much of the code that is actually available 0:04:19.440,0:04:25.080 online is not necessarily generated or provided there for reuse in the way we reuse it. 0:04:25.080,0:04:31.080 Typically, with code online, we can see that it is given pretty much for us to 0:04:31.080,0:04:35.340 actually, sort of, extend and perfect, and to sort of 0:04:35.340,0:04:39.780 provide our solutions, in a way that is not necessarily intended just for copy and reuse. 0:04:39.780,0:04:44.820 So the thing is, the developers online and the people 0:04:44.820,0:04:48.780 who are providing these contributions are not necessarily to be blamed from that perspective, 0:04:48.780,0:04:52.740 largely because they don't expect that we will reuse the code that is available online. 0:04:53.820,0:04:59.100 So they do provide quite a bit of it. They typically will provide at least 0:04:59.100,0:05:05.880 two to three snippets for each solution that is provided on a Q&A site, 0:05:05.880,0:05:09.900 whether it's code or Stack Overflow; there's an abundance of code for every 0:05:09.900,0:05:12.000 question that is answered.
And equally, 0:05:12.960,0:05:19.020 snippets will be under 100 lines of code, suggesting again that the community 0:05:19.020,0:05:24.840 there isn't necessarily providing code wholesale to actually be wholesome 0:05:24.840,0:05:28.500 and to solve all the problems. But there is an abundance of issues: 0:05:29.700,0:05:35.280 beyond readability and reliability issues, there are also performance and security issues, 0:05:35.280,0:05:41.760 and I suspect these large language models that are trained on online code potentially 0:05:41.760,0:05:46.080 inherit many of these issues, as was just alluded to. 0:05:46.080,0:05:50.160 So there are errors, there are quite a few errors online. 0:05:50.820,0:05:57.180 So we've been actually battling with this question of, how can we, sort of, 0:05:57.180,0:06:00.420 help with the improvement effort? So we know we're going to use it, 0:06:00.420,0:06:04.620 we know there are errors; can we, sort of, support the community somehow? 0:06:04.620,0:06:06.960 So we've sort of toyed with these questions 0:06:07.800,0:06:13.620 for a couple of years now, and my colleagues and I, particularly in 0:06:13.620,0:06:18.000 Australia, decided to see how we might actually, 0:06:20.100,0:06:25.500 sort of, use genetic improvement (GI) techniques, to have a look to see how we might actually 0:06:25.500,0:06:30.600 help at large scale with this improvement effort. So we've got a preliminary agenda here, 0:06:30.600,0:06:34.320 where we had about 8000 snippets from Stack Overflow.
0:06:35.460,0:06:39.840 This repository was used for other things, so we sort of had some Java code 0:06:39.840,0:06:47.400 and we ran it through a static checker, PMD. I'm sure you all know what PMD is, 0:06:47.400,0:06:53.100 but just briefly, it's sort of a static analyzer that checks for anti-patterns 0:06:53.100,0:06:59.100 around various sorts of concerns, from more readability-related ones to security, 0:07:00.540,0:07:05.040 and then we used GIN, and GIN is a genetic improvement framework, 0:07:05.040,0:07:08.580 and genetic improvement essentially is nature-inspired computing. 0:07:08.580,0:07:18.360 So it sort of looks at evolving a code space in a way that we might be able to sort of find the 0:07:18.360,0:07:23.100 optimal solution to solve a problem. So in a nutshell we sort of mutate 0:07:23.100,0:07:29.760 the code and, using some criteria, assess fitness subsequent to the mutation, 0:07:29.760,0:07:34.140 and we sort of promote more of the, sort of, stronger ones, 0:07:34.140,0:07:36.960 the ones that pass the fitness test against the ones that don't. 0:07:36.960,0:07:42.240 So we sort of assessed, in the first instance here, performance-related issues. 0:07:45.780,0:07:51.360 So we actually ran PMD first and we found quite a few errors, 0:07:52.680,0:07:58.080 over 30,000 violations in this instance. Of course, as we said, these code snippets 0:07:58.080,0:08:03.240 weren't necessarily provided to be wholesome, 0:08:03.240,0:08:08.760 so we understand that these violations are a given; they should be allowed, in a way. 0:08:09.600,0:08:16.200 We found 135 of the rules violated, and we singled out performance-related rules, 0:08:16.200,0:08:23.160 so stuff related to the use of strings and so on.
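The mutate-then-assess idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not GIN's implementation: `count_violations` is a toy stand-in for a PMD run (it flags `new String(`, similar in spirit to a string-related performance check), and `mutate` and `improve` are hypothetical helper names that apply line-level delete, copy, and swap edits to a snippet held as a list of lines.

```python
import random


def count_violations(lines):
    """Toy stand-in for a static checker: count lines that call `new String(`."""
    return sum("new String(" in line for line in lines)


def mutate(lines, rng):
    """Return a copy of the snippet with one random line-level edit applied."""
    patched = list(lines)
    if not patched:
        return patched
    op = rng.choice(["delete", "copy", "swap"])
    i = rng.randrange(len(patched))
    j = rng.randrange(len(patched))
    if op == "delete":
        del patched[i]
    elif op == "copy":
        patched.insert(j, patched[i])
    else:
        patched[i], patched[j] = patched[j], patched[i]
    return patched


def improve(lines, steps=200, seed=0):
    """Keep a mutant only when it has strictly fewer violations (the fitness test)."""
    rng = random.Random(seed)
    best = list(lines)
    for _ in range(steps):
        candidate = mutate(best, rng)
        if count_violations(candidate) < count_violations(best):
            best = candidate
    return best


snippet = [
    'String greeting = new String("hello");',
    "System.out.println(greeting);",
]
patched = improve(snippet)
print(count_violations(snippet), "->", count_violations(patched))
```

Note that the only edit able to clear the warning in this toy snippet is deleting the flagged line, which leaves `greeting` undefined. A fitness test based purely on checker warnings cannot see that.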
And then we actually ran the GIN random 0:08:23.160,0:08:26.760 sampler with eight different types of mutations, 0:08:27.420,0:08:35.400 and we actually ended up with 770 patches that no longer had any performance issues. 0:08:36.360,0:08:40.440 So essentially the mutations actually solved these issues, 0:08:40.440,0:08:45.000 and of those patches, 58 actually had compilable code, 0:08:45.000,0:08:48.060 so we can actually compile the code and use the code. 0:08:49.800,0:08:53.460 So we sort of checked to see what was the sort of nature of the 0:08:53.460,0:08:58.500 mutation that resulted in wholesome code, and largely they were delete-related mutations, 0:08:58.500,0:09:05.100 but there were also a few other mutations that actually worked. In particular, however, 0:09:05.100,0:09:13.620 the ones that actually, sort of, resulted in most of the, sort of, solved issues were delete-related. 0:09:13.620,0:09:21.840 There were some copy-related ones as well. So this was encouraging in terms of what we found, 0:09:21.840,0:09:26.100 but equally we found quite a few issues as a result of this as well. 0:09:26.100,0:09:32.520 So we had of course false positives to deal with, and of course that meant that sometimes 0:09:32.520,0:09:38.100 errors were reported that weren't really errors. We needed to improve parsing and then of course 0:09:38.100,0:09:41.100 crowdsource rules, as we were short on rules. 0:09:41.760,0:09:47.280 Equally, we believe that we needed better sampling, better code sampling, 0:09:47.280,0:09:50.940 and of course there were non-functional properties we could not have actually detected.
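The funnel described above, from sampled patches to violation-free patches to compilable patches, can be sketched as follows. This is a sketch under loud assumptions rather than GIN's random sampler: `has_violation` is a toy stand-in for PMD, `looks_compilable` only checks brace balance instead of calling a real Java compiler, and `random_patch` and `sample_patches` are hypothetical names. Each patch is a single random delete or replace edit applied to the original snippet.

```python
import random


def has_violation(lines):
    """Toy stand-in for a PMD performance rule."""
    return any("new String(" in line for line in lines)


def looks_compilable(lines):
    """Toy stand-in for a compiler call: only checks that braces balance."""
    depth = 0
    for ch in "".join(lines):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def random_patch(lines, rng):
    """Apply one random edit to the original snippet: delete or replace a line."""
    patched = list(lines)
    i = rng.randrange(len(patched))
    if rng.random() < 0.5:
        del patched[i]  # delete edit
    else:
        patched[i] = rng.choice(lines)  # replace edit, donor line from the snippet
    return patched


def sample_patches(lines, n, seed=0):
    """Sample n patches and report how many survive each filter."""
    rng = random.Random(seed)
    patches = [random_patch(lines, rng) for _ in range(n)]
    clean = [p for p in patches if not has_violation(p)]
    usable = [p for p in clean if looks_compilable(p)]
    return len(patches), len(clean), len(usable)


snippet = [
    "void greet() {",
    '  String s = new String("hi");',
    "  System.out.println(s);",
    "}",
]
total, clean, usable = sample_patches(snippet, 100)
print(f"sampled={total} violation_free={clean} compilable={usable}")
```

A patch can only count as usable after it passes both filters, so the last number can never exceed the middle one. Even the surviving patches are not guaranteed to behave correctly, since neither toy check looks at what the code does.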
0:09:51.480,0:09:55.680 Now I must caution that GIN is typically used with test cases but we didn't use any, 0:09:55.680,0:10:00.060 so we ran PMD, we ran GIN, and then we ran PMD again, 0:10:00.060,0:10:03.780 so in some regard our result may be inflated, 0:10:03.780,0:10:09.540 in that if we had run unit tests, maybe some of the code may have failed. 0:10:10.380,0:10:17.100 So we've got this published in this paper, where we sort of followed up to look in more 0:10:17.100,0:10:19.920 detail at some of the mutations, and so we sort of provide this 0:10:19.920,0:10:25.680 there if you wanted to follow up on that. Thanks - I should extend thanks to the team, 0:10:26.400,0:10:30.240 but beyond that too to the funding sources that supported the work, 0:10:30.240,0:10:34.500 and also for the opportunity to present here, Greg and team. 0:10:35.160,0:10:39.240 So I can probably repeat, sorry, revisit the question here, 0:10:40.380,0:10:43.440 and in some regard I might say it, I suppose, as a statement: 0:10:43.440,0:10:46.680 I think genetic improvement might help to improve code. 0:10:47.280,0:10:52.020 Thanks very much, happy to take your thoughts. All right, thank you very much, Sherlock. 0:10:52.920,0:10:55.080 Again, questions coming in from our viewers, 0:10:55.080,0:10:58.740 and the first one harks back to a question asked earlier: 0:10:59.940,0:11:06.780 who gets the credit for writing the code that's produced by these sorts of genetic algorithms? 0:11:08.940,0:11:14.760 So the genetic algorithm in this context is used to improve the code. 0:11:15.480,0:11:22.800 I suppose if we sort of look at Copilot or one of the large language models, 0:11:23.700,0:11:29.280 in reality I think we don't have legislation to, sort of, really police and to, sort of, 0:11:29.280,0:11:35.520 look after this paradigm as we should.
I think with time we're going to get there, 0:11:35.520,0:11:41.520 but I think in the context of genetic improvement, genetic 0:11:41.520,0:11:48.000 improvement is just providing a framework with which we might perform code experiments. 0:11:48.000,0:11:54.660 Here the mutations are done to improve the code. The code is online, and it's actually of course 0:11:54.660,0:11:58.440 open for use. In the case of Stack Overflow, for instance, 0:11:58.440,0:12:05.820 it's given - the API, the SQL engine, is given for us to query as it is. 0:12:05.820,0:12:10.260 Whether or not that should be allowed - the community at the moment, I suppose, isn't quite 0:12:10.260,0:12:15.420 mature and ready to deal with that, so in the current instance 0:12:15.420,0:12:20.280 we use it as it's available. Okay, and another question is, 0:12:21.000,0:12:26.760 are these genetic algorithms fast enough that they can be used in real time to suggest code 0:12:26.760,0:12:30.300 improvements or code alterations as the code is being developed? 0:12:30.840,0:12:33.000 That's a very good question. So I think 0:12:34.620,0:12:39.960 for the most part the more extensive frameworks will need computing resources, 0:12:39.960,0:12:46.020 so it's not going to be easy for someone in an environment with low-resource computers 0:12:46.020,0:12:49.740 to actually use these frameworks. The thing is, however, I think in 0:12:49.740,0:12:54.180 most contexts developers have got the sort of framework or environment 0:12:54.180,0:12:57.000 that will allow us to use these algorithms with ease, 0:12:58.140,0:13:02.640 and more and more they are designed for optimization, such that it is possible 0:13:02.640,0:13:08.580 that we would be able to leverage them. But as with everything else, 0:13:08.580,0:13:12.780 there's no one size fits all, I suppose; it's about having an assessment of your 0:13:12.780,0:13:16.980 reality and then of course retrofitting a solution to suit your case.