0:00:09.360,0:00:13.080 Thank you for inviting me and thank you everyone for joining and for attending the 0:00:13.080,0:00:19.500 talks. So I'm Foutse Khom from the Polytechnique in Montreal well with my team we have been focusing 0:00:19.500,0:00:24.000 recently on quality assurance of machine learning enabled systems, right, so the goal 0:00:24.000,0:00:30.300 of the research we're trying to do is to allow you guys and the community to design and build 0:00:30.300,0:00:36.780 reliable machine learning based systems. And we all know that with this new specificity of the 0:00:36.780,0:00:41.880 system, with the dependency of the data, there are much more issues that comes up with the 0:00:42.540,0:00:49.140 nature, so we are trying to tackle those issues and issues related to data quality issues 0:00:49.140,0:00:55.380 on the specifications, implementations, and so on. So today I will give you a glimpse of some 0:00:55.380,0:00:59.640 of the things we have been doing. So machine learning aids have been ready for prime time 0:00:59.640,0:01:04.320 so I've been already deploying many systems. I'm pretty sure most of us 0:01:04.320,0:01:12.960 have been using some of those systems but they need to be reliable. And typically to 0:01:12.960,0:01:17.220 produce the model that we embed in the systems we actually do produce a lot of codes, right, so 0:01:17.220,0:01:24.420 this is a typical pipeline for a deep learning systems where you have to produce code to fetch 0:01:24.420,0:01:29.940 your data, to automate the learning process, and then you have to train you have to test and validate 0:01:29.940,0:01:36.120 until you can deploy. So all through this process we actually writing codes to actually generate 0:01:36.120,0:01:41.640 models and then as any type of programs they often fails and they fail a bit differently 0:01:41.640,0:01:48.300 from what we do in - what we see in the traditional programs. So contrary to traditional programs the 0:01:48.300,0:01:53.880 space of failures for those system is a bit more broader, right, so you can have issues related to 0:01:54.600,0:01:59.700 modeling process, so you can have a lot of under-specification issues, and then we also have 0:01:59.700,0:02:04.440 implementation issues because we have to actually script all the code that actually automate the 0:02:04.440,0:02:09.660 learning process. And we have many issues with data quality, right, so we need to actually be able to 0:02:09.660,0:02:14.760 detect those issues. And what I'm trying to share with you is some of the tools that we have been 0:02:14.760,0:02:20.160 building to help with this, because we believe that only automation can actually help going through 0:02:20.160,0:02:24.420 this process, right - if you have been trying to play with some model you can know that debugging 0:02:24.420,0:02:28.740 a machine learning pipeline can be actually very tricky because sometimes the difference between a 0:02:28.740,0:02:34.320 state-of-the-art model and a very poor performing model can be as simple as the learning rates, right. 0:02:34.320,0:02:38.760 So it's very difficult to actually find those knobs when you try to trick the model so we try to 0:02:38.760,0:02:46.080 automate this. So what do we do? So we in this talk I will talk about two approach that we propose. 0:02:46.080,0:02:50.580 One is based on static analysis so we brought static analysis to the problem. 0:02:51.600,0:02:57.240 Why we thought about this, because then we all know that it's quick, right, it's a bit cheaper and it 0:02:57.240,0:03:02.280 can be very effective if it's done correctly upfront. So we have a tool that we built which 0:03:02.280,0:03:08.460 is Neural Lens which is actually pretty effective and I actually encourage you to try it out, right. 0:03:08.460,0:03:12.780 So the tool can actually have been tested so you have the detail in the paper - I won't talk about the 0:03:12.780,0:03:18.240 detail in the paper - but what I can say, is more how the tool works. So how did we build the tools? 0:03:18.240,0:03:23.880 So the tool rely on two key components, so we have a meta model of the learning program that we had 0:03:23.880,0:03:30.120 to build, right, and then we also had a taxonomy of typical faults in a deep learning program. So 0:03:30.120,0:03:35.280 in the tool we basically automate the detection of this faults based on the representation of a 0:03:35.280,0:03:41.460 deep learning program that we built, based on this meta model, right. Simple. Okay so the workflow 0:03:41.460,0:03:45.720 looks like this - so if you have a program and then you want to use our tool, basically what 0:03:45.720,0:03:50.220 we do is, we extract from your program all the different features and components that we need to 0:03:50.220,0:03:54.120 comply with the specification of the meta model, then we built a representation of your program, 0:03:54.720,0:03:59.940 and then based on our set of rules that we actually specify to detect the different issues, 0:03:59.940,0:04:07.200 we can actually run detections on the program and provide a set of checks. And it works and it's 0:04:07.200,0:04:12.540 pretty fast, okay, but the problem with this is that with static analysis we can't really tackle 0:04:12.540,0:04:17.220 the interaction with the data and the dynamicity that's in the process, right. So to help with 0:04:17.220,0:04:24.720 that aspect we try something else that we all do: dynamic analysis. So we decided to actually 0:04:25.800,0:04:30.960 explore - what do you do - we decide to inspect the training process of the program and extract 0:04:30.960,0:04:35.100 information about the behavior of the program during the training process, and based on this 0:04:35.100,0:04:39.000 information we could actually check certain specific property that could be a signal of problem 0:04:39.000,0:04:45.300 with the training model, right. So we have this approach, which is also pretty effective - we 0:04:45.300,0:04:50.460 have been compared actually with search meta which does a pretty similar thing, and the good news is 0:04:50.460,0:04:55.800 that this approach does a bit better, so I actually encourage you to try it out. It can detect 30% more 0:04:55.800,0:05:00.900 backs than actually search meta we were kind of forced to compare this for the paper, otherwise 0:05:00.900,0:05:06.120 they wouldn't accept the paper, so what do some of the rules that we actually rely on looks like? 0:05:06.120,0:05:11.760 So we - the tool implements a variety of checks, right, so some of the checks can be as simple as 0:05:11.760,0:05:17.940 checking for parameter related issues or more complex optimization related issues, right, so an 0:05:17.940,0:05:21.900 example of parameter related issue you can check untrained parameters and this is very easy to 0:05:21.900,0:05:25.320 change right so you can extract information during the training process and then just 0:05:25.320,0:05:31.740 perform some verifications, right, some comprising. And then an example of activation related issues 0:05:31.740,0:05:36.300 so we can check the ranges right so this is a common if you have been trying deep learning 0:05:36.300,0:05:40.440 problem you can you know that this is something that happened sometime very frequently and then 0:05:40.440,0:05:45.960 the two can actually report this for you pretty easily right. We also have a lot of checks related 0:05:45.960,0:05:51.900 to optimization problems, so you can check if you are fitting the - the data sample for a while if you 0:05:51.900,0:05:57.240 are having any vanishing gradients if you have an unstable gradient and so on. So all the checks have 0:05:57.240,0:06:01.920 been implemented in the tool I strongly encourage you to try out and the flow is very simple so 0:06:01.920,0:06:06.480 there is a small overhead that comes with using the tool because we're actually instrumenting your 0:06:06.480,0:06:09.720 process so we're extracting information during the training process so that is on overhead that 0:06:09.720,0:06:14.100 comes with that, but through the experimentation and validation that we did the average is actually 0:06:14.700,0:06:23.880 stretchable manageable. Okay, so try the tools and that is it for me I guess. 0:06:23.880,0:06:30.000 So I wanted to raise your attention about failures occurring frequently in those systems 0:06:30.000,0:06:34.980 and the fact that the space of failure in this system is actually pretty large compared to 0:06:34.980,0:06:39.900 traditional systems, and that we actually need automation to navigate this space. And hopefully 0:06:39.900,0:06:45.420 these tools that we're actually building and releasing will actually help us to avoid those 0:06:45.420,0:06:51.060 pitfalls and maybe stay out of the float I think this is from Mike. So any questions? 0:06:54.840,0:06:55.800 So that's it.