0:00:09.360,0:00:13.080
Thank you for inviting me and thank you everyone for joining and for attending the

0:00:13.080,0:00:19.500
talks. So I'm Foutse Khomh from Polytechnique Montréal. With my team, we have been focusing

0:00:19.500,0:00:24.000
recently on quality assurance of machine learning enabled systems, right, so the goal

0:00:24.000,0:00:30.300
of the research we're trying to do is to allow you guys and the community to design and build

0:00:30.300,0:00:36.780
reliable machine learning based systems. And we all know that with this new specificity of the

0:00:36.780,0:00:41.880
system, with the dependency on the data, there are many more issues that come up with the

0:00:42.540,0:00:49.140
nature, so we are trying to tackle those issues, and issues related to data quality, issues

0:00:49.140,0:00:55.380
on the specifications, implementations, and so on. So today I will give you a glimpse of some

0:00:55.380,0:00:59.640
of the things we have been doing. So machine learning has been ready for prime time,

0:00:59.640,0:01:04.320
so it has already been deployed in many systems. I'm pretty sure most of us

0:01:04.320,0:01:12.960
have been using some of those systems but they need to be reliable. And typically to

0:01:12.960,0:01:17.220
produce the model that we embed in the systems we actually do produce a lot of code, right, so

0:01:17.220,0:01:24.420
this is a typical pipeline for a deep learning system where you have to produce code to fetch

0:01:24.420,0:01:29.940
your data, to automate the learning process, and then you have to train, you have to test and validate

0:01:29.940,0:01:36.120
until you can deploy. So all through this process we are actually writing code to generate

0:01:36.120,0:01:41.640
models and then, like any type of program, they often fail, and they fail a bit differently

0:01:41.640,0:01:48.300
from what we see in traditional programs. So contrary to traditional programs, the

0:01:48.300,0:01:53.880
space of failures for those systems is a bit broader, right, so you can have issues related to

0:01:54.600,0:01:59.700
the modeling process, so you can have a lot of under-specification issues, and then we also have

0:01:59.700,0:02:04.440
implementation issues because we have to actually script all the code that automates the

0:02:04.440,0:02:09.660
learning process. And we have many issues with data quality, right, so we need to actually be able to

0:02:09.660,0:02:14.760
detect those issues. And what I'm trying to share with you is some of the tools that we have been

0:02:14.760,0:02:20.160
building to help with this, because we believe that only automation can actually help going through

0:02:20.160,0:02:24.420
this process, right - if you have been trying to play with some models you know that debugging

0:02:24.420,0:02:28.740
a machine learning pipeline can be actually very tricky because sometimes the difference between a

0:02:28.740,0:02:34.320
state-of-the-art model and a very poorly performing model can be as simple as the learning rate, right.

0:02:34.320,0:02:38.760
So it's very difficult to actually find those knobs when you try to tweak the model, so we try to

0:02:38.760,0:02:46.080
automate this. So what do we do? In this talk I will talk about two approaches that we propose.

0:02:46.080,0:02:50.580
One is based on static analysis so we brought static analysis to the problem.

0:02:51.600,0:02:57.240
Why did we think about this? Because we all know that it's quick, right, it's a bit cheaper and it

0:02:57.240,0:03:02.280
can be very effective if it's done correctly upfront. So we have a tool that we built which

0:03:02.280,0:03:08.460
is NeuraLint, which is actually pretty effective, and I actually encourage you to try it out, right.

0:03:08.460,0:03:12.780
So the tool has actually been tested, so you have the details in the paper - I won't talk about the

0:03:12.780,0:03:18.240
details in the paper - but what I can talk about is how the tool works. So how did we build the tool?

0:03:18.240,0:03:23.880
So the tool relies on two key components, so we have a meta model of the learning program that we had

0:03:23.880,0:03:30.120
to build, right, and then we also had a taxonomy of typical faults in a deep learning program. So

0:03:30.120,0:03:35.280
in the tool we basically automate the detection of these faults based on the representation of a

0:03:35.280,0:03:41.460
deep learning program that we built, based on this meta model, right. Simple. Okay so the workflow

0:03:41.460,0:03:45.720
looks like this - so if you have a program and then you want to use our tool, basically what

0:03:45.720,0:03:50.220
we do is, we extract from your program all the different features and components that we need to

0:03:50.220,0:03:54.120
comply with the specification of the meta model, then we build a representation of your program,

0:03:54.720,0:03:59.940
and then based on our set of rules that we actually specify to detect the different issues,

0:03:59.940,0:04:07.200
we can actually run detections on the program and provide a set of checks. And it works and it's

0:04:07.200,0:04:12.540
pretty fast, okay, but the problem with this is that with static analysis we can't really tackle

0:04:12.540,0:04:17.220
the interaction with the data and the dynamicity that's in the process, right. So to help with

0:04:17.220,0:04:24.720
that aspect we try something else that we all do: dynamic analysis. So we decided to actually

0:04:25.800,0:04:30.960
explore - what do we do - we decided to inspect the training process of the program and extract

0:04:30.960,0:04:35.100
information about the behavior of the program during the training process, and based on this

0:04:35.100,0:04:39.000
information we could actually check certain specific properties that could be a signal of a problem

0:04:39.000,0:04:45.300
with the training model, right. So we have this approach, which is also pretty effective - we

0:04:45.300,0:04:50.460
have actually compared it with SageMaker, which does a pretty similar thing, and the good news is

0:04:50.460,0:04:55.800
that this approach does a bit better, so I actually encourage you to try it out. It can detect 30% more

0:04:55.800,0:05:00.900
bugs than SageMaker. We were kind of forced to compare with it for the paper, otherwise

0:05:00.900,0:05:06.120
they wouldn't accept the paper. So what do some of the rules that we actually rely on look like?

0:05:06.120,0:05:11.760
So we - the tool implements a variety of checks, right, so some of the checks can be as simple as

0:05:11.760,0:05:17.940
checking for parameter related issues or more complex optimization related issues, right, so an

0:05:17.940,0:05:21.900
example of a parameter related issue: you can check for untrained parameters, and this is very easy to

0:05:21.900,0:05:25.320
check, right, so you can extract information during the training process and then just

0:05:25.320,0:05:31.740
perform some verifications, right, some comparisons. And then an example of activation related issues:

0:05:31.740,0:05:36.300
so we can check the ranges, right, so this is common - if you have been training deep learning

0:05:36.300,0:05:40.440
programs you know that this is something that happens very frequently, and then

0:05:40.440,0:05:45.960
the tool can actually report this for you pretty easily, right. We also have a lot of checks related

0:05:45.960,0:05:51.900
to optimization problems, so you can check if you are fitting the data sample for a while, if you

0:05:51.900,0:05:57.240
are having any vanishing gradients, if you have an unstable gradient, and so on. So all the checks have

0:05:57.240,0:06:01.920
been implemented in the tool, which I strongly encourage you to try out, and the flow is very simple. So

0:06:01.920,0:06:06.480
there is a small overhead that comes with using the tool because we're actually instrumenting your

0:06:06.480,0:06:09.720
process, so we're extracting information during the training process, so there is an overhead that

0:06:09.720,0:06:14.100
comes with that, but through the experimentation and validation that we did, the overhead is actually

0:06:14.700,0:06:23.880
manageable. Okay, so try the tools and that is it for me I guess.
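
As a rough illustration of the kind of training-time checks described in this part of the talk (untrained parameters, activation ranges, vanishing or unstable gradients), here is a minimal sketch. It assumes that parameter, gradient, and activation snapshots have already been extracted as NumPy arrays. The function names and thresholds are hypothetical and are not taken from the speaker's tool.

```python
import numpy as np

def untrained_parameters(before, after, tol=1e-8):
    """Return names of parameters that did not change between two snapshots."""
    return [name for name in before
            if np.allclose(before[name], after[name], atol=tol, rtol=0.0)]

def gradient_issue(grad_norms, vanish_tol=1e-7, explode_tol=1e3):
    """Classify the per-step gradient norms recorded for one layer.

    The thresholds are illustrative assumptions, not recommended values.
    """
    norms = np.asarray(grad_norms, dtype=float)
    # NaN/inf or very large norms are treated as an unstable gradient.
    if not np.all(np.isfinite(norms)) or norms.max() > explode_tol:
        return "unstable"
    # A near-zero norm at the latest step is treated as a vanishing gradient.
    if norms[-1] < vanish_tol:
        return "vanishing"
    return "ok"

def saturated_fraction(activations, low=-1.0, high=1.0, margin=0.01):
    """Fraction of bounded activations (e.g. tanh) sitting near the range limits."""
    acts = np.asarray(activations, dtype=float)
    near_limit = (acts <= low + margin) | (acts >= high - margin)
    return float(near_limit.mean())
```

In use, snapshots would be taken every few training steps and a warning raised when a parameter never changes, when a layer's gradient is flagged, or when the saturated fraction stays high.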

0:06:23.880,0:06:30.000
So I wanted to draw your attention to failures occurring frequently in those systems

0:06:30.000,0:06:34.980
and the fact that the space of failures in these systems is actually pretty large compared to

0:06:34.980,0:06:39.900
traditional systems, and that we actually need automation to navigate this space. And hopefully

0:06:39.900,0:06:45.420
these tools that we're actually building and releasing will actually help us to avoid those

0:06:45.420,0:06:51.060
pitfalls and maybe stay out of the float - I think this is from Mike. So any questions?

0:06:54.840,0:06:55.800
So that's it.