Vision Cognitive Services Updates

>> Get caught up on all the new features inside of Computer Vision and Custom Vision in this episode of the AI Show, where Kelly takes us through all of the awesome features. Make sure you take a look.

>> Hello and welcome to this episode of the AI Show. We're going to talk about Cognitive Services, specifically the vision services. I've got my friend Kelly here. How are you doing, my friend?

>> Pretty good. How are you doing?

>> Fantastic. So, tell us what you do at Cognitive Services.

>> So, we work on vision specifically, and we look at how we can add intelligent vision to your apps. Custom Vision and Computer Vision will help you build vision models, and we also prebuild models for you.

>> So, it's like AI, but you don't have to think about it: you just call it and it does the right thing.

>> Exactly.

>> Fantastic. So, we're talking about the things that are new in this update, right?

>> Yes.

>> Awesome. So, why don't you walk us through them?
>> All right. So, for Computer Vision, this is where we have all of our prebuilt models. These are things like image tagging, captioning, and OCR. We are also now offering a new version of OCR, which is significantly better; we'll go through some examples of that.

>> Cool.

>> And we're also expanding the languages available for captioning.

>> Awesome, and this is really good, because if you're building a social media site, for example, and someone's trying to sneak offensive material into a picture, you can now catch it right away and know this is happening.

>> Exactly. And what this new engine does really well is those pictures that have a background image with words on top of it.

>> Now I'm able to solve those CAPTCHA problems that everyone wants. I'm just kidding, of course. So these are the new features: OCR and languages for captioning?

>> Yes.

>> Anything else that you want to add?

>> No, I think that's it for-

>> For this one. So, are we doing a demo? Are you going to show us what's new in the Custom... [inaudible]
>> So, let's show a little bit about the OCR [inaudible] first.

>> Let's do it.

>> So, this is an example of text on an image. This is what the old one did, and this is what our new one does. So, as you can see, it catches a whole lot more, and a lot more accurately.

>> I see. And so, in this case, it's just an API call: you're sending the image and then it returns this text back. What is it that you're actually getting back?

>> So, you'll get the actual bounding box of where it is.

>> Okay.

>> And the text in it.

>> Okay. That's pretty amazing.

>> Yes.

>> Awesome.
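For reference, the kind of call being described might look roughly like this. This is a minimal sketch, assuming the v2.0 Recognize Text REST endpoint and a `requests`-based Python client; the region, API version, file name, and key shown here are illustrative and may differ for your resource.

```python
import time
import requests

# Illustrative values -- substitute your own resource's region and key.
ENDPOINT = "https://westus.api.cognitive.microsoft.com"
KEY = "<your-subscription-key>"

def recognize_text(image_path, mode="Printed"):
    """Submit an image to the Recognize Text API and poll until the async operation finishes."""
    headers = {
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/octet-stream",
    }
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{ENDPOINT}/vision/v2.0/recognizeText",
            headers=headers,
            params={"mode": mode},  # "Printed" or "Handwritten"
            data=f,
        )
    resp.raise_for_status()
    # The service replies asynchronously with an Operation-Location header to poll.
    op_url = resp.headers["Operation-Location"]
    while True:
        result = requests.get(op_url, headers=headers).json()
        if result.get("status") in ("Succeeded", "Failed"):
            return result
        time.sleep(1)

result = recognize_text("sign.jpg")
for line in result["recognitionResult"]["lines"]:
    print(line["boundingBox"], line["text"])  # box coordinates plus the recognized text
```

The `boundingBox` values are the corner coordinates of each detected line, which is what lets you overlay the boxes shown in the demo.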
>> So, a few more examples of that. This one, our old one didn't catch anything, and now you can see what the whole sign says. And so, this is one of those where it's hard to see, because you have lines over it and it's at an angle, but we're still catching the words here.

>> That's really cool. Now, did you just make a newer OCR model, or why is it that it's better now?

>> So, we have two OCR models. We have the handwritten one, and the one for printed text that works in, I think, 23 languages. This new one is an extension of the handwriting one. It only works in English, but it's a much more powerful model.
>> I see. Okay. Awesome. All right, what else have you got?

>> So, we've got a few more examples. The old one caught all the words, but it didn't exactly get them right. And now, here, I think we have a more accurate model.

>> That's cool.

>> And one more. This one, you can see those things over the words, and yet we're still catching them.

>> And the bounding box too, which I think is important, right? Because you can see the bounding boxes are kind of okay on the first one, but they're even better on the second one.

>> Yes, yes.

>> And it caught "AND" too, that's amazing. Cool. And so again, it's just a call. Is it something that you have to- Because I remember when I called the API, I had to tell it what I wanted it to return. Is it just another flag that you pass in?

>> Yes. So, this will be the recognize text call, and then it will say handwritten: false, and you'll get the printed text.
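In terms of the earlier sketch, that flag is just the mode parameter on the same endpoint (again an assumption based on the Recognize Text API; `recognize_text` is the hypothetical helper defined above):

```python
# Printed text -- "handwritten: false" in the demo's terms.
printed = recognize_text("sign.jpg", mode="Printed")

# The same endpoint handles handwriting when asked.
handwriting = recognize_text("note.jpg", mode="Handwritten")
```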
>> Fantastic. All right. Now let's actually move on to Custom Vision.

>> Yes. So, for Custom Vision, for those of you who don't know what it is: it's a tool to build image classification models quickly and easily, with little data.

>> I see. So, this is for the case when Computer Vision isn't quite enough. You want it to learn to distinguish between your own stuff. Did I get that right?

>> Exactly. It's for things that we aren't currently looking for. So, if it's something specific to you, or something obscure, you can use Custom Vision to get that.

>> Fantastic. And just as an overview, how does one train the thing?

>> So, you upload some images you've tagged, and then you get back a model.

>> Awesome.

>> You don't need any prior machine learning knowledge to do it.
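As a rough illustration of that upload-and-train flow, here is a minimal sketch using the azure-cognitiveservices-vision-customvision Python SDK. The project name, tag, and file paths are made up for the example, and exact method signatures vary between SDK versions:

```python
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateEntry

# Illustrative values -- substitute your own training key and endpoint.
trainer = CustomVisionTrainingClient(
    "<training-key>", endpoint="https://southcentralus.api.cognitive.microsoft.com"
)

project = trainer.create_project("Fruit Classifier")  # hypothetical project name
apple_tag = trainer.create_tag(project.id, "apple")

# Upload a handful of tagged images...
images = []
for path in ["apples/img1.jpg", "apples/img2.jpg"]:  # made-up paths
    with open(path, "rb") as f:
        images.append(
            ImageFileCreateEntry(name=path, contents=f.read(), tag_ids=[apple_tag.id])
        )
trainer.create_images_from_files(project.id, images=images)

# ...and train. That's the whole workflow: no ML expertise required.
iteration = trainer.train_project(project.id)
```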
>> Awesome. So, what's new?

>> So, we are now extending it to object detection. Previously, you could just tell what was in the image. Now we'll give you a bounding box and tell you where things are in the image.

>> Holy cow, that's crazy. So, if I have a set of products that I want to reco- let's say I want to recognize products in an image. I can give it labels of the right products and it will find them? Do I have to tell it where the actual product is?

>> Yes. Yes.

>> Okay.

>> So, we'll do an example of how you do that, and we've made it a lot easier to tag a lot of these images. Anyone who's tried to build object detectors before knows it's very painstaking to draw these boxes around everything, so we've made it easier to do that. And then, you only have to do it for maybe a couple hundred images or less, and then you have a great model.
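Extending the earlier training sketch to object detection, the main difference is that each image carries regions (normalized bounding boxes) rather than just tags. Again a sketch under the same SDK assumptions, with made-up coordinates and paths:

```python
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateEntry, Region

# An object-detection project uses a dedicated domain.
domain = next(d for d in trainer.get_domains() if d.type == "ObjectDetection")
project = trainer.create_project("Fruit Detector", domain_id=domain.id)  # hypothetical name
strawberry_tag = trainer.create_tag(project.id, "strawberry")

# Regions are normalized to [0, 1]: left, top, width, height.
region = Region(tag_id=strawberry_tag.id, left=0.15, top=0.20, width=0.30, height=0.25)

with open("strawberries/img1.jpg", "rb") as f:  # made-up path
    entry = ImageFileCreateEntry(name="img1.jpg", contents=f.read(), regions=[region])
trainer.create_images_from_files(project.id, images=[entry])
trainer.train_project(project.id)
```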
>> Awesome, let's take a look.

>> All right. So, this is the Custom Vision service. This is what you see after you sign in: you'll see your projects. I've already built an object detector for us. This is the object detector.

>> Okay.

>> So, this one detects apples, oranges, and strawberries, as you can see. We have anywhere between 15 and 45 images for each.

>> Cool.

>> So, what you'd get is, when you upload them, they would be untagged. And then, for an image like this, you'll hover and it will give you suggested bounding boxes, so I just click: that's a strawberry. Click. Strawberry. Move to the next one.

>> That is so cool.

>> Yes. And so, it makes it really easy. If you don't like the bounding box, you can easily just adjust it, but you don't have to drag and drop each one. Then again, you can always still use drag and drop to make your own bounding boxes if we don't catch something. But most of them we get pretty close.

>> And obviously, the more of these that you give a bounding box, the smarter it gets at this?

>> Yes. Yes.
>> So, how many do you suggest? I always get this question when I talk about Custom Vision, because it's an amazing thing to talk about. How many images should I have in order to make it distinguishable, depending on the number of tags? And then, for the bounding boxes, do I need the same number or more?

>> So, the amount of images we recommend is at least 15. But how many you actually need depends on how different the objects are.

>> Sure.

>> So, say we're going to do people detection. People look very different from one person to the next. But if you're going to do logo detection, no matter where the logo is used, the logo itself doesn't really change.

>> Right.

>> So, you need a lot less there.

>> Okay.

>> So, it really depends on what you're doing, and the best advice we can give is: try it and see how many you need.

>> Awesome. But what about the bounding boxes, though?

>> So, I'd say one per image. We say 15 images because really we're saying 15 bounding boxes, but the more the better.

>> Awesome. So, you just try to label as many as you can and put them up there?

>> Yes.

>> Awesome.
>> All right. So, how does the API actually return the bounding boxes? Do they look the same as the Computer Vision ones?

>> Yes. So, it won't be the exact same output, but it will be similar: it'll give you the coordinates of the bounding box and what the tags are. And so, this is one that I have already trained, so what you'll get is this performance view. I have a precision of 89.5 and a recall of 100. So, say I want it to be a more precise model with less recall. What I would do is pull up the probability threshold and say, "I want a precision of 100 and a recall of 100. Only look at things that have a probability of 73 percent or higher."
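To make the threshold slider concrete, here is a generic illustration (not the service's actual implementation) of how filtering predictions by probability trades recall for precision:

```python
def precision_recall(predictions, threshold, total_positives):
    """predictions: list of (probability, is_true_positive) pairs for one tag."""
    kept = [(p, ok) for p, ok in predictions if p >= threshold]
    true_positives = sum(1 for _, ok in kept if ok)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_positives
    return precision, recall

# Made-up scores: raising the threshold drops the one low-confidence false positive,
# pushing precision to 100 percent while recall stays at 100 percent.
scores = [(0.95, True), (0.88, True), (0.74, True), (0.60, False)]
print(precision_recall(scores, 0.50, 3))  # (0.75, 1.0)
print(precision_recall(scores, 0.73, 3))  # (1.0, 1.0)
```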
>> And what's the mAP? That's new?

>> That's the mean average precision. So, this is the performance across all of the-

>> So, this is for object detection?

>> Yes.

>> Okay. So, precision and recall are for tagging, and the mAP score, mean average precision, is the precision of the actual object detection?

>> Yes.

>> Got it. Okay. Perfect. Perfect. Because there has to be a new thing if we're introducing a new thing.

>> Exactly.

>> Awesome.
>> So, we can do some quick tests on some images.

>> Awesome.

>> So, one of the great things about object detection, which you don't get with classification, is that you can actually count how many objects there are. So, for this one, as we said, the threshold we wanted was something around 70 percent. Now, you can see that all the boxes actually go around individual strawberries. We missed one, but most of them we see, and we can get a rough count of how many are in the [inaudible].
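As a sketch of what that counting might look like against the prediction API, reusing the project and iteration from the earlier training sketch (the SDK's prediction method names have shifted across versions, and the file name and threshold here are made up):

```python
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

predictor = CustomVisionPredictionClient(
    "<prediction-key>", endpoint="https://southcentralus.api.cognitive.microsoft.com"
)

with open("strawberries.jpg", "rb") as f:
    results = predictor.predict_image(project.id, f.read(), iteration_id=iteration.id)

# Keep only detections above the threshold chosen on the performance tab,
# then count per tag -- the part that classification alone can't give you.
THRESHOLD = 0.70
count = sum(
    1 for p in results.predictions
    if p.tag_name == "strawberry" and p.probability >= THRESHOLD
)
print(f"{count} strawberries found")
for p in results.predictions:
    if p.probability >= THRESHOLD:
        print(p.tag_name, p.probability, p.bounding_box.left, p.bounding_box.top)
```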
>> That's amazing. And now the other question I have: before, with the Custom Vision models, you were able to download some of them and execute them on the Edge. Will the object detection also work on the Edge?

>> Not yet, but we're looking to get that soon. But as you mentioned, with export, we've actually added some more export capabilities for classification.

>> Okay. Cool.

>> So, if we go to classification, this is the same fruit classifier I just showed, done as classification instead of object detection.

>> Okay.

>> And so, we already had Core ML and TensorFlow for export. We now support ONNX, which is Windows ML, and Dockerfiles. So, this is all the pieces you need to create your own container. If you have Docker on your machine, you just do a docker build and you'll have a Linux or Windows container.

>> So, what is it that you're actually downloading? Is it like a zip file of all the stuff that should be in there?

>> Yes, yes. And a readme with very clear instructions on how to turn that into a-

>> That's cool.

>> Container. And that would run on any Docker-enabled IoT Edge device.
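As a rough sketch of trying one of those exported containers locally: the bundled readme covers the exact steps, but the flow is roughly a docker build followed by posting an image to the container's local prediction endpoint. The image name, port, and /image route here are assumptions based on typical exports of this kind, not confirmed details:

```python
import requests

# Assumed prior steps, per the export's readme (names are hypothetical):
#   docker build -t fruit-classifier .
#   docker run -p 127.0.0.1:80:80 fruit-classifier
with open("apple.jpg", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1/image",  # assumed local prediction route
        headers={"Content-Type": "application/octet-stream"},
        data=f.read(),
    )
print(resp.json())  # tag names and probabilities from the offline model
```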
>> Well, this is amazing. I love the things that have been added. I use Custom Vision a lot, and I show it to a lot of people; they are super impressed. It's cool that now we have an ONNX model, so you can do Windows ML. It's cool you have a Dockerfile, so you can do whatever you want with this. And I love the object detection bit. So, this is super amazing. Thanks so much for spending some time with us.

>> Thanks for having me.

>> Thanks so much for watching. We've been learning about the new features inside of Computer Vision and Custom Vision for you to use in your applications. We'll see you next time. Take care.
