Classifying Handwritten Digits with TF.Learn - Machine Learning Recipes #7

JOSH GORDON: Last episode, we trained
an image classifier using TensorFlow for Poets,
and this time, we'll write one using TF.Learn.
The problem we'll start on today is
classifying handwritten digits from the MNIST dataset,
and writing a simple classifier for these is often
considered the Hello World of computer vision.
Now MNIST is a multi-class classification problem.
Given an image of a digit, our job
will be to predict which digit it is.
I wrote an IPython notebook for this episode,
and you can find a link to it in the description.
And to make it easier for you to configure your environment,
I'll start with a quick screencast of installing
TensorFlow using Docker.
First, here's an outline of what we'll cover.
I'll show you how to download the dataset
and visualize images.
Next, we'll train a classifier, evaluate it,
and use it to make predictions on new images.
Then we'll visualize the weights the classifier learns
to gain intuition for how it works under the hood.
Let's start by installing TensorFlow.
You can find installation instructions
for Docker linked from the Getting Started page
on TensorFlow.org, and I'll start this screencast
assuming you've just finished downloading and installing
Docker itself but haven't started installing TensorFlow.
Starting from a fresh install of Docker, the first thing to do
is open the Docker Quickstart terminal.
And when this appears, you'll see an IP address just
below the whale.
Copy it down.
We'll need it later.
Next, we'll launch a Docker container
with a TensorFlow image.
The image is hosted on Docker Hub,
and there's a link to that in the description.
The image contains TensorFlow with all its dependencies
properly configured, and here's the command
we'll use to download and launch the image.
But first, let's choose the version we want.
The versions are on this page, and we'll
use the latest release.
Now we can copy-paste the command into a terminal
and add a colon with the version number.
If this is the first time you've run the image,
it'll be downloaded automatically.
And on subsequent runs, it'll be cached locally.
The container starts automatically, and by default, it
runs a notebook server.
All that's left for us to do is to open up a browser
and point it to the IP we jotted down earlier on port 8888.
And now we have an IPython notebook,
served by the container, that we can experiment with
in our browser.
You can find the notebook for this episode in the description
and upload it through the UI.
OK.
Now onto code.
Here are the imports we'll use.
I'll use matplotlib to display images, and, of course,
we'll use TF.Learn to train the classifier.
All of these are installed with the image.
Next, we'll download the MNIST dataset,
and we have a nice one-liner for that.
The dataset contains thousands of labeled images
of handwritten digits.
It's pre-divided into a training set of 55,000 images
and a test set of 10,000.
Let's visualize a few of these to get a feel.
This code displays an image along with its label,
and you might notice I'm reshaping the image,
and I'll explain why in a bit.
The first image from the testing set is a seven,
and you can see the example index as well as the label.
Here's the second image.
Now both of these are clearly drawn,
but there's a variety of different handwriting
samples in this dataset.
Here's an image that's harder to recognize.
These images are low resolution, just 28
by 28 pixels in grayscale.
Also note they're properly segmented.
That means each image contains exactly one digit.
Now let's talk about the features we'll use.
When we're working with images, we
use the raw pixels as features.
That's because extracting useful features
from images, like textures and shapes, is hard.
Now a 28 by 28 image has 784 pixels,
so we have 784 features.
And here, we're using the flattened representation
of the image.
To flatten an image means to convert it from a 2D array
to a 1D array by unstacking the rows and lining them up.
That's why we had to reshape this array
to display it earlier.
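Flattening and un-flattening can be sketched in plain Python. This is a minimal, dependency-free sketch rather than the notebook's code: as an assumption, an image here is a list of rows, and `flatten` and `unflatten` are hypothetical helper names.

```python
def flatten(image):
    """Unstack the rows of a 2D image (a list of rows) and line them up in one list."""
    return [pixel for row in image for pixel in row]


def unflatten(pixels, width):
    """Reshape a flat list of pixels back into rows of `width` pixels, e.g. for display."""
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("pixel count must be a positive multiple of width")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# A hypothetical 28 x 28 grayscale image; each pixel value is just its position here.
image = [[row * 28 + col for col in range(28)] for row in range(28)]
features = flatten(image)

print(len(features))                     # 784: one feature per pixel
print(unflatten(features, 28) == image)  # True: reshaping recovers the original rows
```

Reshaping only works because we know the original width (28); the flat list alone doesn't record where one row ends and the next begins.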
Now we can initialize the classifier,
and here, we'll use a linear classifier.
We'll provide two parameters.
The first indicates how many classes we have,
and there are 10, one for each type of digit.
The second informs the classifier
about the features we'll use.
Now I'll draw a quick diagram of a linear classifier
to give you a high level preview of how it works under the hood.
You could think of the classifier
as adding up the evidence that the image is
each type of digit.
The input nodes are on the top, represented by Xes,
and the output nodes are on the bottom represented by Ys.
We have one input node for each feature or pixel in the image
and one output node for each digit
the image could represent.
Here, we have 784 inputs and 10 outputs.
I've just drawn a few of them, so everything
fits on the screen.
Now the inputs and outputs are fully connected,
and each of these edges has a weight.
When we classify an image, you can think of each pixel
as going on a journey.
First, it flows into its input node,
and next, it travels along the edges.
Along the way, it's multiplied by the weight on the edge,
and the output nodes gather evidence
that the image we're classifying represents each type of digit.
The more evidence we gather, say on the eight output,
the more likely it is the image is an eight.
And to calculate how much evidence we have,
we sum the pixel intensities, each multiplied
by the weight on its edge.
Then we can predict that the image belongs to the output
node with the most evidence.
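The evidence calculation and the "most evidence wins" rule can be sketched in a few lines of plain Python. This is an illustration, not TF.Learn's implementation: `evidence` and `predict` are hypothetical helpers, the 4-pixel image and 3-class weights are made up, and the per-class bias term that linear classifiers usually add is left out to match the description here.

```python
def evidence(pixels, weights):
    """Evidence for each class: sum over pixels of intensity times edge weight.

    weights[i][k] is the weight on the edge from input pixel i to output class k.
    """
    n_classes = len(weights[0])
    return [sum(p * weights[i][k] for i, p in enumerate(pixels)) for k in range(n_classes)]


def predict(pixels, weights):
    """Predict the class whose output node gathered the most evidence."""
    scores = evidence(pixels, weights)
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical 4-pixel image and hand-picked weights for 3 classes.
pixels = [0.0, 1.0, 1.0, 0.0]
weights = [
    [0.5, -0.2, 0.1],   # pixel 0
    [-0.3, 0.8, 0.0],   # pixel 1
    [-0.3, 0.6, 0.2],   # pixel 2
    [0.4, -0.1, 0.3],   # pixel 3
]
print(evidence(pixels, weights))  # roughly [-0.6, 1.4, 0.2]
print(predict(pixels, weights))   # 1
```

Pixels with zero intensity contribute nothing, so only the two filled-in pixels move the evidence; class 1 has large positive weights on both and wins.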
The important part is the weights,
and by setting them properly, we can
get accurate classifications.
We begin with random weights, then gradually adjust them
towards better values.
And this happens inside the fit method.
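The episode doesn't show what fit does internally, so the sketch below swaps in a textbook approach as an assumption: softmax regression trained with plain stochastic gradient descent on a cross-entropy loss. TF.Learn's own optimizer and defaults may differ. The 3 by 3 "images" of zeros and ones are made up, and `fit` and `predict` here are hypothetical helpers, not TF.Learn's methods.

```python
import math
import random


def softmax(scores):
    """Turn raw evidence into probabilities (subtracting the max avoids overflow)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(x, weights, biases):
    """Evidence per class: bias plus the sum of pixel intensity times edge weight."""
    return [biases[k] + sum(p * weights[i][k] for i, p in enumerate(x))
            for k in range(len(biases))]


def fit(images, labels, n_classes, epochs=500, learning_rate=0.1, seed=0):
    """Start from small random weights, then repeatedly nudge them toward better values."""
    rng = random.Random(seed)
    n_features = len(images[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_classes)]
               for _ in range(n_features)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(images, labels):
            probs = softmax(scores_for(x, weights, biases))
            for k in range(n_classes):
                # Cross-entropy gradient w.r.t. class k's evidence: p_k - 1[k == y].
                error = probs[k] - (1.0 if k == y else 0.0)
                biases[k] -= learning_rate * error
                for i, p in enumerate(x):
                    weights[i][k] -= learning_rate * error * p
    return weights, biases


def predict(x, weights, biases):
    """Pick the class with the most evidence."""
    scores = scores_for(x, weights, biases)
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical 3 x 3 training images, flattened row by row; label 0 = zero, 1 = one.
images = [
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # a "zero": ring with an empty middle
    [0, 1, 0, 1, 0, 1, 0, 1, 0],  # another "zero"
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # a "one": vertical stroke
    [0, 1, 1, 0, 1, 0, 0, 1, 0],  # another "one"
]
labels = [0, 0, 1, 1]

weights, biases = fit(images, labels, n_classes=2)
print([predict(x, weights, biases) for x in images])  # [0, 0, 1, 1]
```

Because every "one" here has its middle pixel filled in and every "zero" leaves it empty, training pushes the middle pixel's weight for class 1 above its weight for class 0.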
Once we have a trained model, we can evaluate it.
Using the evaluate method, we see
that it correctly classifies about 90% of the test set.
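The accuracy figure is the fraction of test images whose predicted label matches the true label. A minimal sketch of that arithmetic, using a hypothetical `accuracy` helper and made-up predictions (TF.Learn's evaluate computes this for you and may report other metrics alongside it):

```python
def accuracy(predictions, labels):
    """Fraction of examples whose predicted label equals the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    correct = sum(1 for pred, true in zip(predictions, labels) if pred == true)
    return correct / len(labels)


# Hypothetical predictions for ten test images; only the ninth one is wrong.
labels = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]
predictions = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]
print(accuracy(predictions, labels))  # 0.9
```

At about 90% on the 10,000 test images, that works out to roughly 1,000 misclassified digits.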
We can also make predictions on individual images.
Here's one that it correctly classifies, and here's
one that it gets wrong.
Now I want to show you how to visualize the weights
the classifier learns.
Here, positive weights are drawn in red,
and negative weights are drawn in blue.
So what do these weights tell us?
Well, to understand that, I'll show four images of ones.
They're all drawn slightly differently,
but take a look at the middle pixel.
Notice that it's filled in on every image.
When that pixel is filled in, it's
evidence that the image we're looking at is a one,
so we'd expect a high weight on that edge.
Now let's take a look at four zeros.
Notice that the middle pixel is empty.
Although there are lots of ways to draw zeros,
if that middle pixel is filled in,
it's evidence against the image being a zero,
so we'd expect a negative weight on the edge.
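The middle-pixel argument can be checked with a toy computation: average each pixel position over a handful of images from one class. The 3 by 3 images and the `mean_image` helper below are hypothetical stand-ins for MNIST's 28 by 28 digits. A pixel that's on in every example of a class (average 1.0) is the kind of evidence we'd expect to earn a positive weight for that class, and a pixel that's always off for a class (average 0.0) is one where ink argues against it.

```python
def mean_image(images):
    """Average intensity of each pixel position across a list of flattened images."""
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]


MIDDLE = 4  # index of the center pixel in a flattened 3 x 3 image

# Hypothetical 3 x 3 images, flattened row by row: four "ones" and four "zeros".
ones = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
]
zeros = [
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
]

print(mean_image(ones)[MIDDLE])   # 1.0: filled in on every one
print(mean_image(zeros)[MIDDLE])  # 0.0: empty on every zero
```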
And looking at the images of the weights,
we can almost see outlines of the digits drawn
in red for each class.
We were able to visualize these because we started
with 784 pixels and learned 10 weights for each, one
for each type of digit.
We then reshaped each digit's 784 weights into a 28 by 28 array.
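The reshape-and-color step can be sketched without a plotting library. Assuming, hypothetically, that the learned weights are stored as one row per pixel with one column per digit (784 rows by 10 columns for MNIST), each digit's column is reshaped into a grid and its signs are printed, with `+` standing in for red (positive) and `-` for blue (negative). The helper names and the tiny 3 by 3, two-digit weight matrix are made up for illustration; the episode draws real colored images instead.

```python
def weights_as_image(weights, digit, width):
    """Pull out one digit's weight for every pixel and reshape it into rows."""
    column = [row[digit] for row in weights]
    return [column[i:i + width] for i in range(0, len(column), width)]


def render_signs(grid):
    """'+' marks positive weights (red), '-' negative (blue), '.' exactly zero."""
    return "\n".join(
        "".join("+" if w > 0 else "-" if w < 0 else "." for w in row) for row in grid
    )


# Hypothetical weights for 9 pixels (rows) and 2 digits (columns: digit 0, digit 1).
weights = [
    [0.3, -0.2], [0.2, 0.5], [0.3, -0.2],
    [0.4, -0.3], [-0.8, 0.9], [0.4, -0.3],
    [0.3, -0.2], [0.2, 0.5], [0.3, -0.2],
]
print(render_signs(weights_as_image(weights, digit=0, width=3)))
print()
print(render_signs(weights_as_image(weights, digit=1, width=3)))
```

For digit 0, this prints a ring of `+` around a `-` center; for digit 1, a vertical line of `+` flanked by `-`: a crude version of the red outlines visible in the weight images.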
OK.
That's it for now.
Of course, there's lots more to learn about this,
and I put my favorite links in the description.
Coming up next time, we'll experiment with deep learning,
and I'll cover in more detail what we introduced here today.
Thanks very much for watching, and I'll see you then.
[MUSIC PLAYING]