Here we visualize filters and outputs using the network architecture proposed by Krizhevsky et al. for ImageNet and implemented in Caffe.

(This page follows DeCAF visualizations originally by Yangqing Jia.)

First, import required modules and set plotting parameters

%matplotlib inline

Run ./scripts/download_model_binary.py models/bvlc_reference_caffenet to get the pretrained CaffeNet model. Then load the net, specify test phase and CPU mode, and configure input preprocessing.

caffe.set_mode_cpu()
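The input preprocessing step can be illustrated in plain NumPy. This is only a sketch under stated assumptions: the function name `preprocess`, the order of operations (move channels first, swap RGB to BGR, rescale from [0, 1] to [0, 255], subtract a per-channel mean) and the mean values are illustrative placeholders, not values taken from this page.

```python
import numpy as np

def preprocess(image_rgb, mean_bgr, raw_scale=255.0):
    """Hypothetical helper: H x W x 3 RGB in [0, 1] -> 3 x H x W BGR, rescaled, mean-subtracted."""
    chw = image_rgb.transpose(2, 0, 1)   # H x W x C -> C x H x W
    bgr = chw[::-1]                      # reverse the channel axis: RGB -> BGR
    scaled = bgr * raw_scale             # [0, 1] -> [0, 255]
    # Subtract one mean value per channel (broadcast over height and width).
    return scaled - mean_bgr[:, np.newaxis, np.newaxis]

# Synthetic stand-in image and placeholder mean values (assumptions).
image = np.random.rand(227, 227, 3).astype(np.float32)
mean_bgr = np.array([104.0, 117.0, 123.0], dtype=np.float32)
blob = preprocess(image, mean_bgr)
print(blob.shape)
```

The resulting array has the channels-first layout that matches the per-image part of the `data` blob.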

Run a classification pass

scores = net.predict([caffe.io.load_image(caffe_root + 'examples/images/tao187.jpg')])

The layer features and their shapes (10 is the batch size, corresponding to the ten subcrops used by Krizhevsky et al.)

[(k, v.data.shape) for k, v in net.blobs.items()]

[('data', (10, 3, 227, 227)),
 ('conv1', (10, 96, 55, 55)),
 ('pool1', (10, 96, 27, 27)),
 ('norm1', (10, 96, 27, 27)),
 ('conv2', (10, 256, 27, 27)),
 ('pool2', (10, 256, 13, 13)),
 ('norm2', (10, 256, 13, 13)),
 ('conv3', (10, 384, 13, 13)),
 ('conv4', (10, 384, 13, 13)),
 ('conv5', (10, 256, 13, 13)),
 ('pool5', (10, 256, 6, 6)),
 ('fc6', (10, 4096)),
 ('fc7', (10, 4096)),
 ('fc8', (10, 1000)),
 ('prob', (10, 1000))]
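To make the batch size of 10 concrete, here is a rough sketch of ten-crop oversampling: four corner crops, one center crop, and a horizontal mirror of each. The helper name `ten_crops`, the 256 x 256 input size, and the ordering (corners first, then center, then mirrors) are assumptions; the ordering was chosen so that index 4 is the center crop, as the later cells assume.

```python
import numpy as np

def ten_crops(image, crop):
    """Hypothetical helper: return 10 crops of an H x W x C image as one array."""
    h, w = image.shape[:2]
    # Top-left origins of the four corner crops.
    origins = [(t, l) for t in (0, h - crop) for l in (0, w - crop)]
    # The center crop comes fifth, so it lands at index 4.
    origins.append(((h - crop) // 2, (w - crop) // 2))
    crops = [image[t:t + crop, l:l + crop] for t, l in origins]
    # Horizontally mirrored copies fill indices 5 through 9.
    crops += [c[:, ::-1] for c in crops]
    return np.stack(crops)

image = np.random.rand(256, 256, 3)   # synthetic stand-in image
batch = ten_crops(image, 227)
print(batch.shape)
```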

The parameters and their shapes (each of these layers also has biases, which are omitted here)

[(k, v[0].data.shape) for k, v in net.params.items()]

[('conv1', (96, 3, 11, 11)),
 ('conv2', (256, 48, 5, 5)),
 ('conv3', (384, 256, 3, 3)),
 ('conv4', (384, 192, 3, 3)),
 ('conv5', (256, 192, 3, 3)),
 ('fc6', (4096, 9216)),
 ('fc7', (4096, 4096)),
 ('fc8', (1000, 4096))]
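The blob and parameter shapes above can be cross-checked with a little arithmetic. The sketch below assumes the strides, padding, and grouping of the reference CaffeNet definition (stride 4 for conv1, 3 x 3 pooling with stride 2, padding 2 for conv2 and 1 for conv3 to conv5, two filter groups for conv2, conv4 and conv5); none of these values appear on this page, so treat them as assumptions.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

conv1 = out_size(227, 11, stride=4)    # 11 x 11 filters, stride 4
pool1 = out_size(conv1, 3, stride=2)   # 3 x 3 pooling, stride 2
conv2 = out_size(pool1, 5, pad=2)      # 5 x 5 filters, padding 2
pool2 = out_size(conv2, 3, stride=2)
conv3 = out_size(pool2, 3, pad=1)      # conv4 and conv5 keep this size
pool5 = out_size(conv3, 3, stride=2)

# fc6 sees the flattened pool5 blob: channels * height * width.
fc6_inputs = 256 * pool5 * pool5

# With two filter groups, each filter sees half of the input channels.
conv2_in = 96 // 2
conv4_in = 384 // 2

print(conv1, pool1, conv2, pool2, conv3, pool5, fc6_inputs, conv2_in, conv4_in)
```

Under these assumptions the grouping explains why the conv2 filters have 48 input channels rather than 96, and why conv4 and conv5 have 192 rather than 384.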

Helper functions for visualization

# take an array of shape (n, height, width) or (n, height, width, channels)
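As an illustration of what such a helper might do, here is a minimal sketch that tiles `n` small images into a roughly square grid and returns the grid as an array. The name `tile_square` and its parameters are hypothetical; this is not necessarily how the notebook's own helper is written.

```python
import numpy as np

def tile_square(data, padsize=1, padval=0.0):
    """Tile an (n, h, w) or (n, h, w, c) array into one square grid image."""
    data = np.asarray(data, dtype=np.float64)
    lo, hi = data.min(), data.max()
    # Normalize to [0, 1] for display; a constant input maps to zeros.
    data = (data - lo) / (hi - lo) if hi > lo else np.zeros_like(data)

    # Pad the tile count up to a perfect square and add a border to each tile.
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize))
    padding += ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=padval)

    # (n*n, h, w, ...) -> (n, n, h, w, ...) -> (n, h, n, w, ...) -> (n*h, n*w, ...)
    data = data.reshape((n, n) + data.shape[1:])
    data = data.transpose((0, 2, 1, 3) + tuple(range(4, data.ndim)))
    return data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

filters = np.random.rand(96, 11, 11, 3)   # synthetic stand-in for conv1 filters
grid = tile_square(filters)
print(grid.shape)
```

The returned array can be passed directly to `plt.imshow` to display the grid.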

The input image

# index four is the center crop


The first layer filters, conv1

vis_square(filters.transpose(0, 2, 3, 1))

The first layer output, conv1 (rectified responses of the filters above, first 36 only)

feat = net.blobs['conv1'].data[4, :36]

The second layer filters, conv2

There are 256 filters, each of which has dimension 5 x 5 x 48. We show only the first 48 filters, with each channel shown separately, so that each filter is a row.

filters = net.params['conv2'][0].data
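The "each filter is a row" layout follows from a reshape: taking the first 48 filters and flattening the filter and channel axes yields 48 x 48 = 2304 single-channel 5 x 5 tiles, which a square tiling arranges as 48 rows of 48. The sketch below uses a random array as a stand-in for the real conv2 weights.

```python
import numpy as np

# Random stand-in with the same shape as the conv2 weights.
filters = np.random.rand(256, 48, 5, 5)

# First 48 filters, with the (filter, channel) axes merged into one tile axis.
tiles = filters[:48].reshape(48 ** 2, 5, 5)
print(tiles.shape)

# Tile i * 48 + c is channel c of filter i, so row i of a 48 x 48 grid
# holds all 48 channels of filter i.
i, c = 3, 7
print(np.array_equal(tiles[i * 48 + c], filters[i, c]))
```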

The second layer output, conv2 (rectified, only the first 36 of 256 channels)

feat = net.blobs['conv2'].data[4, :36]

The third layer output, conv3 (rectified, all 384 channels)

feat = net.blobs['conv3'].data[4]

The fourth layer output, conv4 (rectified, all 384 channels)

feat = net.blobs['conv4'].data[4]

The fifth layer output, conv5 (rectified, all 256 channels)

feat = net.blobs['conv5'].data[4]

The fifth layer after pooling, pool5

feat = net.blobs['pool5'].data[4]

The first fully connected layer, fc6 (rectified)

We show the output values and the histogram of the positive values

feat = net.blobs['fc6'].data[4]
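Because the layer is rectified, many of its outputs are exactly zero, which is why the histogram is restricted to the positive values. A small sketch of that selection, using synthetic data in place of the real fc6 activations (the random values and the bin count are assumptions):

```python
import numpy as np

rng = np.random.RandomState(0)
# Synthetic stand-in for a rectified 4096-dimensional activation vector.
feat = np.maximum(rng.randn(4096), 0)

# Keep only the strictly positive entries before binning.
positive = feat[feat > 0]
counts, edges = np.histogram(positive, bins=100)
print(positive.size, counts.sum())
```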

The second fully connected layer, fc7 (rectified)

feat = net.blobs['fc7'].data[4]

The final probability output, prob

feat = net.blobs['prob'].data[4]


Let's see the top 5 predicted labels.

# load labels

['n02687172 aircraft carrier, carrier, flattop, attack aircraft carrier'
 'n03673027 liner, ocean liner'
 'n03095699 container ship, containership, container vessel'
 'n03344393 fireboat'
 'n03662601 lifeboat']
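Selecting the top 5 labels amounts to sorting the probability vector and taking the five largest entries in descending order. A minimal sketch with synthetic probabilities and placeholder label names (both are assumptions, standing in for the real `prob` output and the label file):

```python
import numpy as np

rng = np.random.RandomState(0)
prob = rng.rand(1000)
prob /= prob.sum()   # synthetic stand-in for a 1000-way probability vector

# Placeholder label strings; the real ones come from the label file.
labels = np.array(['class_%d' % i for i in range(1000)])

# argsort is ascending, so walk backwards from the end to get the 5 largest.
top_k = prob.argsort()[-1:-6:-1]
print(labels[top_k])
```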
