Making predictions with TensorFlow and a pretrained VGG16 model
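The fast.ai introductory course uses Kaggle's dogs vs cats competition as its example for getting started with computer vision, but it does not use TensorFlow, which has become very popular recently. Keras can call TensorFlow as its backend, but since it is possible to skip that layer and work with TensorFlow directly, and since the goal here is to learn, I decided to do it in plain TensorFlow.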
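I have heard that the common practice in industry is not to train a model from scratch, but to take an already trained model and modify it slightly (a process called fine-tuning) to get the job done. So the first step is to rebuild the VGG16 model in TensorFlow. I borrowed from frossard's Python code and the weights he converted. The architecture is as follows (cs231n has a more detailed description):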
INPUT: [224x224x3] memory: 224*224*3=150K weights: 0
CONV3-64: [224x224x64] memory: 224*224*64=3.2M weights: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64] memory: 224*224*64=3.2M weights: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800K weights: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6M weights: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6M weights: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400K weights: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K weights: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K weights: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K weights: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K weights: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K weights: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K weights: 0
FC: [1x1x4096] memory: 4096 weights: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 weights: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 weights: 4096*1000 = 4,096,000
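The weight column of the table can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the layer list and the helper name `count_weights` are my own, and it assumes 3x3 convolutions that preserve spatial size, 2x2 pooling that halves it, and no bias terms (as in the table).

```python
# Layer list mirrors the table: ("conv", out_channels), ("pool",), ("fc", out_units).
VGG16_LAYERS = [
    ("conv", 64), ("conv", 64), ("pool",),
    ("conv", 128), ("conv", 128), ("pool",),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool",),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool",),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool",),
    ("fc", 4096), ("fc", 4096), ("fc", 1000),
]


def count_weights(layers, height=224, width=224, channels=3):
    """Return (per_layer_weight_counts, total), ignoring biases as the table does."""
    counts = []
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            out_channels = layer[1]
            # 3x3 kernel over all input channels, one kernel per output channel
            counts.append(3 * 3 * channels * out_channels)
            channels = out_channels
        elif kind == "pool":
            # 2x2 pooling has no weights and halves height and width
            counts.append(0)
            height //= 2
            width //= 2
        elif kind == "fc":
            out_units = layer[1]
            # every input activation connects to every output unit
            counts.append(height * width * channels * out_units)
            height, width, channels = 1, 1, out_units
        else:
            raise ValueError("unknown layer kind: %r" % (kind,))
    return counts, sum(counts)


per_layer, total = count_weights(VGG16_LAYERS)
print(per_layer[0], per_layer[-3], total)  # 1728 102760448 138344128
```

The total of roughly 138M weights is dominated by the first FC layer, which alone accounts for about 103M of them.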
For the implementation, see VGG16. One thing to note is that the final output does not need to go through ReLU.
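This architecture cannot be used as-is to predict cats and dogs, because VGG16 was built to classify the 1000 different ImageNet categories. To predict just two classes, cat and dog, an extra FC layer has to be added on top of the architecture to map the 1000 outputs to 2. (Alternatively, the last layer can be replaced so that it outputs 2 values directly.) I also added a BN (batch normalization) layer after VGG16; the original VGG16 has no BN. I did not add one after every CONV layer either, because I didn't want to compute...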
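To make the shape of this extra head concrete, here is a minimal NumPy sketch, not the TensorFlow code from the post. It assumes the BN layer sits between the 1000 VGG16 outputs and the new FC layer and uses batch statistics as in training mode; the function names, the random stand-in input, and the initialization scale are all hypothetical.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis (training-mode statistics)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def cats_dogs_head(vgg_out, gamma, beta, weights, bias):
    """Map the 1000 VGG16 outputs to 2 logits: BN, then a linear FC layer (no ReLU)."""
    normalized = batch_norm(vgg_out, gamma, beta)
    return normalized @ weights + bias


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 1000))                 # stand-in for VGG16 outputs
gamma, beta = np.ones(1000), np.zeros(1000)        # BN scale and shift
weights = rng.normal(scale=0.01, size=(1000, 2))   # randomly initialized new FC layer
bias = np.zeros(2)

logits = cats_dogs_head(batch, gamma, beta, weights, bias)
print(logits.shape)  # (8, 2)
```

Only `gamma`, `beta`, `weights`, and `bias` are new parameters here; the pretrained VGG16 weights stay as loaded.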
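During training, the FC output is fed to a cross-entropy loss function; at prediction time it goes through softmax. That is enough to tell whether a given picture is a cat or a dog. For the code, see cats_model.py.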
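As a rough illustration of how these two functions relate, the following pure-Python sketch computes softmax over two logits and the cross-entropy loss for a given true class. The class indices (0 for cat, 1 for dog) are an assumption made for the example, not something taken from cats_model.py.

```python
import math


def softmax(logits):
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true class index."""
    return -math.log(softmax(logits)[label])


example_logits = [2.0, 0.0]                      # assumed order: [cat, dog]
probs = softmax(example_logits)                  # prediction-time output
loss_right = cross_entropy(example_logits, 0)    # true class is cat
loss_wrong = cross_entropy(example_logits, 1)    # true class is dog
print(probs, loss_right, loss_wrong)
```

The loss is small when the larger logit belongs to the true class and large otherwise, which is what training pushes against; at prediction time only the softmax probabilities are needed.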
Let's see how well it works. The complete version: Jupyter Notebook
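These are the results of running the modified VGG16 model (with the extra final FC layer) directly, without fine-tuning. The predictions are very inaccurate because the weights of the last layer are all random. The point of this run is to check that the model runs at all, and to see how many it happens to guess right.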
After a single training iteration the accuracy already reaches 95% (I repeated this several times, and this run was not the highest).
Looking at the predictions for the same images again, they appear to be much more accurate.
Final Thoughts
Image recognition is a lot of fun, and it is a very challenging field.