Background and Challenges
In modern deep learning, the manual annotation of unlabeled data is one of the main bottlenecks. To train a good model, we usually need to prepare a large amount of labeled data. When the number of classes and samples is small, we can take a model pretrained on a public labeled dataset and fine-tune only its last few layers on our own data.
However, when your data is large (say, products in a store, or faces, ...), this quickly runs into problems: a handful of trainable layers is rarely enough for the model to learn. Moreover, the amount of unlabeled data (e.g., document text, images on the Internet) is practically unlimited; labeling all of it is nearly impossible, yet leaving it unused would be a clear waste.
In this situation, we would have to train a deep model from scratch on the new dataset and spend a great deal of time and effort on labeling; this is why self-supervised learning was born. The idea behind it is simple and revolves around two tasks:
Pretext task: the deep model learns generalizable representations from unlabeled data without annotations, then generates the supervisory signal by itself from implicit information in the data.
Downstream task: the learned representations are transferred to the target task (e.g., classification) and fine-tuned with a small amount of labeled data.
Loss Function
The learning objective is a binary classification problem over pairs of representations. We can therefore use the binary cross-entropy loss to maximize the Bernoulli log-likelihood, where the relation score y represents the probability estimate that the two representations belong together, induced through a sigmoid activation function.
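In symbols, a sketch of this objective (the notation here is assumed for exposition, not copied from the paper): with raw relation scores s_i from the relation head, targets t_i ∈ {0, 1} marking positive and negative pairs, and y_i = σ(s_i) the sigmoid-induced probability, the loss over N pairs is

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big[t_i \log y_i + (1 - t_i)\log(1 - y_i)\Big], \qquad y_i = \sigma(s_i)$$

which is exactly what BCEWithLogitsLoss computes from the raw scores in the training code further below.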
Finally, the paper [6] also reports relational reasoning results on standard datasets (CIFAR-10, CIFAR-100, CIFAR-100-20, STL-10, tiny-ImageNet, SlimageNet), with different backbones (shallow and deep) and the same training schedule (epochs). For more information, please check the paper.
Experimental Evaluation
In this article, I want to reproduce the relational reasoning system on the public image dataset STL-10, which consists of 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck) in 96x96-pixel color images.
First, we need to import some important libraries:
import torch
import torchvision
import torchvision.transforms as transforms
from PIL import Image
import math
import time
from torch.utils.data import DataLoader
from time import sleep
from tqdm import tqdm
import numpy as np
from fastprogress.fastprogress import master_bar, progress_bar
from torchvision import models
import matplotlib.pyplot as plt
from torchvision.utils import make_grid
%config InlineBackend.figure_format = 'svg'
The STL-10 dataset contains 13,000 labeled images (500 per class for training and 800 per class for testing, i.e., 5,000 train and 8,000 test images). It also includes 100,000 unlabeled images drawn from a similar but broader distribution; for example, besides the animals in the label set, it contains other kinds of animals (bears, rabbits, etc.) and vehicles (trains, buses, etc.).
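If you want to verify those split sizes yourself, a quick check (this snippet is mine, not part of the original tutorial; the first run downloads about 2.6 GB):

# sanity check of the STL-10 split sizes quoted above
unlabeled = torchvision.datasets.STL10('data', split='unlabeled', download=True)
train = torchvision.datasets.STL10('data', split='train', download=True)
test = torchvision.datasets.STL10('data', split='test', download=True)
print(len(unlabeled), len(train), len(test))  # 100000 5000 8000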
Then we create the relational reasoning class following the author's suggestions:
class RelationalReasoning(torch.nn.Module):
    """Self-supervised relational reasoning.
    A basic implementation of the method that uses
    the 'cat' aggregation function (the most effective one)
    and can be used with any backbone.
    """
    def __init__(self, backbone, feature_size=64):
        super(RelationalReasoning, self).__init__()
        self.backbone = backbone.to(device)
        self.relation_head = torch.nn.Sequential(
            torch.nn.Linear(feature_size*2, 256),
            torch.nn.BatchNorm1d(256),
            torch.nn.LeakyReLU(),
            torch.nn.Linear(256, 1)).to(device)

    def aggregate(self, features, K):
        relation_pairs_list = list()
        targets_list = list()
        size = int(features.shape[0] / K)
        shifts_counter = 1
        for index_1 in range(0, size*K, size):
            for index_2 in range(index_1+size, size*K, size):
                # by default use the 'cat' aggregation function
                pos_pair = torch.cat([features[index_1:index_1+size],
                                      features[index_2:index_2+size]], 1)
                # collision-free shuffling via rolling the mini-batch (negatives)
                neg_pair = torch.cat([
                    features[index_1:index_1+size],
                    torch.roll(features[index_2:index_2+size],
                               shifts=shifts_counter, dims=0)], 1)
                relation_pairs_list.append(pos_pair)
                relation_pairs_list.append(neg_pair)
                targets_list.append(torch.ones(size, dtype=torch.float32))
                targets_list.append(torch.zeros(size, dtype=torch.float32))
                shifts_counter += 1
                if shifts_counter >= size:
                    shifts_counter = 1  # avoid identity pairs
        relation_pairs = torch.cat(relation_pairs_list, 0)
        targets = torch.cat(targets_list, 0)
        return relation_pairs.to(device), targets.to(device)

    def train(self, tot_epochs, train_loader):
        optimizer = torch.optim.Adam([
            {'params': self.backbone.parameters()},
            {'params': self.relation_head.parameters()}])
        BCE = torch.nn.BCEWithLogitsLoss()
        self.backbone.train()
        self.relation_head.train()
        mb = master_bar(range(1, tot_epochs+1))
        for epoch in mb:
            # the real targets are discarded (unsupervised)
            train_loss = 0
            accuracy_list = list()
            for data_augmented, _ in progress_bar(train_loader, parent=mb):
                K = len(data_augmented)  # tot augmentations
                x = torch.cat(data_augmented, 0).to(device)
                optimizer.zero_grad()
                # forward pass (backbone)
                features = self.backbone(x)
                # aggregation function
                relation_pairs, targets = self.aggregate(features, K)
                # forward pass (relation head)
                score = self.relation_head(relation_pairs).squeeze()
                # cross-entropy loss and backward pass
                loss = BCE(score, targets)
                loss.backward()
                optimizer.step()
                train_loss += loss.item() * K
                predicted = torch.round(torch.sigmoid(score))
                correct = predicted.eq(targets.view_as(predicted)).sum()
                accuracy = (correct / float(len(targets))).cpu().numpy()
                accuracy_list.append(accuracy)
            epoch_loss = train_loss / len(train_loader.sampler)
            epoch_accuracy = sum(accuracy_list)/len(accuracy_list)*100
            mb.write(f"Epoch [{epoch}/{tot_epochs}] - Accuracy: {epoch_accuracy:.2f}% - Loss: {epoch_loss:.4f}")
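As a quick sanity check of the pair-building logic (a toy run with hypothetical sizes, not part of the original tutorial): with K=4 augmentations of a mini-batch of 8 images, the backbone receives 32 inputs, and aggregate() builds K(K-1)/2 = 6 positive and 6 negative blocks of 8 pairs each, i.e., 96 relation pairs of dimension 2*64:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # the class reads this global
dummy_features = torch.randn(32, 64).to(device)  # 32 feature vectors of size 64
toy = RelationalReasoning(backbone=torch.nn.Identity(), feature_size=64)
pairs, targets = toy.aggregate(dummy_features, K=4)
print(pairs.shape, targets.shape)  # torch.Size([96, 128]) torch.Size([96])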
To compare the performance of the relational reasoning method on shallow and deep models, we will create a shallow model (Conv4) and also use a deep architecture (Resnet34).
# pick one of the two backbones
backbone = Conv4()                            # shallow model
backbone = models.resnet34(pretrained=False)  # deep model (overrides the line above)
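Note that Conv4 is not defined in this post; it is the shallow backbone from the author's repository. As a rough stand-in, here is a minimal sketch assuming only a four-block CNN that maps a 96x96 image to a 64-dimensional feature vector (the block widths below are illustrative, not the author's exact ones):

class Conv4(torch.nn.Module):
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (8, 16, 32, 64):  # four conv blocks (assumed widths)
            layers += [torch.nn.Conv2d(in_ch, out_ch, 3, padding=1),
                       torch.nn.BatchNorm2d(out_ch),
                       torch.nn.ReLU(inplace=True),
                       torch.nn.MaxPool2d(2)]
            in_ch = out_ch
        self.encoder = torch.nn.Sequential(*layers)
        self.pool = torch.nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        h = self.pool(self.encoder(x))
        return h.flatten(1)  # (batch, 64) feature vectors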
Following the author's suggestions, we set a few hyperparameters and the augmentation strategy. We will train the backbone together with the relation head on the unlabeled STL-10 dataset.
# hyperparameters for the simulation
K = 16           # tot augmentations, K=32 in the paper
batch_size = 64  # 64 in the paper
tot_epochs = 10  # 200 in the paper
# pick the feature size matching your backbone
feature_size = 64    # number of units for the Conv4 backbone
feature_size = 1000  # number of units for the Resnet34 backbone

# augmentation strategy
normalize = transforms.Normalize(mean=[0.4406, 0.4273, 0.3858],
                                 std=[0.2687, 0.2613, 0.2685])
color_jitter = transforms.ColorJitter(brightness=0.8, contrast=0.8,
                                      saturation=0.8, hue=0.2)
rnd_color_jitter = transforms.RandomApply([color_jitter], p=0.8)
rnd_gray = transforms.RandomGrayscale(p=0.2)
rnd_rcrop = transforms.RandomResizedCrop(size=96, scale=(0.08, 1.0),
                                         interpolation=2)
rnd_hflip = transforms.RandomHorizontalFlip(p=0.5)
train_transform = transforms.Compose([rnd_rcrop, rnd_hflip, rnd_color_jitter,
                                      rnd_gray, transforms.ToTensor(), normalize])

# load into the data loader
torch.manual_seed(1)
torch.cuda.manual_seed(1)
train_set = MultiSTL10(K=K, root='data', split='unlabeled',
                       transform=train_transform, download=True)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                          num_workers=2, pin_memory=True)
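The MultiSTL10 class used above is not defined in this snippet either; it comes from the tutorial's companion code. A minimal sketch of what it is assumed to do (subclass torchvision's STL10 and return K independently augmented views per sample, which is the format the train() loop above expects):

class MultiSTL10(torchvision.datasets.STL10):
    """STL10 variant returning K independently augmented views of each image."""
    def __init__(self, K, **kwargs):
        super().__init__(**kwargs)
        self.K = K  # number of augmented views per sample

    def __getitem__(self, index):
        img, target = self.data[index], int(self.labels[index])  # labels are -1 for the unlabeled split
        img = Image.fromarray(np.transpose(img, (1, 2, 0)))  # CHW uint8 -> PIL image
        views = [self.transform(img) for _ in range(self.K)]
        return views, target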
So far we have built everything needed to train our model. We will now train the backbone and the relation head for 10 epochs with 16 augmentations per image (K). Training took about 4 hours for the shallow model (Conv4) and 6 hours for the deep model (Resnet34) on a single Tesla P100-PCIE-16GB GPU (feel free to change the number of epochs and the other hyperparameters to get better results).
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
backbone.to(device)
model = RelationalReasoning(backbone, feature_size)
model.train(tot_epochs=tot_epochs, train_loader=train_loader)
torch.save(model.backbone.state_dict(), 'model.tar')
After training the backbone, we discard the relation head and use the backbone alone for the downstream task. We need to fine-tune it on the labeled STL-10 training split (5,000 images) and evaluate the final model on the test split (8,000 images). The training and test sets are loaded into a DataLoader without any augmentation.
# set random seed
torch.manual_seed(1)
torch.cuda.manual_seed(1)

# no augmentations used for linear evaluation
transform_lineval = transforms.Compose([transforms.ToTensor(), normalize])

# download STL10 labeled train and test dataset
train_set_lineval = torchvision.datasets.STL10('data', split='train', transform=transform_lineval)
test_set_lineval = torchvision.datasets.STL10('data', split='test', transform=transform_lineval)

# load dataset in data loader
train_loader_lineval = DataLoader(train_set_lineval, batch_size=128, shuffle=True)
test_loader_lineval = DataLoader(test_set_lineval, batch_size=128, shuffle=False)
We will load the pretrained backbone and attach a simple linear model that connects the output features to the number of classes in the dataset.
# linear model (pick the line matching your backbone)
linear_layer = torch.nn.Linear(64, 10)    # if backbone is Conv4
linear_layer = torch.nn.Linear(1000, 10)  # if backbone is Resnet34

# defining a raw backbone model (again, pick one)
backbone_lineval = Conv4()                            # Conv4
backbone_lineval = models.resnet34(pretrained=False)  # Resnet34

# load model
checkpoint = torch.load('model.tar')  # name of pretrained weight file
backbone_lineval.load_state_dict(checkpoint)
At this point, only the linear model is trained while the backbone is kept frozen. First, let us look at the fine-tuning results for Conv4.
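The loop below freezes the backbone implicitly, by putting it in eval() mode and detaching its outputs from the graph. An equivalent, more explicit option (not in the original code) is to turn off gradients for the backbone parameters:

for param in backbone_lineval.parameters():
    param.requires_grad = False  # no gradients ever flow into the backbone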
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
optimizer = torch.optim.Adam(linear_layer.parameters())
CE = torch.nn.CrossEntropyLoss()
linear_layer.to(device)
linear_layer.train()
backbone_lineval.to(device)
backbone_lineval.eval()

print('Linear evaluation')
for epoch in range(20):
    accuracy_list = list()
    for i, (data, target) in enumerate(train_loader_lineval):
        optimizer.zero_grad()
        data = data.to(device)
        target = target.to(device)
        output = backbone_lineval(data).to(device).detach()
        output = linear_layer(output)
        loss = CE(output, target)
        loss.backward()
        optimizer.step()
        # estimate the accuracy
        prediction = output.argmax(-1)
        correct = prediction.eq(target.view_as(prediction)).sum()
        accuracy = (100.0 * correct / len(target))
        accuracy_list.append(accuracy.item())
    print('Epoch [{}] loss: {:.5f}; accuracy: {:.2f}%' \
          .format(epoch+1, loss.item(), sum(accuracy_list)/len(accuracy_list)))

Linear evaluation
Epoch [1] loss: 2.24857; accuracy: 14.77%
Epoch [2] loss: 2.23015; accuracy: 24.49%
Epoch [3] loss: 2.18529; accuracy: 32.46%
Epoch [4] loss: 2.24595; accuracy: 36.45%
Epoch [5] loss: 2.09482; accuracy: 42.46%
Epoch [6] loss: 2.11192; accuracy: 43.40%
Epoch [7] loss: 2.05064; accuracy: 47.29%
Epoch [8] loss: 2.03494; accuracy: 47.38%
Epoch [9] loss: 1.91709; accuracy: 47.46%
Epoch [10] loss: 1.99181; accuracy: 48.03%
Epoch [11] loss: 1.91527; accuracy: 48.28%
Epoch [12] loss: 1.93190; accuracy: 49.55%
Epoch [13] loss: 2.00492; accuracy: 49.71%
Epoch [14] loss: 1.85328; accuracy: 49.94%
Epoch [15] loss: 1.88910; accuracy: 49.86%
Epoch [16] loss: 1.88084; accuracy: 50.76%
Epoch [17] loss: 1.63443; accuracy: 50.74%
Epoch [18] loss: 1.76303; accuracy: 50.62%
Epoch [19] loss: 1.70486; accuracy: 51.46%
Epoch [20] loss: 1.61629; accuracy: 51.84%

Then we check the test set:

accuracy_list = list()
for i, (data, target) in enumerate(test_loader_lineval):
    data = data.to(device)
    target = target.to(device)
    output = backbone_lineval(data).detach()
    output = linear_layer(output)
    # estimate the accuracy
    prediction = output.argmax(-1)
    correct = prediction.eq(target.view_as(prediction)).sum()
    accuracy = (100.0 * correct / len(target))
    accuracy_list.append(accuracy.item())

print('Test accuracy: {:.2f}%'.format(sum(accuracy_list)/len(accuracy_list)))

Test accuracy: 49.98%

Conv4 obtained 49.98% accuracy on the test set, which means the backbone was able to learn useful features from the unlabeled dataset and reach good results after only a few epochs of fine-tuning. Now let us check the performance of the deep model (after re-creating linear_layer and backbone_lineval with the Resnet34 lines above and reloading the corresponding checkpoint):

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
optimizer = torch.optim.Adam(linear_layer.parameters())
CE = torch.nn.CrossEntropyLoss()
linear_layer.to(device)
linear_layer.train()
backbone_lineval.to(device)
backbone_lineval.eval()

print('Linear evaluation')
for epoch in range(20):
    accuracy_list = list()
    for i, (data, target) in enumerate(train_loader_lineval):
        optimizer.zero_grad()
        data = data.to(device)
        target = target.to(device)
        output = backbone_lineval(data).to(device).detach()
        output = linear_layer(output)
        loss = CE(output, target)
        loss.backward()
        optimizer.step()
        # estimate the accuracy
        prediction = output.argmax(-1)
        correct = prediction.eq(target.view_as(prediction)).sum()
        accuracy = (100.0 * correct / len(target))
        accuracy_list.append(accuracy.item())
    print('Epoch [{}] loss: {:.5f}; accuracy: {:.2f}%' \
          .format(epoch+1, loss.item(), sum(accuracy_list)/len(accuracy_list)))
Linear evaluation
Epoch [1] loss: 2.68060; accuracy: 47.79%
Epoch [2] loss: 1.56714; accuracy: 58.34%
Epoch [3] loss: 1.18530; accuracy: 56.50%
Epoch [4] loss: 0.94784; accuracy: 57.91%
Epoch [5] loss: 1.48861; accuracy: 57.56%
Epoch [6] loss: 0.91673; accuracy: 57.87%
Epoch [7] loss: 0.90533; accuracy: 58.96%
Epoch [8] loss: 2.10333; accuracy: 57.40%
Epoch [9] loss: 1.58732; accuracy: 55.57%
Epoch [10] loss: 0.88780; accuracy: 57.79%
Epoch [11] loss: 0.93859; accuracy: 58.44%
Epoch [12] loss: 1.15898; accuracy: 57.32%
Epoch [13] loss: 1.25100; accuracy: 57.79%
Epoch [14] loss: 0.85337; accuracy: 59.06%
Epoch [15] loss: 1.62060; accuracy: 58.91%
Epoch [16] loss: 1.30841; accuracy: 58.95%
Epoch [17] loss: 0.27441; accuracy: 58.11%
Epoch [18] loss: 1.58133; accuracy: 58.73%
Epoch [19] loss: 0.76258; accuracy: 58.81%
Epoch [20] loss: 0.62280; accuracy: 58.50%
Then we evaluate the performance on the test dataset.
accuracy_list = list()
for i, (data, target) in enumerate(test_loader_lineval):
    data = data.to(device)
    target = target.to(device)
    output = backbone_lineval(data).detach()
    output = linear_layer(output)
    # estimate the accuracy
    prediction = output.argmax(-1)
    correct = prediction.eq(target.view_as(prediction)).sum()
    accuracy = (100.0 * correct / len(target))
    accuracy_list.append(accuracy.item())

print('Test accuracy: {:.2f}%'.format(sum(accuracy_list)/len(accuracy_list)))
Test accuracy: 55.38%
The results show that we can reach 55.38% accuracy on the test set. The main purpose of this article was to reproduce and evaluate the relational reasoning methodology for teaching a model to recognize objects without labels, and for that purpose these results are very encouraging. If you are not satisfied with them, feel free to experiment with the hyperparameters, for instance by increasing the number of augmentations or epochs, or by changing the model architecture.
Final Thoughts
Self-supervised relational reasoning is effective both quantitatively and qualitatively, and it works with backbones of different depths, from shallow to deep. The representations learned by comparison transfer easily from one domain to another; they are fine-grained and compact, likely owing to the correlation between accuracy and the number of augmentations. In relational reasoning, according to the author's experiments, the number of augmentations has a major influence on the quality of the object clusters [4]. Self-supervised learning has strong potential to become the future of machine learning in many respects.
References
[1] Carl Doersch et al., Unsupervised Visual Representation Learning by Context Prediction, 2015.
[2] Mehdi Noroozi et al., Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, 2017.
[3] Zhang et al., Colorful Image Colorization, 2016.
[4] Mehdi Noroozi et al., Representation Learning by Learning to Count, 2017.
[5] Ting Chen et al., A Simple Framework for Contrastive Learning of Visual Representations, 2020.
[6] Massimiliano Patacchiola et al., Self-Supervised Relational Reasoning for Representation Learning, 2020.
[7] Adam Santoro et al., Relational recurrent neural networks, 2018.