参考了Mask-RCNN实例分割模型的训练教程:html
- pytorch官方的Mask-RCNN实例分割模型训练教程:TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL
- 官方Mask-RCNN训练教程的中文翻译:手把手教你训练本身的Mask R-CNN图像实例分割模型(PyTorch官方教程)
在Mask-RCNN实例分割模型训练的基础上稍做修改便可实现Faster-RCNN目标检测模型的训练node
相关网页:python
- torchvision自带的图像分类、语义分割、目标检测、实例分割、关键点检测、视频分类模型:TORCHVISION.MODELS
- torchvision Github项目地址: https://github.com/pytorch/vision
1. 准备工做
除了须要安装pytorch和torchvision外,还须要安装COCO的API pycocotools
windows系统安装pycocotools的方法:Windows下安装pycocotools
git
导入相关包和模块:github
import torch import os import numpy as np import cv2 import matplotlib.pyplot as plt from torchvision import datasets, transforms from PIL import Image from xml.dom.minidom import parse %matplotlib inline
2. 定义数据集
我使用的是本身使用labelme进行标注后转为voc格式的目标检测数据集,除了背景外有两种标签(mark_type_1和mark_type_2),即num_classes=3
个人voc数据集的目录结构以下图所示:
windows
Annotations文件夹下的xml标注举例:网络
<annotation> <folder/> <filename>172.26.80.5_01_20191128084208520_TIMING.jpg</filename> <database/> <annotation/> <image/> <size> <height>1536</height> <width>2048</width> <depth>3</depth> </size> <segmented/> <object> <name>mark_type_1</name> <pose/> <truncated/> <difficult/> <bndbox> <xmin>341.4634146341463</xmin> <ymin>868.2926829268292</ymin> <xmax>813.4146341463414</xmax> <ymax>986.5853658536585</ymax> </bndbox> </object> <object> <name>mark_type_1</name> <pose/> <truncated/> <difficult/> <bndbox> <xmin>1301.2195121951218</xmin> <ymin>815.8536585365853</ymin> <xmax>1769.512195121951</xmax> <ymax>936.5853658536585</ymax> </bndbox> </object> </annotation>
该标注包含了一个类别(mark_type_1)的两个bboxapp
定义数据集:dom
class MarkDataset(torch.utils.data.Dataset): def __init__(self, root, transforms=None): self.root = root self.transforms = transforms # load all image files, sorting them to ensure that they are aligned self.imgs = list(sorted(os.listdir(os.path.join(root, "JPEGImages")))) self.bbox_xml = list(sorted(os.listdir(os.path.join(root, "Annotations")))) def __getitem__(self, idx): # load images and bbox img_path = os.path.join(self.root, "JPEGImages", self.imgs[idx]) bbox_xml_path = os.path.join(self.root, "Annotations", self.bbox_xml[idx]) img = Image.open(img_path).convert("RGB") # 读取文件,VOC格式的数据集的标注是xml格式的文件 dom = parse(bbox_xml_path) # 获取文档元素对象 data = dom.documentElement # 获取 objects objects = data.getElementsByTagName('object') # get bounding box coordinates boxes = [] labels = [] for object_ in objects: # 获取标签中内容 name = object_.getElementsByTagName('name')[0].childNodes[0].nodeValue # 就是label,mark_type_1或mark_type_2 labels.append(np.int(name[-1])) # 背景的label是0,mark_type_1和mark_type_2的label分别是1和2 bndbox = object_.getElementsByTagName('bndbox')[0] xmin = np.float(bndbox.getElementsByTagName('xmin')[0].childNodes[0].nodeValue) ymin = np.float(bndbox.getElementsByTagName('ymin')[0].childNodes[0].nodeValue) xmax = np.float(bndbox.getElementsByTagName('xmax')[0].childNodes[0].nodeValue) ymax = np.float(bndbox.getElementsByTagName('ymax')[0].childNodes[0].nodeValue) boxes.append([xmin, ymin, xmax, ymax]) boxes = torch.as_tensor(boxes, dtype=torch.float32) # there is only one class labels = torch.as_tensor(labels, dtype=torch.int64) image_id = torch.tensor([idx]) area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]) # suppose all instances are not crowd iscrowd = torch.zeros((len(objects),), dtype=torch.int64) target = {} target["boxes"] = boxes target["labels"] = labels # 因为训练的是目标检测网络,所以没有教程中的target["masks"] = masks target["image_id"] = image_id target["area"] = area target["iscrowd"] = iscrowd if self.transforms is not None: # 注意这里target(包括bbox)也转换\加强了,和from torchvision import的transforms的不一样 # https://github.com/pytorch/vision/tree/master/references/detection 的 transforms.py里就有RandomHorizontalFlip时target变换的示例 img, target = self.transforms(img, target) return img, target def __len__(self): return len(self.imgs)
3. 定义模型
有两种方式来修改torchvision默认的目标检测模型:第一种,采用预训练的模型,在修改网络最后一层后finetuning微调;第二种,根据须要替换掉模型中的骨干网络,如将ResNet替换成MobileNet等。
这两种方法的具体使用能够参考最文章开头的官方教程以及官方教程翻译,在这里我选择了第一种方法。
定义模型能够简单的使用:
函数
torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=3, pretrained_backbone=True)
也可参考教程的写法:
import torchvision from torchvision.models.detection.faster_rcnn import FastRCNNPredictor def get_object_detection_model(num_classes): # load an object detection model pre-trained on COCO model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True) # replace the classifier with a new one, that has num_classes which is user-defined num_classes = 3 # 3 class (mark_type_1,mark_type_2) + background # get the number of input features for the classifier in_features = model.roi_heads.box_predictor.cls_score.in_features # replace the pre-trained head with a new one model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) return model
4. 数据加强
在图像输入到网络前,须要对其进行数据加强。这里须要注意的是,因为Faster R-CNN模型自己能够处理归一化(默认使用ImageNet的均值和标准差来归一化)及尺度变化的问题,于是无需在这里进行mean/std normalization或图像缩放的操做。
因为from torchvision import的transforms只能对图片进行数据加强,而没法同时改变图片对应的label标签,所以咱们选择使用torchvision Github项目中的一些封装好的用于模型训练和测试的函数:https://github.com/pytorch/vision/tree/master/references/detection
其中的engine.py、utils.py、transforms.py、coco_utils.py、coco_eval.py咱们会用到,把这些文件下载下来。我把这些文件放在了正在写的Faster-RCNN目标检测模型训练.ipynb脚本的旁边
查看下载下来的transforms.py,能够看到它里面就有对图像进行水平翻转(RandomHorizontalFlip)时target(bbox)变换的示例:
class RandomHorizontalFlip(object): def __init__(self, prob): self.prob = prob def __call__(self, image, target): if random.random() < self.prob: height, width = image.shape[-2:] image = image.flip(-1) bbox = target["boxes"] bbox[:, [0, 2]] = width - bbox[:, [2, 0]] target["boxes"] = bbox if "masks" in target: target["masks"] = target["masks"].flip(-1) if "keypoints" in target: keypoints = target["keypoints"] keypoints = _flip_coco_person_keypoints(keypoints, width) target["keypoints"] = keypoints return image, target
由此编写相应的数据加强函数:
import utils import transforms as T from engine import train_one_epoch, evaluate # utils、transforms、engine就是刚才下载下来的utils.py、transforms.py、engine.py def get_transform(train): transforms = [] # converts the image, a PIL image, into a PyTorch Tensor transforms.append(T.ToTensor()) if train: # during training, randomly flip the training images # and ground-truth for data augmentation # 50%的几率水平翻转 transforms.append(T.RandomHorizontalFlip(0.5)) return T.Compose(transforms)
5. 训练模型
至此,数据集、模型、数据加强的部分都已经写好。在模型初始化、优化器及学习率调整策略选定后,就能够开始训练了。在每一个epoch训练完成后,咱们还要在测试集上对模型的性能进行评价。
from engine import train_one_epoch, evaluate import utils root = r'数据集路径' # train on the GPU or on the CPU, if a GPU is not available device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # 3 classes, mark_type_1,mark_type_2,background num_classes = 3 # use our dataset and defined transformations dataset = MarkDataset(root, get_transform(train=True)) dataset_test = MarkDataset(root, get_transform(train=False)) # split the dataset in train and test set # 个人数据集一共有492张图,差很少训练验证4:1 indices = torch.randperm(len(dataset)).tolist() dataset = torch.utils.data.Subset(dataset, indices[:-100]) dataset_test = torch.utils.data.Subset(dataset_test, indices[-100:]) # define training and validation data loaders # 在jupyter notebook里训练模型时num_workers参数只能为0,否则会报错,这里就把它注释掉了 data_loader = torch.utils.data.DataLoader( dataset, batch_size=2, shuffle=True, # num_workers=4, collate_fn=utils.collate_fn) data_loader_test = torch.utils.data.DataLoader( dataset_test, batch_size=2, shuffle=False, # num_workers=4, collate_fn=utils.collate_fn) # get the model using our helper function model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=num_classes, pretrained_backbone=True) # 或get_object_detection_model(num_classes) # move model to the right device model.to(device) # construct an optimizer params = [p for p in model.parameters() if p.requires_grad] # SGD optimizer = torch.optim.SGD(params, lr=0.0003, momentum=0.9, weight_decay=0.0005) # and a learning rate scheduler # cos学习率 lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=1, T_mult=2) # let's train it for epochs num_epochs = 31 for epoch in range(num_epochs): # train for one epoch, printing every 10 iterations # engine.py的train_one_epoch函数将images和targets都.to(device)了 train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=50) # update the learning rate lr_scheduler.step() # evaluate on the test dataset evaluate(model, data_loader_test, device=device) print('') print('==================================================') print('') print("That's it!")
能够看到第一个epoch的学习率并非设定的0.0003,而是从0开始逐渐增加的,其缘由是在engine.py的train_one_epoch函数中,第一个epoch采用了warmup的学习率:
if epoch == 0: warmup_factor = 1. / 1000 warmup_iters = min(1000, len(data_loader) - 1) lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor)
此外,因为个人数据集中的bbox的面积都比较大,所以area= small时的AP和AR都为-1.000
最后,保存模型:
torch.save(model, r'保存路径\modelname.pkl')
6. 查看效果
用opencv绘制bbox:
def showbbox(model, img): # 输入的img是0-1范围的tensor model.eval() with torch.no_grad(): ''' prediction形如: [{'boxes': tensor([[1492.6672, 238.4670, 1765.5385, 315.0320], [ 887.1390, 256.8106, 1154.6687, 330.2953]], device='cuda:0'), 'labels': tensor([1, 1], device='cuda:0'), 'scores': tensor([1.0000, 1.0000], device='cuda:0')}] ''' prediction = model([img.to(device)]) print(prediction) img = img.permute(1,2,0) # C,H,W → H,W,C,用来画图 img = (img * 255).byte().data.cpu() # * 255,float转0-255 img = np.array(img) # tensor → ndarray for i in range(prediction[0]['boxes'].cpu().shape[0]): xmin = round(prediction[0]['boxes'][i][0].item()) ymin = round(prediction[0]['boxes'][i][1].item()) xmax = round(prediction[0]['boxes'][i][2].item()) ymax = round(prediction[0]['boxes'][i][3].item()) label = prediction[0]['labels'][i].item() if label == 1: cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (255, 0, 0), thickness=2) cv2.putText(img, 'mark_type_1', (xmin, ymin), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), thickness=2) elif label == 2: cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), thickness=2) cv2.putText(img, 'mark_type_2', (xmin, ymin), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), thickness=2) plt.figure(figsize=(20,15)) plt.imshow(img)
查看效果:
model = torch.load(r'保存路径\modelname.pkl') device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') model.to(device) img, _ = dataset_test[0] showbbox(model, img)