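Object Detection: Encoding Ground-Truth Boxes in SSD
Since SSD regresses ground-truth boxes from prior boxes, two questions must be settled: which prior box is used to regress a given ground-truth box, and how the regression target is defined.
1. Which prior box regresses the ground-truth box
The prior box closest to the ground-truth box is the one used to regress it. Closeness is measured with IoU, and the threshold is usually set to 0.5.
If a prior box's IoU with the ground-truth box exceeds 0.5, that prior box is used to regress the ground-truth box. More precisely, every prior box above the threshold is used, and if none passes the threshold, the prior box with the highest IoU is used instead.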
    iou = self.iou(box)
    encoded_box = np.zeros((self.num_priors, 4 + return_iou))
    # For this ground-truth box, find the prior boxes with high overlap
    assign_mask = iou > self.overlap_threshold
    # If no prior box passes the threshold, fall back to the best one
    if not assign_mask.any():
        assign_mask[iou.argmax()] = True
    if return_iou:
        encoded_box[:, -1][assign_mask] = iou[assign_mask]
    # Select the matched prior boxes
    assigned_priors = self.priors[assign_mask]
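The matching step can be tried on its own with a few toy boxes. The following is a minimal sketch, not the original `self.iou` implementation: the helper names `iou_with_priors` and `match_priors` are hypothetical, and boxes are assumed to be `(xmin, ymin, xmax, ymax)` in normalized coordinates.

```python
import numpy as np

def iou_with_priors(box, priors):
    """IoU between one box and every prior; boxes are (xmin, ymin, xmax, ymax)."""
    inter_min = np.maximum(priors[:, :2], box[:2])
    inter_max = np.minimum(priors[:, 2:4], box[2:4])
    inter_wh = np.maximum(inter_max - inter_min, 0.0)  # no overlap -> 0
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_priors = (priors[:, 2] - priors[:, 0]) * (priors[:, 3] - priors[:, 1])
    return inter / (area_box + area_priors - inter)

def match_priors(box, priors, overlap_threshold=0.5):
    """Boolean mask of priors assigned to `box`, with a best-IoU fallback."""
    iou = iou_with_priors(box, priors)
    mask = iou > overlap_threshold
    if not mask.any():
        mask[iou.argmax()] = True
    return iou, mask

priors = np.array([
    [0.0, 0.0, 0.5, 0.5],
    [0.1, 0.1, 0.6, 0.6],
    [0.5, 0.5, 1.0, 1.0],
])
box = np.array([0.1, 0.1, 0.5, 0.5])
iou, mask = match_priors(box, priors)
print(iou, mask)
```

The first two priors overlap the box well and are both assigned, which is why the text speaks of "these prior boxes" in the plural.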
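2. How to regress the ground-truth box
For the center coordinates, we regress the offset between the ground-truth box and the prior box. For the width and height, we regress the scaling ratio between the ground-truth box and the prior box.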
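    assigned_priors = self.priors[assign_mask]
    # Encode in reverse: convert the ground-truth box into the format of SSD's predictions
    # First compute the center and width/height of the ground-truth box
    box_center = 0.5 * (box[:2] + box[2:])
    box_wh = box[2:] - box[:2]
    # Then compute the center and width/height of the matched prior boxes
    assigned_priors_center = 0.5 * (assigned_priors[:, :2] +
                                    assigned_priors[:, 2:4])
    assigned_priors_wh = (assigned_priors[:, 2:4] -
                          assigned_priors[:, :2])
    # Work backwards to the prediction SSD should produce
    # Ground-truth and prior boxes are both normalized by input_shape here,
    # so the center offset lies in (-1, 1)
    encoded_box[:, :2][assign_mask] = box_center - assigned_priors_center
    # Divide by the prior's size: large priors tolerate large offsets, small priors small ones
    encoded_box[:, :2][assign_mask] /= assigned_priors_wh
    # Divide by the value stored in prior columns -4:-2 (0.1), i.e. scale up by 10x,
    # to rescale the target
    encoded_box[:, :2][assign_mask] /= assigned_priors[:, -4:-2]
    # Take the log of the width/height ratio
    encoded_box[:, 2:4][assign_mask] = np.log(box_wh / assigned_priors_wh)
    # Divide by the value stored in prior columns -2: (0.2)
    encoded_box[:, 2:4][assign_mask] /= assigned_priors[:, -2:]
    # Flattened shape: (8732 * 5,)
    return encoded_box.ravel()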
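One way to check the encoding is to decode it again and confirm that the original box comes back. The sketch below works on a single box and a single prior. The `encode` and `decode` helpers are hypothetical, and the constants 0.1 and 0.2 are assumed to match the values the comments above attribute to the last four columns of each prior.

```python
import numpy as np

# Assumed scaling constants: 0.1 for the center offsets, 0.2 for the log ratios.
VARIANCES = np.array([0.1, 0.1, 0.2, 0.2])

def encode(box, prior, variances=VARIANCES):
    """Encode a (xmin, ymin, xmax, ymax) box relative to one prior."""
    box_center = 0.5 * (box[:2] + box[2:])
    box_wh = box[2:] - box[:2]
    prior_center = 0.5 * (prior[:2] + prior[2:])
    prior_wh = prior[2:] - prior[:2]
    offset = (box_center - prior_center) / prior_wh / variances[:2]
    scale = np.log(box_wh / prior_wh) / variances[2:]
    return np.concatenate([offset, scale])

def decode(encoded, prior, variances=VARIANCES):
    """Invert `encode`: recover (xmin, ymin, xmax, ymax) from the targets."""
    prior_center = 0.5 * (prior[:2] + prior[2:])
    prior_wh = prior[2:] - prior[:2]
    center = encoded[:2] * variances[:2] * prior_wh + prior_center
    wh = np.exp(encoded[2:] * variances[2:]) * prior_wh
    return np.concatenate([center - 0.5 * wh, center + 0.5 * wh])

prior = np.array([0.2, 0.2, 0.6, 0.6])
box = np.array([0.25, 0.3, 0.65, 0.8])
encoded = encode(box, prior)
restored = decode(encoded, prior)
print(encoded)
print(restored)
```

Decoding is the operation applied to the network's outputs at inference time, so a round trip that reproduces `box` indicates the training targets and the decoder agree.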
An image usually contains several objects and therefore several ground-truth boxes, so every box in the image is encoded:
    encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
    # Encoded values and IoU for every ground-truth box
    encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)
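The shapes involved are easy to lose track of, so here is a small sketch of the same `np.apply_along_axis` plus `reshape` pattern. `fake_encode` is a hypothetical stand-in for `encode_box`: it only mimics the output layout (a flattened `(num_priors, 5)` array) and does no real encoding.

```python
import numpy as np

num_priors = 3  # tiny value for illustration; SSD300 uses 8732

def fake_encode(box):
    """Stand-in for encode_box: returns a flattened (num_priors, 5) array."""
    out = np.zeros((num_priors, 5))
    out[:, :4] = box                  # pretend these are the encoded offsets
    out[:, 4] = np.arange(num_priors)  # pretend these are the IoU values
    return out.ravel()

boxes = np.array([
    [0.1, 0.1, 0.4, 0.4],
    [0.5, 0.5, 0.9, 0.9],
])
flat = np.apply_along_axis(fake_encode, 1, boxes)   # one flattened row per box
stacked = flat.reshape(-1, num_priors, 5)           # (num_boxes, num_priors, 5)
print(flat.shape, stacked.shape)
```

After the reshape, `stacked[i, j]` holds the four targets and the IoU of ground-truth box `i` with respect to prior `j`.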
    def assign_boxes(self, boxes):
        assignment = np.zeros((self.num_priors, 4 + self.num_classes + 8))
        # Every prior starts out as background
        assignment[:, 4] = 1.0
        if len(boxes) == 0:
            return assignment
        # Compute IoU and encode against every ground-truth box
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
        # Encoded values and IoU for every ground-truth box
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)
        # For each prior, take the ground-truth box with the highest overlap
        # and record that box's index
        # shape: (8732,)
        best_iou = encoded_boxes[:, :, -1].max(axis=0)
        # idx: index of the ground-truth box, shape: (8732,)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
        best_iou_mask = best_iou > 0
        # shape: (number of priors with IoU > 0,)
        best_iou_idx = best_iou_idx[best_iou_mask]
        assign_num = len(best_iou_idx)
        # Keep the expected predictions of the best-matching priors
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]
        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx, np.arange(assign_num), :4]
        # Index 4 is the background probability; set it to 0 for matched priors
        assignment[:, 4][best_iou_mask] = 0
        assignment[:, 5:-8][best_iou_mask] = boxes[best_iou_idx, 4:]
        assignment[:, -8][best_iou_mask] = 1
        # assign_boxes gives us what the prediction for this input image should look like
        return assignment
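The selection logic in `assign_boxes` can be followed on a toy tensor. This is a sketch with made-up numbers: two ground-truth boxes, four priors, and constant "offsets" of 1.0 and 2.0 so that it is visible which ground-truth box each prior ends up with. The stored IoU is 0 wherever a prior was not assigned to a box, mirroring what `encode_box` writes.

```python
import numpy as np

# Shape (num_boxes, num_priors, 5): four encoded values plus the IoU.
encoded_boxes = np.zeros((2, 4, 5))
encoded_boxes[0, :, :4] = 1.0  # placeholder targets for ground-truth box 0
encoded_boxes[1, :, :4] = 2.0  # placeholder targets for ground-truth box 1
encoded_boxes[0, :, 4] = [0.7, 0.6, 0.0, 0.0]
encoded_boxes[1, :, 4] = [0.0, 0.8, 0.9, 0.0]

best_iou = encoded_boxes[:, :, -1].max(axis=0)         # best IoU per prior
best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)  # which box gave it
best_iou_mask = best_iou > 0                           # priors matched at all
best_iou_idx = best_iou_idx[best_iou_mask]

kept = encoded_boxes[:, best_iou_mask, :]
targets = np.zeros((4, 4))
targets[best_iou_mask] = kept[best_iou_idx, np.arange(len(best_iou_idx)), :4]
print(targets)
```

Prior 1 overlaps both boxes but is given to box 1 because its IoU is higher (0.8 versus 0.6), and prior 3 matches nothing, so its row stays at zero and it remains background.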
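The final encoded target has shape (8732, 4 + num_classes + 8):
4: the encoded offsets for x, y, w, h
num_classes: the number of classes including background, e.g. 21 for VOC
8: the first of the last 8 entries flags whether the prior has an object (1 if it does, 0 if not); the remaining entries are 0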
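To make the layout concrete, the sketch below fills in a single target row by hand and slices it back apart. The class index 7 and the offset values are arbitrary illustrative choices, and the one-hot vector is assumed to cover the foreground classes only, since index 4 is reserved for background.

```python
import numpy as np

num_classes = 21  # VOC: 20 object classes plus background
row = np.zeros(4 + num_classes + 8)
row[4] = 1.0      # default: background

# Mark this prior as matched to foreground class 7 (arbitrary example).
row[:4] = [1.25, 3.75, 0.0, 1.1]        # encoded x, y, w, h
row[4] = 0.0                            # no longer background
row[5:-8] = np.eye(num_classes - 1)[7]  # one-hot over foreground classes
row[-8] = 1.0                           # "has object" flag

offsets = row[:4]
class_probs = row[4:-8]  # background first, then the foreground classes
has_object = row[-8]
print(row.shape, class_probs.argmax(), has_object)
```

Because the background entry sits at the front of the class slice, foreground class 7 shows up at position 8 of `class_probs`.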
3. Data preprocessing
Images go through data augmentation, and boxes are encoded.
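    import numpy as np
    from random import shuffle

    # preprocess_input is assumed to be imported elsewhere in the project

    def generate(self, train=True):
        while True:
            if train:
                # Shuffle the samples
                shuffle(self.train_lines)
                lines = self.train_lines
            else:
                shuffle(self.val_lines)
                lines = self.val_lines
            inputs = []
            targets = []
            for annotation_line in lines:
                # Augmented image and its boxes
                img, y = self.get_random_data(annotation_line, self.image_size[0:2])
                if len(y) != 0:
                    # Normalize box coordinates by image width and height
                    boxes = np.array(y[:, :4], dtype=np.float32)
                    boxes[:, 0] = boxes[:, 0] / self.image_size[1]
                    boxes[:, 1] = boxes[:, 1] / self.image_size[0]
                    boxes[:, 2] = boxes[:, 2] / self.image_size[1]
                    boxes[:, 3] = boxes[:, 3] / self.image_size[0]
                    one_hot_label = np.eye(self.num_classes)[np.array(y[:, 4], np.int32)]
                    # Skip the image if any box has non-positive height or width
                    # (the original used `and`, which lets a box that is degenerate
                    # in only one dimension through to np.log)
                    if ((boxes[:, 3] - boxes[:, 1]) <= 0).any() or ((boxes[:, 2] - boxes[:, 0]) <= 0).any():
                        continue
                    y = np.concatenate([boxes, one_hot_label], axis=-1)
                # Encode the ground-truth boxes against the priors
                y = self.bbox_util.assign_boxes(y)
                inputs.append(img)
                targets.append(y)
                if len(targets) == self.batch_size:
                    tmp_inp = np.array(inputs)
                    tmp_targets = np.array(targets)
                    inputs = []
                    targets = []
                    yield preprocess_input(tmp_inp), tmp_targets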
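The box handling inside the generator can be looked at separately from the batching. This sketch runs it on two made-up annotations. The 300x300 image size, the 20 foreground classes and the pixel coordinates are illustrative assumptions, not values taken from the original project.

```python
import numpy as np

image_size = (300, 300, 3)  # assumed (height, width, channels)
num_foreground = 20         # assumed number of object classes

# Each row: xmin, ymin, xmax, ymax in pixels, then the class index.
y = np.array([
    [30, 60, 150, 240, 7],
    [0, 0, 300, 300, 0],
], dtype=np.float32)

boxes = y[:, :4].copy()
boxes[:, [0, 2]] /= image_size[1]  # x coordinates divided by width
boxes[:, [1, 3]] /= image_size[0]  # y coordinates divided by height
one_hot = np.eye(num_foreground)[y[:, 4].astype(np.int32)]

# A box is usable only if both its width and its height are positive.
valid = ((boxes[:, 2] - boxes[:, 0]) > 0) & ((boxes[:, 3] - boxes[:, 1]) > 0)
labels = np.concatenate([boxes, one_hot], axis=-1)
print(labels.shape, valid)
```

Each row of `labels` is what `assign_boxes` receives: four normalized coordinates followed by a one-hot class vector.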