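Since we rely on prior boxes to regress ground-truth boxes, two questions have to be settled: which prior box is used to regress a given ground-truth box, and how the regression target is expressed.

1. Which prior box regresses the ground-truth box

Whichever prior box is close to the ground-truth box is used to regress it. Closeness is measured by IoU, usually with a threshold of 0.5.

If the IoU between a prior box and a ground-truth box is greater than 0.5, that prior box is used to regress that ground-truth box. More precisely, all prior boxes that pass the threshold are used, and if none passes, the single prior with the highest IoU is used instead.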

iou = self.iou(box)
encoded_box = np.zeros((self.num_priors, 4 + return_iou))

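# For this ground-truth box, find the priors with a high overlap
assign_mask = iou > self.overlap_threshold
# If no prior passes the threshold, fall back to the best-overlapping one
if not assign_mask.any():
    assign_mask[iou.argmax()] = True
if return_iou:
    encoded_box[:, -1][assign_mask] = iou[assign_mask]

# Pick out the matched priors
assigned_priors = self.priors[assign_mask]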
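The matching rule above can be tried in isolation. The following is a minimal sketch, not the original `self.iou` method (which is not shown in this post): `iou_with_priors` and `match_priors` are hypothetical helper names, and all boxes are assumed to be in (xmin, ymin, xmax, ymax) format with normalized coordinates.

```python
import numpy as np

def iou_with_priors(box, priors):
    """IoU between one box and every prior, all as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle
    inter_min = np.maximum(priors[:, :2], box[:2])
    inter_max = np.minimum(priors[:, 2:4], box[2:4])
    # Clamp at zero so non-overlapping boxes get an empty intersection
    inter_wh = np.maximum(inter_max - inter_min, 0.0)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_priors = (priors[:, 2] - priors[:, 0]) * (priors[:, 3] - priors[:, 1])
    return inter / (area_box + area_priors - inter)

def match_priors(box, priors, overlap_threshold=0.5):
    """Boolean mask of priors assigned to `box`, plus the IoU values."""
    iou = iou_with_priors(box, priors)
    mask = iou > overlap_threshold
    # Fallback: keep the best prior when nothing passes the threshold
    if not mask.any():
        mask[iou.argmax()] = True
    return mask, iou

priors = np.array([
    [0.0, 0.0, 0.5, 0.5],
    [0.1, 0.1, 0.6, 0.6],
    [0.5, 0.5, 1.0, 1.0],
])
mask, iou = match_priors(np.array([0.1, 0.1, 0.5, 0.5]), priors)
small_mask, small_iou = match_priors(np.array([0.0, 0.0, 0.1, 0.1]), priors)
```

In this toy setup the first box is matched by the first two priors through the threshold, while the second box overlaps nothing strongly and is matched only through the fallback.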

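2. How to regress the ground-truth box

For the center coordinates, we regress the offset between the ground-truth box and the prior box. For the width and height, we regress the scale ratio between the ground-truth box and the prior box.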
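Written out as formulas, the targets computed by the code that follows look like this. The symbols are introduced here only for illustration: superscript g marks the ground-truth box, superscript p the prior box, (c_x, c_y) is a box center, (w, h) its width and height, and v are the variances stored in the last four columns of each prior (0.1 for the center and 0.2 for the size, according to the code comments).

```latex
\begin{aligned}
t_{cx} &= \frac{c_x^{g} - c_x^{p}}{w^{p}\, v_{cx}}, &
t_{cy} &= \frac{c_y^{g} - c_y^{p}}{h^{p}\, v_{cy}}, \\
t_{w}  &= \frac{1}{v_{w}} \log\frac{w^{g}}{w^{p}}, &
t_{h}  &= \frac{1}{v_{h}} \log\frac{h^{g}}{h^{p}}.
\end{aligned}
```

Dividing the center offset by the prior's own size makes the target relative: the same absolute shift is a small error for a large prior and a large error for a small one.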

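assigned_priors = self.priors[assign_mask]
# Reverse encoding: convert the ground-truth box into the format of an SSD prediction

# First compute the center and width/height of the ground-truth box
box_center = 0.5 * (box[:2] + box[2:])
box_wh = box[2:] - box[:2]
# Then compute the center and width/height of the matched priors
assigned_priors_center = 0.5 * (assigned_priors[:, :2] +
                                assigned_priors[:, 2:4])
assigned_priors_wh = (assigned_priors[:, 2:4] -
                      assigned_priors[:, :2])

# Work backwards to the prediction SSD should produce
# Ground-truth and prior boxes are both expressed relative to input_shape here,
# so the center difference lies in (-1, 1)
encoded_box[:, :2][assign_mask] = box_center - assigned_priors_center

# Normalize by the prior's size: large priors tolerate large offsets, small priors only small ones
encoded_box[:, :2][assign_mask] /= assigned_priors_wh

# Divide by the center variances stored in the prior (0.1, i.e. scale up by 10) to normalize the targets
encoded_box[:, :2][assign_mask] /= assigned_priors[:, -4:-2]

# Take the log of the width/height ratio
encoded_box[:, 2:4][assign_mask] = np.log(box_wh / assigned_priors_wh)
# Divide by the size variances stored in the prior (0.2)
encoded_box[:, 2:4][assign_mask] /= assigned_priors[:, -2:]
# Flattened shape: (8732 * 5,) when return_iou is set
return encoded_box.ravel()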
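A quick way to check an encoding like this is to decode it again and confirm that the original box comes back. The sketch below handles a single box and a single prior; `encode` and `decode` are hypothetical standalone helpers, and the variances (0.1, 0.1, 0.2, 0.2) are assumed from the code comments rather than read from a real prior array.

```python
import numpy as np

VARIANCES = np.array([0.1, 0.1, 0.2, 0.2])  # assumed values, per the code comments

def encode(box, prior, variances=VARIANCES):
    """Turn a ground-truth box into regression targets relative to one prior."""
    box = np.asarray(box, dtype=float)
    prior = np.asarray(prior, dtype=float)
    box_center = 0.5 * (box[:2] + box[2:])
    box_wh = box[2:] - box[:2]
    prior_center = 0.5 * (prior[:2] + prior[2:])
    prior_wh = prior[2:] - prior[:2]
    # Center offset, relative to the prior size, scaled by the variance
    t_center = (box_center - prior_center) / prior_wh / variances[:2]
    # Log of the size ratio, scaled by the variance
    t_wh = np.log(box_wh / prior_wh) / variances[2:]
    return np.concatenate([t_center, t_wh])

def decode(t, prior, variances=VARIANCES):
    """Invert encode(): recover (xmin, ymin, xmax, ymax) from the targets."""
    t = np.asarray(t, dtype=float)
    prior = np.asarray(prior, dtype=float)
    prior_center = 0.5 * (prior[:2] + prior[2:])
    prior_wh = prior[2:] - prior[:2]
    center = t[:2] * variances[:2] * prior_wh + prior_center
    wh = np.exp(t[2:] * variances[2:]) * prior_wh
    return np.concatenate([center - 0.5 * wh, center + 0.5 * wh])

prior = [0.2, 0.2, 0.6, 0.6]
box = [0.25, 0.3, 0.65, 0.8]
targets = encode(box, prior)
restored = decode(targets, prior)
```

The decode step is what is applied to the network output at inference time, so a round trip like this is a cheap sanity check on the sign and order of each operation.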

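An image usually contains several objects and therefore several ground-truth boxes, so every box in the image is encoded.

encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
# Encoded values and IoU for every ground-truth box
encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)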

    def assign_boxes(self, boxes):
        assignment = np.zeros((self.num_priors, 4 + self.num_classes + 8))
        # Every prior starts out as background
        assignment[:, 4] = 1.0
        if len(boxes) == 0:
            return assignment
        # Compute IoU and encoded targets for every ground-truth box
        encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
        # Encoded values and IoU for every ground-truth box
        encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)

        # For each prior, take the highest IoU over all ground-truth boxes
        # shape: (8732,)
        best_iou = encoded_boxes[:, :, -1].max(axis=0)
        # Index of the ground-truth box that gives that IoU, shape: (8732,)
        best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
        best_iou_mask = best_iou > 0

        # Keep only priors that were matched, shape: (number of priors with IoU > 0,)
        best_iou_idx = best_iou_idx[best_iou_mask]
        assign_num = len(best_iou_idx)
        # Keep the targets that the matched priors should predict
        encoded_boxes = encoded_boxes[:, best_iou_mask, :]
        assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx, np.arange(assign_num), :4]
        # Column 4 is the background probability, which is 0 for matched priors
        assignment[:, 4][best_iou_mask] = 0
        assignment[:, 5:-8][best_iou_mask] = boxes[best_iou_idx, 4:]
        assignment[:, -8][best_iou_mask] = 1
        # assign_boxes therefore yields the prediction the network should produce for this image
        return assignment
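The indexing in `assign_boxes` is easier to follow on a toy array. The sketch below uses made-up numbers (2 ground-truth boxes, 4 priors, placeholder offsets of 1.0 and 2.0) purely to show which row each prior ends up reading from; none of the values come from a real model.

```python
import numpy as np

num_gt, num_priors = 2, 4
# encoded[g, p] = (tx, ty, tw, th, iou) for ground-truth box g and prior p
encoded = np.zeros((num_gt, num_priors, 5))
encoded[0, :, :4] = 1.0  # placeholder offsets for ground-truth box 0
encoded[1, :, :4] = 2.0  # placeholder offsets for ground-truth box 1
# IoU is nonzero only where encode_box assigned the prior to that box
encoded[0, :, 4] = [0.7, 0.0, 0.6, 0.0]
encoded[1, :, 4] = [0.0, 0.0, 0.8, 0.0]

# Per prior: best IoU over ground-truth boxes, and which box gave it
best_iou = encoded[:, :, -1].max(axis=0)
best_idx = encoded[:, :, -1].argmax(axis=0)
matched = best_iou > 0

# Restrict to matched priors, then pick one ground-truth row per prior
best_idx = best_idx[matched]
kept = encoded[:, matched, :]
offsets = kept[best_idx, np.arange(len(best_idx)), :4]
```

Prior 2 overlaps both boxes, and the `argmax` resolves the conflict in favor of ground-truth box 1 because its IoU is higher. Priors 1 and 3 are never matched and stay background.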

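The final encoded target has shape (8732, 4 + num_classes + 8).

4: the encoded offsets for x, y, w, h

num_classes: the number of classes including the background class, e.g. 21 for VOC

8: the first of the last 8 entries indicates whether the prior has an object (1 if it does, 0 if not); the remaining entries are 0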
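To make the column layout concrete, here is a small sketch that builds an empty target array and slices it into its three parts. The sizes (8732 priors, 21 classes) follow the text; the chosen prior index and class column are arbitrary values for illustration.

```python
import numpy as np

num_priors, num_classes = 8732, 21  # 21 = 20 VOC classes + background
assignment = np.zeros((num_priors, 4 + num_classes + 8))
assignment[:, 4] = 1.0  # every prior is background by default

# Mark prior 0 as a positive for foreground class column 7 (arbitrary example)
assignment[0, 4] = 0.0
assignment[0, 5 + 7] = 1.0
assignment[0, -8] = 1.0

loc = assignment[:, :4]         # encoded x, y, w, h offsets
conf = assignment[:, 4:-8]      # background column followed by the foreground classes
has_object = assignment[:, -8]  # 1 for matched priors, 0 otherwise
```

The views share memory with `assignment`, so they can be used to inspect a target produced by `assign_boxes` without copying it.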

 

3. Data preprocessing

Images go through data augmentation; boxes are encoded.

    def generate(self, train=True):
        while True:
            if train:
                # Shuffle the annotation lines
                shuffle(self.train_lines)
                lines = self.train_lines
            else:
                shuffle(self.val_lines)
                lines = self.val_lines
            inputs = []
            targets = []
            for annotation_line in lines:  
                img,y=self.get_random_data(annotation_line,self.image_size[0:2])
                if len(y)!=0:
                    boxes = np.array(y[:,:4],dtype=np.float32)
                    boxes[:,0] = boxes[:,0]/self.image_size[1]
                    boxes[:,1] = boxes[:,1]/self.image_size[0]
                    boxes[:,2] = boxes[:,2]/self.image_size[1]
                    boxes[:,3] = boxes[:,3]/self.image_size[0]
                    one_hot_label = np.eye(self.num_classes)[np.array(y[:,4],np.int32)]
                    # Skip the sample if any box has non-positive height or width
                    if ((boxes[:,3]-boxes[:,1])<=0).any() or ((boxes[:,2]-boxes[:,0])<=0).any():
                        continue
                    
                    y = np.concatenate([boxes,one_hot_label],axis=-1)

                y = self.bbox_util.assign_boxes(y)
                inputs.append(img)               
                targets.append(y)
                if len(targets) == self.batch_size:
                    tmp_inp = np.array(inputs)
                    tmp_targets = np.array(targets)
                    inputs = []
                    targets = []
                    yield preprocess_input(tmp_inp), tmp_targets
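The box handling inside the loop can be isolated as follows. This is a sketch under stated assumptions, not the generator's own code: `prepare_targets` is a hypothetical helper, `image_size` is taken to be (height, width), and degenerate boxes are dropped one by one instead of skipping the whole image as the generator does.

```python
import numpy as np

def prepare_targets(y, image_size, num_foreground_classes):
    """Normalize pixel boxes to [0, 1] and append one-hot class labels.

    y: rows of (xmin, ymin, xmax, ymax, class_id) in pixels.
    image_size: (height, width) of the network input.
    """
    y = np.asarray(y, dtype=np.float32)
    height, width = image_size
    # x coordinates are divided by the width, y coordinates by the height
    boxes = y[:, :4] / np.array([width, height, width, height], dtype=np.float32)
    # Keep only boxes with strictly positive width and height
    keep = ((boxes[:, 2] - boxes[:, 0]) > 0) & ((boxes[:, 3] - boxes[:, 1]) > 0)
    one_hot = np.eye(num_foreground_classes, dtype=np.float32)[y[:, 4].astype(np.int32)]
    return np.concatenate([boxes, one_hot], axis=-1)[keep]

raw = [
    [30, 60, 150, 240, 2],  # valid box, class 2
    [10, 10, 10, 50, 0],    # zero width, dropped
]
targets = prepare_targets(raw, image_size=(300, 300), num_foreground_classes=3)
```

A zero-width or zero-height box would make the log ratio in the encoding step undefined, which is why such boxes have to be filtered out before `assign_boxes` is called.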

 





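Copyright notice: This is an original article by learningcaiji, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://www.cnblogs.com/learningcaiji/p/14136312.html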