Video Object Detection and Tracking -- Detect to Track and Track to Detect

O天涯海阁O

O天涯海阁O · Published 2017-10-13 14:47:06

Detect to Track and Track to Detect
ICCV2017
https://github.com/feichtenhofer/detect-track

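For the problem of video object detection, this paper proposes a unified framework that performs detection and tracking simultaneously.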
In this paper we propose a unified approach to tackle the problem of object detection in realistic video

The ImageNet video object detection challenge (VID) is currently a fairly influential competition.

Video object detection is difficult, mainly for the following reasons:
(i) size: the volume of video data is large. VID has around 1.3M images, compared to around 400K in DET or 100K in COCO
(ii) motion blur: image blur caused by rapid camera or object motion
(iii) quality: the quality of internet video is uneven
(iv) partial occlusion: occlusion is sometimes severe
(v) pose: unconventional object-to-camera poses are frequently seen in video, so poses vary widely

3. D&T Approach: Detect and Track (D&T)
3.1. D&T overview
We aim at jointly detecting and tracking (D&T) objects in video
We build on the R-FCN detection framework and extend it for multi-frame detection and tracking.

The overall network architecture is as follows:
[Figure: overall D&T network architecture]

The main highlight is the proposed RoI tracking module, which associates objects between two frames to perform object tracking.
We compute correlation maps for all positions in a feature map and let RoI pooling operate on these feature maps for track regression
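
The correlation step in the quote above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function name `local_correlation`, the (C, H, W) feature layout, the maximum displacement `d`, and the zero padding at the borders are all assumptions made for this example. For every position in the frame-t feature map, it takes the dot product with features at nearby positions in the second frame's feature map.

```python
import numpy as np

def local_correlation(feat_t, feat_t_tau, d=2):
    """Correlate each position of feat_t with a (2d+1) x (2d+1)
    neighbourhood of the same position in feat_t_tau.

    feat_t, feat_t_tau: float arrays of shape (C, H, W) (assumed layout).
    Returns an array of shape ((2d+1)**2, H, W), one channel per offset (p, q).
    """
    c, h, w = feat_t.shape
    # Zero-pad the spatial dimensions so shifted windows stay in bounds
    # (border handling is an assumption of this sketch).
    padded = np.pad(feat_t_tau, ((0, 0), (d, d), (d, d)))
    out = np.empty(((2 * d + 1) ** 2, h, w), dtype=float)
    k = 0
    for p in range(-d, d + 1):          # vertical offset
        for q in range(-d, d + 1):      # horizontal offset
            # shifted[:, i, j] == feat_t_tau[:, i + p, j + q] (zero outside the map)
            shifted = padded[:, d + p:d + p + h, d + q:d + q + w]
            # Dot product over the channel axis at every position (i, j)
            out[k] = (feat_t * shifted).sum(axis=0)
            k += 1
    return out

# Example usage with random features standing in for two frames
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 5, 6))
feat_b = rng.standard_normal((4, 5, 6))
corr = local_correlation(feat_a, feat_b, d=2)  # shape (25, 5, 6)
```

In the pipeline the post describes, RoI pooling would then operate on maps like these to regress the track; that step is omitted here.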