[Datasets] A roundup of datasets for computer vision, deep learning, and data mining

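Finance: official data from the U.S. Bureau of Labor Statistics; Shanghai A-share daily data, 1999-12-09 to 2016-06-08, forward-adjusted, 1,095 stocks; Shenzhen A-share daily data, 1999-12-09 to 2016-06-08, forward-adjusted, 1,766 stocks; Shenzhen ChiNext daily data, 1999-12-09 to 2016-06-08, forward-adjusted, 510 stocks; MT4 platform forex trading history; Forex platform forex trading history; several sets of tick-by-tick forex data; U.S. stock…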

ciky奇 · published 2018-04-04 10:46:09

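Finance

Transportation

Business

Recommender systems

Healthcare

Image data

General images

Scene images

Web-tagged images

Human silhouette images

Visual text recognition images

Images of a specific class of objects

Material and texture images

Object classification images

Face images

Pose and action images

Fingerprint recognition

Other image data

Video data

General video

Human action video

Object detection video

Dense crowd video

Other video

Fire Detection video data

Audio data

General audio

Google AudioSet audio data [too large to host; description only]

Speech recognition

Natural language processing

Social data

Processed research and competition data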

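1. Common deep learning datasets

2. [Introduction] In the "big data era", data is king! Neither data mining nor the currently popular field of deep learning can do without "big data". Large companies generally have their own data, but for startups, university faculty, and students, "Where can I get large datasets open to the public?" is an unavoidable question.

This article introduces and summarizes the open datasets commonly used in deep learning for vision, drawing on those the author used during graduate study and research and those encountered in the literature.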

MNIST

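The "Hello World!" of deep learning, and a must for beginners. MNIST is a database of handwritten digits with 60,000 training samples and 10,000 test samples; each sample image is 28x28 pixels. The dataset is stored in a binary format and cannot be viewed directly as images, but tools that convert it to an image format are easy to find.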
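Since the files are binary, it can help to see what a converter has to do. The following is a minimal sketch, assuming the IDX layout documented on the MNIST download page (a 4-byte magic number whose third byte is the type code `0x08` for unsigned bytes and whose fourth byte is the number of dimensions, followed by big-endian 32-bit dimension sizes and then the raw pixels). The function names are illustrative and not part of any official tool.

```python
import math
import struct


def parse_idx_ubyte(raw: bytes):
    """Split an uncompressed IDX buffer of unsigned bytes into (dims, payload)."""
    zero, type_code, ndim = struct.unpack_from(">HBB", raw, 0)
    if zero != 0 or type_code != 0x08:
        raise ValueError("expected an IDX file of unsigned bytes")
    dims = struct.unpack_from(f">{ndim}I", raw, 4)  # big-endian dimension sizes
    payload = raw[4 + 4 * ndim:]
    if len(payload) != math.prod(dims):
        raise ValueError("payload length does not match the header")
    return dims, payload


def image_rows(dims, payload, index):
    """Return image number `index` as a list of rows of pixel intensities."""
    _, height, width = dims
    start = index * height * width
    return [list(payload[start + r * width: start + (r + 1) * width])
            for r in range(height)]


def to_pgm(rows) -> bytes:
    """Encode one grayscale image as a binary PGM (P5) file body."""
    height, width = len(rows), len(rows[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(v for row in rows for v in row)
```

The downloads are gzip-compressed, so a real file would first be read with something like `gzip.open(path, "rb").read()` before being passed to `parse_idx_ubyte`; the bytes returned by `to_pgm` can then be written to a `.pgm` file that most image viewers open.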

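LeNet, one of the earliest deep convolutional networks, was designed for this dataset. Almost every mainstream deep learning framework uses MNIST as its first introductory tutorial; the TensorFlow tutorial on MNIST is particularly detailed.

Dataset size: ~12 MB
Download: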
http://yann.lecun.com/exdb/mnist/index.html

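ImageNet

MNIST leads beginners into deep learning, while ImageNet has been a major driving force behind the deep learning wave. The 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" by Krizhevsky, Sutskever, and Hinton set off a "revolution" in computer vision, and that work was built on ImageNet.

ImageNet contains more than 14 million images covering more than 20,000 categories. More than one million of the images carry explicit class labels together with annotations of object locations. The details are as follows:
1) Total number of non-empty synsets: 21841
2) Total number of images: 14,197,122
3) Number of images with bounding box annotations: 1,034,908
4) Number of synsets with SIFT features: 1000
5) Number of images with SIFT features: 1.2 million

ImageNet is currently one of the most heavily used datasets in deep learning for images; most research on image classification, localization, and detection is built on it. It is well documented, maintained by a dedicated team, and convenient to use. It appears so widely in computer vision papers that it has become close to a "standard" dataset for benchmarking deep learning image algorithms.

Associated with ImageNet is the world-renowned ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In the past the winners were usually large companies and labs such as Google and MSRA; this year (2016), Chinese teams took first place in every ILSVRC2016 task.

ImageNet is an excellent dataset, but annotation errors are inevitable, and almost every year erroneous data is corrected or removed. It is best to download the latest version and keep an eye on updates.

Dataset size: ~1 TB (all data for the ILSVRC2016 competition)
Download: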
http://www.image-net.org/about-stats

COCO

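COCO (Common Objects in Context) is a newer dataset for image recognition, segmentation, and captioning, with the following features:
1) Object segmentation
2) Recognition in Context
3) Multiple objects per image
4) More than 300,000 images
5) More than 2 Million instances
6) 80 object categories
7) 5 captions per image
8) Keypoints on 100,000 people

COCO is sponsored by Microsoft. Its annotations include not only category and location information but also natural-language descriptions of each image. Its release has driven major progress in image segmentation and semantic understanding over the past two or three years, and it has become close to a "standard" dataset for evaluating image-understanding algorithms.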
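To make the "5 captions per image" annotation concrete, here is a small sketch that groups captions by image. It assumes the caption annotation file is a JSON object with an `images` list (each entry having `id` and `file_name`) and an `annotations` list (each entry having `image_id` and `caption`), which is how the COCO caption files are laid out. The function name is hypothetical.

```python
from collections import defaultdict


def captions_by_image(coco: dict) -> dict:
    """Map each image file name to the list of captions written for it."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[file_names[ann["image_id"]]].append(ann["caption"])
    return dict(grouped)
```

In practice the `coco` argument would come from `json.load` on a downloaded caption annotation file; the exact file name depends on the release you download.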

Google's open-source image captioning model Show and Tell was evaluated on this dataset; download it and give it a try if you're curious.

Dataset size: ~40 GB
Download: http://mscoco.org/

PASCAL VOC

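The PASCAL VOC challenge is a benchmark for visual object classification, recognition, and detection. It provides a standard annotated image dataset and a standard evaluation procedure for detection and learning algorithms. The PASCAL VOC image set covers 20 categories: person; animals (bird, cat, cow, dog, horse, sheep); vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train); and indoor objects (bottle, chair, dining table, potted plant, sofa, TV). The challenge has not been held since 2012, but the images are of good quality and thoroughly annotated, which makes the dataset well suited to testing algorithm performance.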
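As an illustration of the annotations, VOC stores one XML file per image, with an `<object>` element per labeled instance containing a `<name>` and a `<bndbox>` with `xmin`, `ymin`, `xmax`, `ymax`. The sketch below reads those fields with the standard library; the function name is hypothetical and the other tags VOC provides (difficulty, truncation, parts) are ignored.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text: str):
    """Return (filename, [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")  # direct child only, so nested <part> boxes are skipped
        coords = tuple(int(float(box.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), coords))
    return root.findtext("filename"), boxes
```

Calling it on the text of one annotation file gives the image file name plus a list of labeled boxes, which is usually the first step before computing detection metrics.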

Dataset size: ~2 GB
Download:
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html

CIFAR

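CIFAR-10 contains 10 classes of 32x32 color images, with 50,000 training images and 10,000 test images. CIFAR-100 is similar but has 100 classes of 600 images each, 500 for training and 100 for testing; the 100 classes are grouped into 20 superclasses. Every image has an explicit class label. CIFAR is a very good small-to-medium dataset for testing image classification algorithms.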
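For readers who want to look at the images, here is a rough sketch of decoding the Python version of CIFAR-10. It assumes the layout described on the download page: each batch file is a pickled dict whose `data` entry holds one 3072-value row per image (1024 red values, then 1024 green, then 1024 blue, each plane in row-major order) and whose `labels` entry holds the class indices. The helper names are illustrative only.

```python
import pickle


def load_batch(path):
    """Load one CIFAR-10 batch file; returns (data rows, labels).

    Unpickling executes code, so only use this on files from a trusted source.
    The data entry is a NumPy array, so NumPy must be installed to unpickle it.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


def row_to_image(row, size=32):
    """Turn one 3072-value row into a size x size grid of (r, g, b) tuples."""
    plane = size * size
    return [[(row[y * size + x],
              row[plane + y * size + x],
              row[2 * plane + y * size + x])
             for x in range(size)]
            for y in range(size)]
```

CIFAR-100 uses the same row layout but stores its labels under `fine_labels` and `coarse_labels` instead of `labels`.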

Dataset size: ~170 MB
Download:
http://www.cs.toronto.edu/~kriz/cifar.html

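Open Images

Advances in machine learning over the past few years have brought rapid progress in computer vision: systems can now automatically describe pictures and generate natural-language responses to shared images. Much of this progress can be attributed to the public availability of datasets such as ImageNet and COCO. Google naturally wanted to contribute something as well, and so Open Images was born.

Open Images is a dataset of ~9 million image URLs, annotated with labels spanning more than 6,000 categories. Its labels cover more real-life entities than the 1,000 ImageNet classes, and it is large enough to train a deep neural network from scratch.

Coming from Google, the quality should be high. The one possible drawback is that it provides only image URLs, which is less convenient than providing the images directly.

The author has not used this dataset personally, but Google releases are generally dependable.

Dataset size: ~1.5 GB (not including the images)
Download: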
https://github.com/openimages/dataset

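YouTube-8M

YouTube-8M is a video dataset open-sourced by Google. The videos come from YouTube: 8 million videos in total, 500,000 hours, 4,800 classes. To ensure the stability and quality of the labeled video database, Google used only public videos with more than 1,000 views. So that researchers and students with limited computing resources can also use it, Google preprocessed the videos and extracted frame-level features, compressed so that they fit on a single hard disk (less than 1.5 TB).

The dataset comes with a download script. Because of network conditions in mainland China, the download often drops, but fortunately the script supports resuming, so reconnecting a little later picks up where it left off. You can write a script that detects an interrupted download, sleeps for a while, and then requests the download again, so you don't have to watch over it. (As of writing, after downloading in fits and starts, the author still hasn't finished...)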
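The "sleep and retry" idea can be sketched as follows. This is not the official YouTube-8M download script; it is a generic, hypothetical helper built on the standard library that resumes a single file with an HTTP `Range` request and retries after a pause when the connection fails. The function names, the 60-second default delay, and the timeout are all assumptions.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def download_with_resume(url, dest, chunk_size=1 << 20):
    """Fetch url into dest, continuing from the bytes already on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    try:
        response = urllib.request.urlopen(request, timeout=60)
    except urllib.error.HTTPError as err:
        if err.code == 416 and offset > 0:
            return  # requested range is empty: the file is already complete
        raise
    with response:
        # 206 means the server honoured the Range header; otherwise start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    return
                out.write(chunk)


def retry_until_done(task, delay_seconds=60, max_attempts=None, sleep=time.sleep):
    """Run task(); on a network error, sleep and try again."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return task()
        except (OSError, http.client.HTTPException):
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay_seconds)
```

A call such as `retry_until_done(lambda: download_with_resume(url, "part.bin"))` would then keep going unattended. Setting `max_attempts` is advisable, since permanent errors such as a wrong URL are retried as well.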

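Dataset size: ~1.5 TB
Download: https://research.google.com/youtube8m/

The above are the datasets that, based on the author's study, research, and reading of the literature, are commonly used by researchers in deep learning for vision. The author's knowledge is limited, so omissions and mistakes are inevitable; corrections from readers are welcome.

If the datasets above still don't meet your needs, try looking through the ones below.

1. Deep learning dataset collection site

http://deeplearning.net/datasets/
Collects a large number of deep-learning-related datasets, though not every open dataset can be found there.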

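2. Tiny Images Dataset
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
Contains 80 million 32x32 images; CIFAR-10 and CIFAR-100 were selected from it.

3. CoPhIR
http://cophir.isti.cnr.it/whatis.html
A very large Flickr dataset released by Yahoo, containing more than 100 million images.

4. MirFlickr1M
http://press.liacs.nl/mirflickr/
A set of one million images selected from Flickr.

5. SBU captioned photo dataset
http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/
A subset of Flickr containing one million images.

6. NUS-WIDE
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
A set of 270,000 images from Flickr.

7. Large-Scale Image Annotation using Visual Synset (ICCV 2011)
http://cpl.cc.gatech.edu/projects/VisualSynset/
A very large machine-annotated dataset containing 200 million images.

8. SUN dataset
http://people.csail.mit.edu/jxiao/SUN/
A dataset of 130,000 images.

9. MSRA-MM
http://research.microsoft.com/en-us/projects/msrammdata/
Contains one million images and 23,000 videos; produced by Microsoft Research Asia, so the quality should be dependable.

China is a "big data country". Among government bodies, Beijing, Shanghai, and other cities have led the way in opening up datasets on transportation, weather, and more. Among companies, Sina Weibo and others have led in releasing real, usable data, which has been a great convenience for researchers. In computer vision, however, the openness of datasets in China still lags somewhat behind other countries. Hopefully more companies and organizations in China will release high-quality datasets, advance research in their fields, raise China's influence in those areas, and contribute to the progress of science and technology for everyone.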

A comprehensive list of common image datasets

1. Sogou Labs dataset:

http://www.sogou.com/labs/dl/p.html

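The Internet image library is drawn from part of the data indexed by Sogou image search. It covers categories such as people, animals, architecture, machinery, landscapes, and sports, totaling 2,836,535 images. For each image, the dataset provides the original image, a thumbnail, the web page it appeared on, and the related text from that page. The total size is more than 200 GB.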

http://www.imageclef.org/

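ImageCLEF aims to provide benchmarks for image-related tasks (retrieval, classification, annotation, and so on) as part of the Cross Language Evaluation Forum (CLEF). It has held a competition every year since 2003.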

http://staff.science.uva.nl/~xirong/index.php?n=Main.Dataset

Datasets maintained by Xirong Li (PhD, Intelligent Systems Lab Amsterdam), who does research on video and image retrieval.

Flickr-3.5M: A collection of 3.5 million social-tagged images.
Social20: A ground-truth set for tag-based social image retrieval.
Biconcepts2012test: A ground-truth set for retrieving bi-concepts (concept pairs) in unlabeled images.
neg4free: A set of negative examples automatically harvested from social-tagged images for 20 PASCAL VOC concepts.

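Images (with features) from Wikipedia featured articles, together with the corresponding wiki text. See the paper "A New Approach to Cross-Modal Multimedia Retrieval". There is also a related paper, "On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval", though it has no download link yet.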

http://www.svcl.ucsd.edu/projects/crossmodal/

http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

To our knowledge, this is the largest real-world web image dataset comprising over 269,000 images with over 5,000 user-provided tags, and ground-truth of 81 concepts for the entire dataset. The dataset is much larger than the popularly available Corel and Caltech 101 datasets. Though some datasets comprise over 3 million images, they only have ground-truth for a small fraction of images. Our proposed NUS-WIDE dataset has the ground-truth for the entire dataset.

http://www.cs.washington.edu/research/imagedatabase/

http://lear.inrialpes.fr/~jegou/data.php

Jegou's datasets. Jegou works specifically on CBIR, so the images come with ground truth but no labels.

http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/

VGG's Oxford Buildings dataset, also built specifically for CBIR.

http://acmmm13.org/submissions/call-for-multimedia-grand-challenge-solutions/msr-bing-grand-challenge-on-image-retrieval-scientific-track/

The dataset for the Microsoft Image Grand Challenge on Image Retrieval

The datasets collected on cvpapers are also listed below.

http://www.cvpapers.com/index.html

Participate in Reproducible Research

Detection

PASCAL VOC 2009 dataset

Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets

LabelMe dataset

LabelMe is a web-based image annotation tool that allows researchers to label images and share the annotations with the rest of the community. If you use the database, we only ask that you contribute to it, from time to time, by using the labeling tool.

BioID Face Detection Database

1521 images with human faces, recorded under natural conditions, i.e. varying illumination and complex background. The eye positions have been set manually.

CMU/VASC & PIE Face dataset

Yale Face dataset

Caltech

Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds

Caltech 101

Pictures of objects belonging to 101 categories

Caltech 256

Pictures of objects belonging to 256 categories

Daimler Pedestrian Detection Benchmark

15,560 pedestrian and non-pedestrian samples (image cut-outs) and 6744 additional full images not containing pedestrians for bootstrapping. The test set contains more than 21,790 images with 56,492 pedestrian labels (fully visible or partially occluded), captured from a vehicle in urban traffic.

MIT Pedestrian dataset

CVC Pedestrian Datasets

CBCL Pedestrian Database

MIT Face dataset

CBCL Face Database

MIT Car dataset

CBCL Car Database

MIT Street dataset

CBCL Street Database

INRIA Person Data Set

A large set of marked up images of standing or walking people

INRIA car dataset

A set of car and non-car images taken in a parking lot nearby INRIA

INRIA horse dataset

A set of horse and non-horse images

H3D Dataset

3D skeletons and segmented regions for 1000 people in images

HRI RoadTraffic dataset

A large-scale vehicle detection dataset

BelgaLogos

10000 images of natural scenes, with 37 different logos, and 2695 logos instances, annotated with a bounding box.

FlickrBelgaLogos

10000 images of natural scenes grabbed on Flickr, with 2695 logos instances cut and pasted from the BelgaLogos dataset.

FlickrLogos-32

The dataset FlickrLogos-32 contains photos depicting logos and is meant for the evaluation of multi-class logo detection/recognition as well as logo retrieval methods on real-world images. It consists of 8240 images downloaded from Flickr.

TME Motorway Dataset

30000+ frames with vehicle rear annotation and classification (car and trucks) on motorway/highway sequences. Annotation semi-automatically generated using laser-scanner data. Distance estimation and consistent target ID over time available.

PHOS (Color Image Database for illumination invariant feature selection)

Phos is a color image database of 15 scenes captured under different illumination conditions. More particularly, every scene of the database contains 15 different images: 9 images captured under various strengths of uniform illumination, and 6 images under different degrees of non-uniform illumination. The images contain objects of different shape, color and texture and can be used for illumination invariant feature detection and selection.

CaliforniaND: An Annotated Dataset For Near-Duplicate Detection In Personal Photo Collections

California-ND contains 701 photos taken directly from a real user's personal photo collection, including many challenging non-identical near-duplicate cases, without the use of artificial image transformations. The dataset is annotated by 10 different subjects, including the photographer, regarding near duplicates.

Classification

PASCAL VOC 2009 dataset

Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets

Caltech

Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds

Caltech 101

Pictures of objects belonging to 101 categories

Caltech 256

Pictures of objects belonging to 256 categories

ETHZ Shape Classes

A dataset for testing object class detection algorithms. It contains 255 test images and features five diverse shape-based classes (apple logos, bottles, giraffes, mugs, and swans).

Flower classification data sets

17 Flower Category Dataset

Animals with attributes

A dataset for Attribute Based Classification. It consists of 30475 images of 50 animals classes with six pre-extracted feature representations for each image.

Stanford Dogs Dataset

Dataset of 20,580 images of 120 dog breeds with bounding-box annotation, for fine-grained image categorization.

Recognition

Face and Gesture Recognition Working Group FGnet

Feret

Face and Gesture Recognition Working Group FGnet

PUT face

9971 images of 100 people

Labeled Faces in the Wild

A database of face photographs designed for studying the problem of unconstrained face recognition

Urban scene recognition

Traffic Lights Recognition, Lara's public benchmarks.

PubFig: Public Figures Face Database

The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects.

YouTube Faces

The data set contains 3,425 videos of 1,595 different people. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames.

MSRC-12: Kinect gesture data set

The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system.

QMUL underGround Re-IDentification (GRID) Dataset

This dataset contains 250 pedestrian image pairs + 775 additional images captured in a busy underground station for the research on person re-identification.

Person identification in TV series

Face tracks, features and shot boundaries from our latest CVPR 2013 paper. It is obtained from 6 episodes of Buffy the Vampire Slayer and 6 episodes of Big Bang Theory.

ChokePoint Dataset

ChokePoint is a video dataset designed for experiments in person identification/verification under real-world surveillance conditions. The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2.

Tracking

BIWI Walking Pedestrians dataset

Walking pedestrians in busy scenarios from a bird's-eye view

"Central" Pedestrian Crossing Sequences

Three pedestrian crossing sequences

Pedestrian Mobile Scene Analysis

The set was recorded in Zurich, using a pair of cameras mounted on a mobile platform. It contains 12'298 annotated pedestrians in roughly 2'000 frames.

Head tracking

BMP image sequences.

KIT AIS Dataset

Data sets for tracking vehicles and people in aerial image sequences.

MIT Traffic Data Set

MIT traffic data set is for research on activity analysis and crowded scenes. It includes a traffic video sequence of 90 minutes long. It is recorded by a stationary camera.

Segmentation

Image Segmentation with A Bounding Box Prior dataset

Ground truth database of 50 images with: Data, Segmentation, Labelling - Lasso, Labelling - Rectangle

PASCAL VOC 2009 dataset

Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets

Motion Segmentation and OBJCUT data

Cows for object segmentation, Five video sequences for motion segmentation

Geometric Context Dataset

Geometric Context Dataset: pixel labels for seven geometric classes for 300 images

Crowd Segmentation Dataset

This dataset contains videos of crowds and other high density moving objects. The videos are collected mainly from the BBC Motion Gallery and Getty Images website. The videos are shared only for the research purposes. Please consult the terms and conditions of use of these videos from the respective websites.

CMU-Cornell iCoseg Dataset

Contains hand-labelled pixel annotations for 38 groups of images, each group containing a common foreground. Approximately 17 images per group, 643 images total.

Segmentation evaluation database

200 gray level images along with ground truth segmentations

The Berkeley Segmentation Dataset and Benchmark

Image segmentation and boundary detection. Grayscale and color segmentations for 300 images, the images are divided into a training set of 200 images, and a test set of 100 images.

Weizmann horses

328 side-view color images of horses that were manually segmented. The images were randomly collected from the WWW.

Saliency-based video segmentation with sequentially updated priors

10 videos as inputs, and segmented image sequences as ground-truth

Foreground/Background

Wallflower Dataset

For evaluating background modelling algorithms

Foreground/Background Microsoft Cambridge Dataset

Foreground/Background segmentation and Stereo dataset from Microsoft Cambridge

Stuttgart Artificial Background Subtraction Dataset

The SABS (Stuttgart Artificial Background Subtraction) dataset is an artificial dataset for pixel-wise evaluation of background models.

Saliency Detection

AIM

120 Images / 20 Observers (Neil D. B. Bruce and John K. Tsotsos 2005).

LeMeur

27 Images / 40 Observers (O. Le Meur, P. Le Callet, D. Barba and D. Thoreau 2006).

Kootstra

100 Images / 31 Observers (Kootstra, G., Nederveen, A. and de Boer, B. 2008).

DOVES

101 Images / 29 Observers (van der Linde, I., Rajashekar, U., Bovik, A.C., Cormack, L.K. 2009).

Ehinger

912 Images / 14 Observers (Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba and Aude Oliva 2009).

NUSEF

758 Images / 75 Observers (R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli and T-S. Chua 2010).

JianLi

235 Images / 19 Observers (Jian Li, Martin D. Levine, Xiangjing An and Hangen He 2011).

Extended Complex Scene Saliency Dataset (ECSSD)

ECSSD contains 1000 natural images with complex foreground or background. For each image, the ground truth mask of salient object(s) is provided.

Video Surveillance

CAVIAR

For the CAVIAR project a number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, entering and exiting shops, fighting and passing out and, last but not least, leaving a package in a public place.

ViSOR

ViSOR contains a large set of multimedia data and the corresponding annotations.

Multiview

3D Photography Dataset

Multiview stereo data sets: a set of images

Multi-view Visual Geometry group's data set

Dinosaur, Model House, Corridor, Aerial views, Valbonne Church, Raglan Castle, Kapel sequence

Oxford reconstruction data set (building reconstruction)

Oxford colleges

Multi-View Stereo dataset (Vision Middlebury)

Temple, Dino

Multi-View Stereo for Community Photo Collections

Venus de Milo, Duomo in Pisa, Notre Dame de Paris

IS-3D Data

Dataset provided by Center for Machine Perception

CVLab dataset

CVLab dense multi-view stereo image database

3D Objects on Turntable

Objects viewed from 144 calibrated viewpoints under 3 different lighting conditions

Object Recognition in Probabilistic 3D Scenes

Images from 19 sites collected from a helicopter flying around Providence, RI. USA. The imagery contains approximately a full circle around each site.

Multiple cameras fall dataset

24 scenarios recorded with 8 IP video cameras. The first 22 scenarios contain a fall and confounding events; the last 2 contain only confounding events.

Action

UCF Sports Action Dataset

This dataset consists of a set of actions collected from various sports which are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites including BBC Motion gallery, and GettyImages.

UCF Aerial Action Dataset

This dataset features video sequences that were obtained using an R/C-controlled blimp equipped with an HD camera mounted on a gimbal. The collection represents a diverse pool of actions featured at different heights and aerial viewpoints. Multiple instances of each action were recorded at different flying altitudes, which ranged from 400-450 feet, and were performed by different actors.

UCF YouTube Action Dataset

It contains 11 action categories collected from YouTube.

Weizmann action recognition

Walk, Run, Jump, Gallop sideways, Bend, One-hand wave, Two-hands wave, Jump in place, Jumping Jack, Skip.

UCF50

UCF50 is an action recognition dataset with 50 action categories, consisting of realistic videos taken from YouTube.

ASLAN

The Action Similarity Labeling (ASLAN) Challenge.

MSR Action Recognition Datasets

The dataset was captured by a Kinect device. There are 12 dynamic American Sign Language (ASL) gestures, and 10 people. Each person performs each gesture 2-3 times.

KTH Recognition of human actions

Contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes and indoors.

Hollywood-2 Human Actions and Scenes dataset

Hollywood-2 dataset contains 12 classes of human actions and 10 classes of scenes distributed over 3669 video clips and approximately 20.1 hours of video in total.

Collective Activity Dataset

This dataset contains 5 different collective activities (crossing, walking, waiting, talking, and queueing) and 44 short video sequences, some of which were recorded by a consumer hand-held digital camera with varying viewpoint.

Olympic Sports Dataset

The Olympic Sports Dataset contains YouTube videos of athletes practicing different sports.

SDHA 2010

Surveillance-type videos

VIRAT Video Dataset

The dataset is designed to be more realistic, natural and challenging for video surveillance domains than existing action recognition datasets in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories.

HMDB: A Large Video Database for Human Motion Recognition

Collected from various sources, mostly from movies, and a small proportion from public databases, YouTube and Google videos. The dataset contains 6849 clips divided into 51 action categories, each containing a minimum of 101 clips.

Stanford 40 Actions Dataset

Dataset of 9,532 images of humans performing 40 different actions, annotated with bounding-boxes.

50Salads dataset

Fully annotated dataset of RGB-D video data and data from accelerometers attached to kitchen objects capturing 25 people preparing two mixed salads each (4.5h of annotated data). Annotated activities correspond to steps in the recipe and include phase (pre-/ core-/ post) and the ingredient acted upon.

Human pose/Expression

AFEW (Acted Facial Expressions In The Wild)/SFEW (Static Facial Expressions In The Wild)

Dynamic temporal facial expressions data corpus consisting of close to real world environment extracted from movies.

ETHZ CALVIN Dataset

Image stitching

IPM Vision Group Image Stitching datasets

Images and parameters for registration

Medical

VIP Laparoscopic / Endoscopic Dataset

Collection of endoscopic and laparoscopic (mono/stereo) videos and images

Misc

Zurich Buildings Database

The ZuBuD Image Database contains over 1005 images of Zurich city buildings.

Color Name Data Sets

Mall dataset

The mall dataset was collected from a publicly accessible webcam for crowd counting and activity profiling research.

QMUL Junction Dataset

A busy traffic dataset for research on activity analysis and behaviour understanding.

CVonline datasets

http://homepages.inf.ed.ac.uk/rbf/CVonline/CVentry.htm

Index by Topic

Action Databases

50 Salads - fully annotated 4.5 hour dataset of RGB-D video + accelerometer data, capturing 25 people preparing two mixed salads each (Dundee University, Sebastian Stein)
ASLAN Action similarity labeling challenge database (Orit Kliper-Gross)
Berkeley MHAD: A Comprehensive Multimodal Human Action Database (Ferda Ofli)
BEHAVE Interacting Person Video Data with markup (Scott Blunsden, Bob Fisher, Aroosha Laghaee)
CVBASE06: annotated sports videos (Janez Pers)
G3D - synchronised video, depth and skeleton data for 20 gaming actions captured with Microsoft Kinect (Victoria Bloom)
Hollywood 3D - 650 3D action recognition in the wild videos, 14 action classes (Simon Hadfield)
Human Actions and Scenes Dataset (Marcin Marszalek, Ivan Laptev, Cordelia Schmid)
HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion (Brown University)
i3DPost Multi-View Human Action Datasets (Hansung Kim)
i-LIDS video event image dataset (Imagery library for intelligent detection systems) (Paul Hosner)
INRIA Xmas Motion Acquisition Sequences (IXMAS) (INRIA)
JPL First-Person Interaction dataset - 7 types of human activity videos taken from a first-person viewpoint (Michael S. Ryoo, JPL)
KTH human action recognition database (KTH CVAP lab)
LIRIS human activities dataset - 2 cameras, annotated, depth images (Christian Wolf, et al)
MuHAVi - Multicamera Human Action Video Data (Hossein Ragheb)
Oxford TV based human interactions (Oxford Visual Geometry Group)
Rochester Activities of Daily Living Dataset (Ross Messing)
SDHA Semantic Description of Human Activities 2010 contest - aerial views (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury)
SDHA Semantic Description of Human Activities 2010 contest - Human Interactions (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury)
TUM Kitchen Data Set of Everyday Manipulation Activities (Moritz Tenorth, Jan Bandouch)
TV Human Interaction Dataset (Alonso Patron-Perez)
Univ of Central Florida - Feature Films Action Dataset (Univ of Central Florida)
Univ of Central Florida - YouTube Action Dataset (sports) (Univ of Central Florida)
Univ of Central Florida - 50 Action Category Recognition in Realistic Videos (3 GB) (Kishore Reddy)
UCF 101 action dataset 101 action classes, over 13k clips and 27 hours of video data (Univ of Central Florida)
Univ of Central Florida - Sports Action Dataset (Univ of Central Florida)
Univ of Central Florida - ARG Aerial camera, Rooftop camera and Ground camera (UCF Computer Vision Lab)
UCR Videoweb Multi-camera Wide-Area Activities Dataset (Amit K. Roy-Chowdhury)
Verona Social interaction dataset (Marco Cristani)
Videoweb (multicamera) Activities Dataset (B. Bhanu, G. Denina, C. Ding, A. Ivers, A. Kamal, C. Ravishankar, A. Roy-Chowdhury, B. Varda)
ViHASi: Virtual Human Action Silhouette Data (userID: VIHASI password: virtual$virtual) (Hossein Ragheb, Kingston University)
WorkoutSU-10 Kinect dataset for exercise actions (Ceyhun Akgul)
YouCook - 88 open-source YouTube cooking videos with annotations (Jason Corso)
WVU Multi-view action recognition dataset (Univ. of West Virginia)

Biological/Medical

Computed Tomography Emphysema Database (Lauge Sorensen)
Dermoscopy images (Eric Ehrsam)
DIADEM: Digital Reconstruction of Axonal and Dendritic Morphology Competition (Allen Institute for Brain Science et al)
DIARETDB1 - Standard Diabetic Retinopathy Database (Lappeenranta Univ of Technology)
DRIVE: Digital Retinal Images for Vessel Extraction (Univ of Utrecht)
MiniMammographic Database (Mammographic Image Analysis Society)
MIT CBCL Automated Mouse Behavior Recognition datasets (Nicholas Edelman)
Retinal fundus images - Ground truth of vascular bifurcations and crossovers (Univ of Groningen)
Spine and Cardiac data (Digital Imaging Group of London Ontario, Shuo Li)
Univ of Central Florida - DDSM: Digital Database for Screening Mammography (Univ of Central Florida)
VascuSynth - 120 3D vascular tree like structures with ground truth (Mengliu Zhao, Ghassan Hamarneh)
York Cardiac MRI dataset (Alexander Andreopoulos)

Face Databases

3D Mask Attack Database (3DMAD) - 76500 frames of 17 persons using Kinect RGBD with eye positions (Sebastien Marcel)
Audio-visual database for face and speaker recognition (Mobile Biometry MOBIO http://www.mobioproject.org/)
BANCA face and voice database (Univ of Surrey)
Binghamton Univ 3D static and dynamic facial expression database (Lijun Yin, Peter Gerhardstein and teammates)
BioID face database (BioID group)
Biwi 3D Audiovisual Corpus of Affective Communication - 1000 high quality, dynamic 3D scans of faces, recorded while pronouncing a set of English sentences.
CMU Facial Expression Database (CMU/MIT)
CMU/MIT Frontal Faces (CMU/MIT)
CMU Pose, Illumination, and Expression (PIE) Database (Simon Baker)
CSSE Frontal intensity and range images of faces (Ajmal Mian)
Face Recognition Grand Challenge datasets (FRVT - Face Recognition Vendor Test)
FaceTracer Database - 15,000 faces (Neeraj Kumar, P. N. Belhumeur, and S. K. Nayar)
FDDB: Face Detection Data set and Benchmark - studying unconstrained face detection (University of Massachusetts Computer Vision Laboratory)
FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network)
Facial Recognition Technology (FERET) Database (USA National Institute of Standards and Technology)
Hong Kong Face Sketch Database
Japanese Female Facial Expression (JAFFE) Database (Michael J. Lyons)
LFW: Labeled Faces in the Wild - unconstrained face recognition. Re-labeled Faces in the Wild - original images, but aligned using "deep funneling" method. (University of Massachusetts, Amherst)
Manchester Annotated Talking Face Video Dataset (Timothy Cootes)
MIT Collation of Face Databases (Ethan Meyers)
MORPH (Craniofacial Longitudinal Morphological Face Database) (University of North Carolina Wilmington)
MIT CBCL Face Recognition Database (Center for Biological and Computational Learning)
NIST mugshot identification database (USA National Institute of Standards and Technology)
ORL face database: 40 people with 10 views (ATT Cambridge Labs)
Oxford: faces, flowers, multi-view, buildings, object categories, motion segmentation, affine covariant regions, misc (Oxford Visual Geometry Group)
PubFig: Public Figures Face Database (Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar)
SCface - Surveillance Cameras Face Database (Mislav Grgic, Kresimir Delac, Sonja Grgic, Bozidar Klimpak)
Trondheim Kinect RGB-D Person Re-identification Dataset (Igor Barros Barbosa)
UB KinFace Database - University of Buffalo kinship verification and recognition database
XM2VTS Face video sequences (295): The extended M2VTS Database (XM2VTS) - (Surrey University)
Yale Face Database - 11 expressions of 10 people (A. Georghiades)
Yale Face Database B - 576 viewing conditions of 10 people (A. Georghiades)

Fingerprints

FVC fingerprint verification competition 2002 dataset (University of Bologna)
FVC fingerprint verification competition 2004 dataset (University of Bologna)
FVC - a subset of FVC (Fingerprint Verification Competition) 2002 and 2004 fingerprint image databases, manually extracted minutiae data & associated documents (Umut Uludag)
NIST fingerprint databases (USA National Institute of Standards and Technology)
SPD2010 Fingerprint Singular Points Detection Competition (SPD 2010 committee)

General Images

Aerial color image dataset (Swiss Federal Institute of Technology)
AMOS: Archive of Many Outdoor Scenes (20+m) (Nathan Jacobs)
Brown Univ Large Binary Image Database (Ben Kimia)
Columbia Multispectral Image Database (F. Yasuma, T. Mitsunaga, D. Iso, and S.K. Nayar)
HIPR2 Image Catalogue of different types of images (Bob Fisher et al)
Hyperspectral images of natural scenes - 2002 (David H. Foster)
Hyperspectral images of natural scenes - 2004 (David H. Foster)
ImageNet Linguistically organised (WordNet) Hierarchical Image Database - 10E7 images, 15K categories (Li Fei-Fei, Jia Deng, Hao Su, Kai Li)
ImageNet Large Scale Visual Recognition Challenge (Alex Berg, Jia Deng, Fei-Fei Li)
OTCBVS Thermal Imagery Benchmark Dataset Collection (Ohio State Team)
McGill Calibrated Colour Image Database (Adriana Olmos and Fred Kingdom)
Tiny Images Dataset 79 million 32x32 color images (Fergus, Torralba, Freeman)

Gesture Databases

FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network)
Hand gesture and marine silhouettes (Euripides G.M. Petrakis)
IDIAP Hand pose/gesture datasets (Sebastien Marcel)
Sheffield gesture database - 2160 RGBD hand gesture sequences, 6 subjects, 10 gestures, 3 postures, 3 backgrounds, 2 illuminations (Ling Shao)

Image, Video and Shape Database Retrieval

Brown Univ 25/99/216 Shape Databases (Ben Kimia)
IAPR TC-12 Image Benchmark (Michael Grubinger)
IAPR-TC12 Segmented and annotated image benchmark (SAIAPR TC-12): (Hugo Jair Escalante)
ImageCLEF 2010 Concept Detection and Annotation Task (Stefanie Nowak)
ImageCLEF 2011 Concept Detection and Annotation Task - multi-label classification challenge in Flickr photos
CLEF-IP 2011 evaluation on patent images
McGill 3D Shape Benchmark (Siddiqi, Zhang, Macrini, Shokoufandeh, Bouix, Dickinson)
NIST SHREC 2010 - Shape Retrieval Contest of Non-rigid 3D Models (USA National Institute of Standards and Technology)
NIST SHREC - other NIST retrieval contest databases and links (USA National Institute of Standards and Technology)
NIST TREC Video Retrieval Evaluation Database (USA National Institute of Standards and Technology)
Princeton Shape Benchmark (Princeton Shape Retrieval and Analysis Group)
Queensland cross media dataset - millions of images and text documents for "cross-media" retrieval (Yi Yang)
TOSCA 3D shape database (Bronstein, Bronstein, Kimmel)

Object Databases

2.5D/3D Datasets of various objects and scenes (Ajmal Mian)
Amsterdam Library of Object Images (ALOI): 100K views of 1K objects (University of Amsterdam/Intelligent Sensory Information Systems)
Caltech 101 (now 256) category object recognition database (Li Fei-Fei, Marco Andreetto, Marc'Aurelio Ranzato)
Columbia COIL-100 3D object multiple views (Columbia University)
Densely sampled object views: 2500 views of 2 objects, eg for view-based recognition and modeling (Gabriele Peters, Universiteit Dortmund)
German Traffic Sign Detection Benchmark (Ruhr-Universitat Bochum)
GRAZ-02 Database (Bikes, cars, people) (A. Pinz)
Linkoping 3D Object Pose Estimation Database (Fredrik Viksten and Per-Erik Forssen)
Microsoft Object Class Recognition image databases (Antonio Criminisi, Pushmeet Kohli, Tom Minka, Carsten Rother, Toby Sharp, Jamie Shotton, John Winn)
Microsoft salient object databases (labeled by bounding boxes) (Liu, Sun Zheng, Tang, Shum)
MIT CBCL Car Data (Center for Biological and Computational Learning)
MIT CBCL StreetScenes Challenge Framework (Stan Bileschi)
NEC Toy animal object recognition or categorization database (Hossein Mobahi)
NORB 50 toy image database (NYU)
PASCAL Image Database (motorbikes, cars, cows) (PASCAL Consortium)
PASCAL 2007 Challenge Image Database (motorbikes, cars, cows) (PASCAL Consortium)
PASCAL 2008 Challenge Image Database (PASCAL Consortium)
PASCAL 2009 Challenge Image Database (PASCAL Consortium)
PASCAL 2010 Challenge Image Database (PASCAL Consortium)
PASCAL 2011 Challenge Image Database (PASCAL Consortium)
PASCAL 2012 Challenge Image Database - category classification, detection, and segmentation, and still-image action classification (PASCAL Consortium)
UIUC Car Image Database (UIUC)
UIUC Dataset of 3D object categories (S. Savarese and L. Fei-Fei)
Venezia 3D object-in-clutter recognition and segmentation (Emanuele Rodola)

People, Pedestrian, Eye/Iris, Template Detection/Tracking Databases

3D KINECT Gender Walking data base (L. Igual, A. Lapedriza, R. Borràs from UB, CVC and UOC, Spain)
Caltech Pedestrian Dataset (P. Dollar, C. Wojek, B. Schiele and P. Perona)
CASIA gait database (Chinese Academy of Sciences)
CASIA-IrisV3 (Chinese Academy of Sciences, T. N. Tan, Z. Sun)
CAVIAR project video sequences with tracking and behavior ground truth (CAVIAR team/Edinburgh University - EC project IST-2001-37540)
Daimler Pedestrian Detection Benchmark 21790 images with 56492 pedestrians plus empty scenes (M. Enzweiler, D. M. Gavrila)
Driver Monitoring Video Dataset (RobeSafe + Jesus Nuevo-Chiquero)
Edinburgh overhead camera person tracking dataset (Bob Fisher, Bashia Majecka, Gurkirt Singh, Rowland Sillito)
Eyetracking database summary (Stefan Winkler)
HAT database of 27 human attributes (Gaurav Sharma, Frederic Jurie)
INRIA Person Dataset (Navneet Dalal)
ISMAR09 ground truth video dataset for template-based (i.e. planar) tracking algorithms (Sebastian Lieberknecht)
MIT CBCL Pedestrian Data (Center for Biological and Computational Learning)
MIT eye tracking database (1003 images) (Judd et al)
Notre Dame Iris Image Dataset (Patrick J. Flynn)
PETS 2009 Crowd Challenge dataset (Reading University & James Ferryman)
PETS: Performance Evaluation of Tracking and Surveillance (Reading University & James Ferryman)
PETS Winter 2009 workshop data (Reading University & James Ferryman)
UBIRIS: Noisy Visible Wavelength Iris Image Databases (University of Beira)
Univ of Central Florida - Crowd Dataset (Saad Ali)
Univ of Central Florida - Crowd Flow Segmentation datasets (Saad Ali)
York Univ Eye Tracking Dataset (120 images) (Neil Bruce)

Segmentation

Alpert et al. Segmentation evaluation database (Sharon Alpert, Meirav Galun, Ronen Basri, Achi Brandt)
Berkeley Segmentation Dataset and Benchmark (David Martin and Charless Fowlkes)
GrabCut Image database (C. Rother, V. Kolmogorov, A. Blake, M. Brown)
LabelMe images database and online annotation tool (Bryan Russell, Antonio Torralba, Kevin Murphy, William Freeman)

Surveillance

AVSS07: Advanced Video and Signal based Surveillance 2007 datasets (Andrea Cavallaro)
ETISEO Video Surveillance Download Datasets (INRIA Orion Team and others)
Heriot Watt Summary of datasets for human tracking and surveillance (Zsolt Husz)
SPEVI: Surveillance Performance EValuation Initiative (Queen Mary University London)
Udine Trajectory-based anomalous event detection dataset - synthetic trajectory datasets with outliers (Univ of Udine Artificial Vision and Real Time Systems Laboratory)

Textures

Color texture images by category (textures.forrest.cz)
Columbia-Utrecht Reflectance and Texture Database (Columbia & Utrecht Universities)
DynTex: Dynamic texture database (Renaud Piteri, Mark Huiskes and Sandor Fazekas)
Oulu Texture Database (Oulu University)
Prague Texture Segmentation Data Generator and Benchmark (Mikes, Haindl)
Uppsala texture dataset of surfaces and materials - fabrics, grains, etc.
Vision Texture (MIT Media Lab)

General Videos

Large scale YouTube video dataset - 156,823 videos (2,907,447 keyframes) crawled from YouTube videos (Yi Yang)

Other Collections

CANTATA Video and Image Database Index site (Multitel)
Computer Vision Homepage list of test image databases (Carnegie Mellon Univ)
ETHZ various, including 3D head pose, shape classes, pedestrians, buildings (ETH Zurich, Computer Vision Lab)
Leibe's Collection of people/vehicle/object databases (Bastian Leibe)
Lotus Hill Image Database Collection with Ground Truth (Sealeen Ren, Benjamin Yao, Michael Yang)
Oxford Misc, including Buffy, Flowers, TV characters, Buildings, etc (Oxford Visual Geometry Group)
PEIPA Image Database Summary (Pilot European Image Processing Archive)
Univ of Bern databases on handwriting, online documents, string edit and graph matching (Univ of Bern, Computer Vision and Artificial Intelligence)
USC Annotated Computer Vision Bibliography database publication summary (Keith Price)
USC-SIPI image databases: texture, aerial, favorites (eg. Lena) (USC Signal and Image Processing Institute)

Miscellaneous

1. 3D mesh watermarking benchmark dataset (Guillaume Lavoue)
2. Active Appearance Models datasets (Mikkel B. Stegmann)
3. Aircraft tracking (Ajmal Mian)
4. Cambridge Motion-based Segmentation and Recognition Dataset (Brostow, Shotton, Fauqueur, Cipolla)
5. Catadioptric camera calibration images (Yalin Bastanlar)
6. Chars74K dataset - 74K images of English and Kannada characters (Teo de Campos - t.decampos@surrey.ac.uk)
7. COLD (COsy Localization Database) - place localization (Ullah, Pronobis, Caputo, Luo, and Jensfelt)
8. Columbia Camera Response Functions: Database (DoRF) and Model (EMOR) (M.D. Grossberg and S.K. Nayar)
9. Columbia Database of Contaminants' Patterns and Scattering Parameters (Jinwei Gu, Ravi Ramamoorthi, Peter Belhumeur, Shree Nayar)
10. Dense outdoor correspondence ground truth datasets, for optical flow and local keypoint evaluation (Christoph Strecha)
11. DTU controlled motion and lighting image dataset (135K images) (Henrik Aanaes)
12. EISATS: .enpeda.. Image Sequence Analysis Test Site (Auckland University Multimedia Imaging Group)
13. FlickrLogos-32 - 8240 images of 32 product logos (Stefan Romberg)
14. Flowchart images (Allan Hanbury)
15. Geometric Context - scene interpretation images (Derek Hoiem)
16. Image/video quality assessment database summary (Stefan Winkler)
17. INRIA feature detector evaluation sequences (Krystian Mikolajczyk)
18. INRIA's PERCEPTION's database of images and videos gathered with several synchronized and calibrated cameras (INRIA Rhone-Alpes)
19. INRIA's Synchronized and calibrated binocular/binaural data sets with head movements (INRIA Rhone-Alpes)
20. KITTI dataset for stereo, optical flow and visual odometry (Geiger, Lenz, Urtasun)
21. Large scale 3D point cloud data from terrestrial LiDAR scanning (Andreas Nuechter)
22. Linkoping Rolling Shutter Rectification Dataset (Per-Erik Forssen and Erik Ringaby)
23. Middlebury College stereo vision research datasets (Daniel Scharstein and Richard Szeliski)
24. MPI-Sintel optical flow evaluation dataset (Michael Black)
25. Multiview stereo images with laser based groundtruth (ESAT-PSI/VISICS,FGAN-FOM,EPFL/IC/ISIM/CVLab)
26. The Cancer Imaging Archive (National Cancer Institute)
27. NCI Cancer Image Archive - prostate images (National Cancer Institute)
28. NIST 3D Interest Point Detection (Helin Dutagaci, Afzal Godil)
29. NRCS natural resource/agricultural image database (USDA Natural Resources Conservation Service)
30. Occlusion detection test data (Andrew Stein)
31. The Open Video Project (Gary Marchionini, Barbara M. Wildemuth, Gary Geisler, Yaxiao Song)
32. Pics 'n' Trails - Dataset of Continuously archived GPS and digital photos (Gamhewage Chaminda de Silva)
33. PRINTART: Artistic images of prints of well known paintings, including detail annotations. A benchmark for automatic annotation and retrieval tasks with this database was published at ECCV. (Nuno Miguel Pinho da Silva)
34. RAWSEEDS SLAM benchmark datasets (Rawseeds Project)
35. Robotic 3D Scan Repository - 3D point clouds from robotic experiments of scenes (Osnabruck and Jacobs Universities)
36. ROMA (ROad MArkings) : Image database for the evaluation of road markings extraction algorithms (Jean-Philippe Tarel, et al)
37. Stuttgart Range Image Database - 66 views of 45 objects
38. UCL Ground Truth Optical Flow Dataset (Oisin Mac Aodha)
39. Univ of Genoa Datasets for disparity and optic flow evaluation (Manuela Chessa)
40. Validation and Verification of Neural Network Systems (Francesco Vivarelli)
41. VSD: Technicolor Violent Scenes Dataset - a collection of ground-truth files based on the extraction of violent events in movies
42. WILD: Weather and Illumination Database (S. Narasimhan, C. Wang, S. Nayar, D. Stolyarov, K. Garg, Y. Schechner, H. Peri)

Surveillance Video Datasets

http://www.multitel.be/cantata/

BOSS dataset

Website:

Datasets are available here.

Dataset:

The BOSS project aims at developing an innovative and bandwidth-efficient communication system to transmit high-data-rate communications between public transport vehicles and the wayside. In particular, the BOSS concepts will be evaluated and demonstrated in the context of railway transport. Security issues, traditionally covered in stations by means of video surveillance, are clearly lacking on board trains, due to the absence of efficient transmission means from the train to a supervising control centre. Similarly, diagnostic or maintenance issues are generally handled when the train arrives in stations or during maintenance stops, which prevents proactive action from being taken.

The dataset includes 15 sequences shot by 9 cameras and 8 microphones, all synchronized to allow 3D video/audio reconstruction.

In these datasets, we can find the following events:

- Cell phone theft (in Spanish language).

- Check out - a passenger checking out another man's wife, then fighting (in French language).

- Disease - a series of 3 passengers fainting, alone in the coach (both in French and Spanish).

- Disease in public (both in French and Spanish).

- Harass - 3 sequences in which a man harasses a woman. In "Harass2", there are other passengers in the coach.

- Newspaper - two sequences (one in French, one in Spanish) in which a passenger harasses another passenger for his newspaper, and ends up assaulting him.

- Panic (in French language) - a passenger notices a fire in the next coach, and everybody runs out of the train.

- Two more sequences are provided, containing no incidents whatsoever. They were shot to assess the robustness of incident detection software to false alarms.

- Other sequences are provided, which are not acted incidents but were used for specific incident detection tasks.

Metadata:

Events generated by the BOSS processing are given for some sequences, in a file called "nameofthesequence.xml", in the same directory as the data set of the sequence itself. The format and types of the events are described in a PDF file.

Contextual info:

All the sequences were shot in a Madrid suburban train kindly lent by RENFE, who are gratefully acknowledged.
To allow as much flexibility as possible, all the video files are uncalibrated; the calibration files are provided along with each sequence, and the description of how to use them is given in calibTutorial.pdf. An associated Matlab library is provided in BOSScalibTutorial.zip.

Comments:

Copyrights:

The sequences are provided free of charge for academic research. For any other use, please ask the contact person. Should you publish these sequences or results obtained using them, please indicate their origin as "BOSS project", and mention the address of the project: http://www.celtic-boss.org.
You are welcome to provide a link to the location of the sequences, but copying them to another web site is subject to prior consent of the contact person.

Contact:

Catherine.LAMY-BERGOT@fr.thalesgroup.com

EMAV 2009

Website:

Datasets are available here:

http://www.emav09.org/

The objective of the EMAV 2009 (European Micro Aerial Vehicle Conference and Flight Competition) conference is to provide an effective and established forum for discussion and dissemination of original and recent advances in MAV technology. The conference program will consist of a theoretical part and a flight competition. We aim for submission of papers that address novel, challenging and innovative ideas, concepts or systems. We particularly encourage papers that go beyond MAV hardware, and address issues such as the collaboration of multiple MAVs, applications of computer vision, and non-GPS based navigation.

Dataset:

An image set is published for computer vision researchers. The set consists of photos taken with various MAV platforms at different locations. The photos are always stills from movies made by the platform. For this EMAV, there is no explicit assignment or competition linked to this data set. However, possible tasks with the data set are: segmentation of the images into meaningful entities, specific object recognition (cars / roads), construction of image mosaics on the basis of the films, etc.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

info [-at-] emav2009.org

Caltech Pedestrian Dataset

Website:

Datasets are available here:

http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

Dataset:

The Caltech Pedestrian Dataset consists of approximately 10 hours of 640x480 30Hz video taken from a vehicle driving through regular traffic in an urban environment. About 250,000 frames (in 137 segments of approximately one minute each) with a total of 350,000 bounding boxes and 2300 unique pedestrians were annotated.

Metadata:

The annotation includes temporal correspondence between bounding boxes and detailed occlusion labels. More information can be found in our CVPR09 paper.

Associated Matlab code is available. The annotations use a custom "video bounding box" (vbb) file format. The code also contains utilities to view seq files with the annotations overlaid, evaluation routines used to generate all the ROC plots in the paper, and the vbb labeling tool used to create the dataset (a slightly outdated video tutorial of the labeler is also available).

Contextual info:

Comments:

Copyrights:

Contact:

pdollar[at]caltech.edu

NGSIM

Website:

Datasets are available here (registration is needed):

http://ngsim.fhwa.dot.gov/modules.php?op=modload&name=News&file=article&sid=4

Dataset:

Detailed vehicle trajectory data on parts of highways

Metadata:

Contextual info:

Comments:

Copyrights:

Need to register before using the NGSIM Data Sets.

Contact:

John.Halkias@fhwa.dot.gov

AMI Corpora

Website:

Datasets are available here (registration is needed)

http://corpus.amiproject.org/amicorpus/download/download

Dataset:

This dataset consists of meeting room scenarios, with two people sitting around meeting tables.

Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains.

Metadata:

Annotations are available for many different phenomena (dialog acts, head movement, etc.).

See here for more information.

Contextual info:

Comments:

Copyrights:

Contact:

machy@multitel.be

MORYNE - Traffic scenes mobile video acquisition

Website:

http://www.fp6-moryne.org/

MORYNE aims to contribute to greater transport efficiency, increased transport safety and more environmentally friendly transport by improving traffic management in urban and suburban areas.

Dataset:

There are sequences from both demonstration buses of the MORYNE project.
Filenames explicitly provide the date and time of acquisition.

Metadata:

Ground truth is provided in XML format as follows:

<event>
<time>2008-01-18T10:05:10.747209</time>
<name>ODOINFO</name>

     <parameters>
          <sender>OBU</sender>
          <target>MVS</target>
          <starttime>2008-01-18T10:05:10.747209</starttime>
          <stoptime>2008-01-18T10:05:11.784436</stoptime>
          <distance>9.216714</distance>
     </parameters>
</event>

This file gives the distance covered by the bus during the interval starttime - stoptime.
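As a minimal sketch of how such an event could be consumed, the Python snippet below parses one ODOINFO event and derives the interval length and an average speed. It assumes the real files are well-formed XML (tags without inner spaces, `<name>` closed); the function name `odo_interval` and the embedded sample are illustrative only, and the unit of `distance` is not stated in the source, so the speed is simply "distance units per second".

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Sample event, assumed to be well-formed XML (an assumption about the real files).
SAMPLE = """<event>
  <time>2008-01-18T10:05:10.747209</time>
  <name>ODOINFO</name>
  <parameters>
    <sender>OBU</sender>
    <target>MVS</target>
    <starttime>2008-01-18T10:05:10.747209</starttime>
    <stoptime>2008-01-18T10:05:11.784436</stoptime>
    <distance>9.216714</distance>
  </parameters>
</event>"""


def odo_interval(event_xml):
    """Return (distance, duration_seconds, average_speed) for one ODOINFO event."""
    event = ET.fromstring(event_xml)
    params = event.find("parameters")
    start = datetime.fromisoformat(params.findtext("starttime").strip())
    stop = datetime.fromisoformat(params.findtext("stoptime").strip())
    distance = float(params.findtext("distance"))
    duration = (stop - start).total_seconds()
    # Guard against a zero-length interval; the distance unit is not given in the source.
    speed = distance / duration if duration > 0 else None
    return distance, duration, speed


distance, duration, speed = odo_interval(SAMPLE)
print(distance, duration, speed)
```

A full ground-truth file presumably holds many such events under a root element; in that case the same function could be applied to each `event` node found with `iter("event")`.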

Contextual info:

.idx files
----------
.idx files contain the date and time for each frame in the sequence. The structure of this file is:

- header of 12 bytes
- For each frame, a structure of 24 bytes

The structure contains:
- unsigned 32-bit integer: seconds since Epoch
- unsigned 32-bit integer: microseconds within the second
- unsigned 64-bit integer: offset in bytes in the .avi file
- unsigned 32-bit integer: frame number starting with 0
- unsigned 32-bit integer: frame type as defined by libavcodec (may be useless)

All integers are encoded in little endian.
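The layout above maps directly onto Python's `struct` module. The sketch below is one possible reader, assuming the five fields are stored in the order listed, that the 12-byte header can simply be skipped (its contents are not described in the source), and that "Epoch" means the Unix epoch interpreted as UTC. The names `parse_idx`, `HEADER_SIZE` and `FRAME_RECORD` are illustrative.

```python
import struct
from datetime import datetime, timedelta, timezone

HEADER_SIZE = 12                          # header contents are not documented; skipped here
FRAME_RECORD = struct.Struct("<IIQII")    # little endian: u32, u32, u64, u32, u32 = 24 bytes


def parse_idx(data):
    """Decode the per-frame records of an .idx file given its raw bytes."""
    body = data[HEADER_SIZE:]
    usable = len(body) - len(body) % FRAME_RECORD.size  # ignore a trailing partial record
    frames = []
    for sec, usec, offset, number, ftype in FRAME_RECORD.iter_unpack(body[:usable]):
        # Assumption: seconds are counted from the Unix epoch, taken as UTC.
        stamp = datetime.fromtimestamp(sec, tz=timezone.utc) + timedelta(microseconds=usec)
        frames.append({"time": stamp, "avi_offset": offset, "frame": number, "type": ftype})
    return frames


# Synthetic one-frame file for demonstration (not real MORYNE data).
demo = bytes(HEADER_SIZE) + FRAME_RECORD.pack(1200650710, 747209, 4096, 0, 1)
frames = parse_idx(demo)
print(frames[0])
```

For a real sequence, the bytes would come from something like `open(path, "rb").read()`, and `avi_offset` could then be used to seek to a frame in the matching .avi file.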

Comments:

The material for camera calibration and bus speed/context metadata will be added as soon as possible.

Copyrights:

This folder contains a list of test sequences which have been recorded for the MORYNE project (http://www.fp6-moryne.org).
They can be used for non-commercial purposes only, provided that a reference to the MORYNE project accompanies their use (e.g. in publications, video demonstrations...).

Contact:

christophe.parisot(at)multitel.be

BEHAVE - Crowds

Website:

Datasets are available here:

http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/CROWDS/index.html

Dataset:

These are the smoothed flow sequences for the Waverley train station scene. There are 4 numbered files; file (002) is used for testing and the remaining files are used for training.

These are the smoothed flow sequences for the train station simulation. There are 30 files divided in the groups below. Use from frame 1100 to 4000. The emergency is at frame 2000.

Group 1: Normal - Training

Group 2: Normal - Testing

Group 3: Emergency - Blocked exit at the bottom of the scene.

Metadata:

No ground truth available.

Contextual info:

Comments:

Copyrights:

Free download from website.

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

CANTATA - Left Objects Dataset

Website:

http://www.multitel.be/~va/cantata/LeftObject/

Dataset:

A number of video clips were recorded acting out the scenario of interest: left objects. 31 sequences of two minutes have been recorded, showing different left-object scenarios (1 or more objects, person staying close to the left object, etc).
The 31 scenarios have been recorded using 2 different cameras (not synchronised), with two different views:

- a Panasonic camera - miniDV, model NV-DS28EG (camera1)

- a Sony camera - miniDV, model DSR-PD170P (camera2)

The videos have the following characteristics:

- A resolution of 720x576 pixels

- 25 frames per second

- A compression using MPEG4

- File sizes of about 75 MB for camera1 and 65 MB for camera2.

Metadata:

All the sequences are annotated in XML format. Each sequence is associated with an annotation file of the same name, ending in .gt.xml.

For each left object, we can find in the xml:

- the exact time of the detection

- the position of the object in the image

Contextual info:

Comments:

In each sequence, nothing happens before 30 seconds or after 1 min 45 s.

Copyrights:

Free download from website. If you publish results using the data, please acknowledge the data as coming from the CANTATA project, found at URL: http://www.hitech-projects.com/euprojects/cantata/. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND.

Contact:

desurmont@multitel.be

VISOR - Surveillance

Website:

Datasets are available here:

http://imagelab.ing.unimore.it/visor/

Dataset:

4 types of video clips. These sequences constitute a representative panel of different video surveillance areas.

They merge indoor and outdoor scenes, such as Indoor Domotic Unimore D.I.I. setup.

Metadata:

Object Detection and Tracking.

Contextual info:

Comments:

Mostly simple videos.

Copyrights:

Free download

Contact:

vezzani.roberto@unimore.it

Traffic datasets from Institut für Algorithmen und Kognitive Systeme

Website:

Sequences are available here:

http://i21www.ira.uka.de/image_sequences/

Dataset:

Traffic intersection sequence recorded at the Durlacher-Tor-Platz in Karlsruhe by a stationary camera (512 x 512 grayvalue images (GIF-format))
Traffic intersection sequence recorded at the Ettlinger-Tor in Karlsruhe by a stationary camera (512 x 512 grayvalue images (GIF-format))
Traffic intersection sequence recorded at the Nibelungen-Platz in Frankfurt by a stationary camera (720 x 576 grayvalue images (GIF-format))
Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (740 x 560 grayvalue images (GIF-format))
Another traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (702 x 566 grayvalue images (PM-format))
Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 grayvalue images (PGM-format),normal conditions)
Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 grayvalue images (PGM-format),normal conditions)
Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 color images (PPM-format),heavy fog)
Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 color images (PPM-format),heavy snowfall)
Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 color images (PPM-format),snow on lanes)
Traffic sequence showing an intersection at Rheinhafen, Karlsruhe (688 x 565 grayvalue images (PM.GZ-format))
Traffic sequence showing a taxi in Hamburg (256 x 191 grayvalue images (PGM-format))

Metadata:

Camera projection data in the file proj.dat which uses the following format:

tx ty tz	# Translation vector Global <---> Camera Coordinates
r11 r12 r13	# \
r21 r22 r23	#  > 3x3 Rotation Matrix Global <---> Camera
r31 r32 r33	# /
fx		# Focal length x-direction (pixels)
fy		# Focal length y-direction (pixels, usually 4/3 * fx)
x0		# Image Center X (pixels)
y0		# Image Center Y (pixels)
1		# Sharp shadows visible (1=true, 0=false)
phi		# Azimuth angle for shadow
theta		# Polar angle for shadow
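A small reader for this layout might look like the following Python sketch. It assumes that everything after `#` on a line is a comment, that the values are whitespace-separated and appear in exactly the order shown (19 numbers in total), and it makes no claim about the direction of the global/camera transform, which the format description leaves ambiguous. The function name `parse_proj_dat` and the sample numbers are illustrative, not taken from the dataset.

```python
def parse_proj_dat(text):
    """Parse the contents of a proj.dat file into a dictionary of camera parameters."""
    values = []
    for line in text.splitlines():
        payload = line.split("#", 1)[0]          # drop the trailing comment, if any
        values.extend(float(tok) for tok in payload.split())
    if len(values) != 19:
        raise ValueError(f"expected 19 numeric values, found {len(values)}")
    return {
        "translation": values[0:3],
        "rotation": [values[3:6], values[6:9], values[9:12]],
        "fx": values[12],
        "fy": values[13],
        "x0": values[14],
        "y0": values[15],
        "sharp_shadows": values[16] == 1.0,
        "shadow_azimuth": values[17],
        "shadow_polar": values[18],
    }


# Made-up sample values in the documented layout.
SAMPLE = """0.5 -1.0 12.0\t# Translation vector
1 0 0\t#
0 1 0\t# > 3x3 Rotation Matrix
0 0 1\t# /
800.0\t# fx
1066.7\t# fy
384\t# x0
288\t# y0
1\t# Sharp shadows visible
0.3\t# phi
1.1\t# theta
"""
camera = parse_proj_dat(SAMPLE)
print(camera["rotation"], camera["fx"], camera["sharp_shadows"])
```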

Contextual info:

Different conditions: snow, fog, etc.

Comments:

Copyrights:

license (no), cost (free)

Contact:

Sabri Boughorbel (cedric.marchessoux@barco.com)

TRAFICON - Traffic jam

Website:

Dataset:

Traffic jam.

Metadata:

Contextual info:

Camera height 12m, Camera: inch sensor, 4 mm lens.

Comments:

Period of road markings is 12m (9+3).

Copyrights:

License (no), cost (free): When the dataset is used, refer to and give credit to Traficon N.V. as follows: "www.traficon.com".

Contact:

Wouter Favoreel, wf@traficon.com

CANDELA - Surveillance

Website:

Datasets are available here:

http://www.multitel.be/~va/candela/

Dataset:

Two different scenarios were realized during the CANDELA project: "Indoor abandoned object" and "road intersection".

o Scenario 1: Abandoned object. Detecting abandoned objects amounts to detecting idle (stationary or non-moving) objects that remain stationary over a certain, adjustable period of time. Idle objects should be detected in several types of scenes. In a parking lot, for example, an idle object can be a parked car or a left suitcase. For this scenario we are not looking at the object types "person" or "car", but at unidentified objects, called "unknown objects". An unknown object is any object that is not a person or a vehicle. In general, unknown objects cannot move. What should be detected? Whenever an unknown object appears in the scene and remains stationary for some amount of time, an alarm needs to be generated. This alarm must remain active as long as the unknown object remains stationary.

o Scenario 2: Persons are allowed to cross the street at zebra crossings, a crossing controlled with lights. Alarms should be generated when persons are not allowed to be on the crossing, or when dangerous scenarios occur (cars driving while people are crossing). Since the external signal from the traffic light is not available (when the crossing is regulated by traffic lights), detection needs to be done automatically. Detecting persons on the crossing itself is fairly easy, but alarms should only be given when persons are on the crossing and cars are driving.
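The Scenario 1 alarm rule (raise an alarm once an unknown object has been stationary for an adjustable period, and keep it active while the object stays put) can be sketched in Python as below. This is only an illustration under stated assumptions, not the CANDELA implementation: it presumes an upstream tracker already supplies one centroid per frame for a single unknown object, and the names `idle_alarm`, `min_idle_s` and `tol_px` (a pixel tolerance for "stationary") are hypothetical.

```python
import math


def idle_alarm(track, fps, min_idle_s, tol_px=2.0):
    """Return one boolean per frame: True while the idle-object alarm is active.

    track      -- list of (x, y) centroids of one unknown object, one per frame
    fps        -- frame rate of the sequence
    min_idle_s -- adjustable period the object must stay still before alarming
    tol_px     -- assumed tolerance (pixels) within which the object counts as stationary
    """
    needed = max(1, int(round(min_idle_s * fps)))
    alarms = []
    anchor = None   # position where the current stationary run started
    idle = 0        # number of consecutive frames spent near the anchor
    for x, y in track:
        if anchor is not None and math.hypot(x - anchor[0], y - anchor[1]) <= tol_px:
            idle += 1
        else:
            anchor = (x, y)   # the object moved: restart the stationary run here
            idle = 1
        alarms.append(idle >= needed)
    return alarms


# Object rests for 5 frames, then is moved elsewhere.
demo_track = [(0, 0)] * 5 + [(10, 10)] * 2
print(idle_alarm(demo_track, fps=1, min_idle_s=3))
```

The alarm drops as soon as the object moves further than the tolerance, which matches the requirement that it stays active only while the object remains stationary.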

Metadata:

Detailed information about the data and metadata can be found here:

http://www.hitech-projects.com/euprojects/candela/pr/scenario_description_document_v06.pdf

Contextual info:

Comments:

Copyrights:

Public domain

Contact:

Xavier Desurmont, desurmont@multitel.be

OVVV - Virtual sequences

Website:

Datasets are available here:

http://development.objectvideo.com/

Dataset:

The ObjectVideo Virtual Video (OVVV) tool provides the ability to generate virtual video sequences. These video sequences can then be used to test VCA algorithms.

Metadata:

The ground truth is generated automatically in a proprietary binary format. The format is open, and a conversion program can be created to convert the metadata to any format. A simple bounding box scheme is available; for more powerful validation, a "blob" video can be created.

Contextual info:

Virtual environment, the user can make his own environment from the internet. Several camera settings can be changed to simulate real-world cameras more closely.

Comments:

This is not a dataset as such, but these tools can be used to create very powerful, tailored test videos.

Copyrights:

The ObjectVideo Virtual Video Tool is provided free for non-commercial use, for your own research and development purposes. If you publish or distribute images, videos or derivative results based on this software, you must acknowledge ObjectVideo by including "ObjectVideo Virtual Video Tool".

To use the ObjectVideo Virtual Video tool a licence for the commercial game Half-Life 2 is needed (www.steampowered.com).

Contact:

Rick Koeleman, VDG-Security bv. rick@vdg-security.com

IBM - Tracking

Website:

http://domino.research.ibm.com/comm/research_projects.nsf/pages/s3.performanceevaluation.html

Dataset:

4 outdoor clips (from PETS2001) of people and vehicles, and 11 indoor clips of people.

Metadata:

Motion detection and motion tracking

Contextual info:

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

SPEVI: Multiple faces dataset

Website:

http://www.spevi.org

Dataset:

This is a dataset for multiple people/faces visual detection and tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing and disappearing from the field of view of the camera. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences. Total number of images: 2769, DivX 6 compression, 640 x 480 pixels, 25 Hz.

Sensor details
- video camera: JVC GR-20EK

Metadata:

Contextual info:

Comments:

Copyrights:

Requested citation acknowledgment: E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro. "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007

Contact:

Xavier Desurmont, desurmont@multitel.be

SPEVI: Single face dataset

Website:

http://www.spevi.org

Dataset:

This is a dataset for single person/face visual detection and tracking. The dataset is composed of five sequences with different illumination conditions and resolutions. Three sequences (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark) are shot with a hand-held camera (JVC GR-20EK). In motinas_toni the target moves under a constant bright illumination; in motinas_toni_change_ill the illumination changes from dark to bright; the sequence motinas_nikola_dark is constantly dark. Two sequences (motinas_emilio_webcam and motinas_emilio_webcam_turning) are shot with a webcam (Logitech Quickcam) under a fairly constant illumination. Total number of images: 3018, DivX 6 compression, 640 x 480 pixels and 25 Hz (motinas_toni, motinas_toni_change_ill, motinas_nikola_dark), 320 x 240 pixels and 10 Hz (motinas_emilio_webcam and motinas_emilio_webcam_turning).

Metadata:

The ground truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam. In the ground truth files each line of text describes the objects' position and size in a frame. The syntax of a line is the following: frame number_of_objects obj_1_name x y half_width half_height angle obj_2_name x y half_width half_height angle ...
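The line syntax just described is regular enough to parse with plain string splitting. The Python sketch below assumes the fields are whitespace-separated, that object names contain no spaces, and that the five geometric fields are numeric; the function name `parse_gt_line` and the sample line are made up for illustration and are not taken from the ground-truth files.

```python
def parse_gt_line(line):
    """Parse one ground-truth line into (frame, [object dicts])."""
    tokens = line.split()
    frame = int(tokens[0])
    count = int(tokens[1])
    fields = tokens[2:]
    if len(fields) != 6 * count:
        raise ValueError(f"frame {frame}: expected {6 * count} object fields, found {len(fields)}")
    objects = []
    for i in range(count):
        name, x, y, half_w, half_h, angle = fields[6 * i: 6 * i + 6]
        objects.append({
            "name": name,
            "x": float(x),
            "y": float(y),
            "half_width": float(half_w),
            "half_height": float(half_h),
            "angle": float(angle),
        })
    return frame, objects


# Hypothetical line with two objects.
frame, objects = parse_gt_line("17 2 face_a 320 240 40 55 0.0 face_b 100.5 90 30 42 -0.2")
print(frame, [o["name"] for o in objects])
```

Since width and height are stored as half-extents, the bounding box of an unrotated object spans `x - half_width` to `x + half_width` horizontally, assuming `x y` denotes the object centre (the source does not say so explicitly).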

Contextual info:

Comments:

Copyrights:

Requested citation acknowledgment: E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221 - 224.

Contact:

Xavier Desurmont, desurmont@multitel.be

SPEVI: Audiovisual people dataset

Website:

http://www.spevi.org

Dataset:

This is a dataset for uni-modal and multi-modal (audio and visual) people detection and tracking. The dataset consists of three sequences recorded in different scenarios with a video camera and two microphones. Two sequences (motinas_Room160 and motinas_Room105) are recorded in rooms with reverberations. The third sequence (motinas_Chamber) is recorded in a room with reduced reverberations. The camera is placed in the centre of a bar that supports two microphones. Total number of images: 3271, Format of images: 8-bit color AVI 360 x 288 pixels 25 fps, audio sampling rate: 44.1 kHz.

Sensor details
- The camera is placed in the centre of a bar that supports two microphones
- Distance between the microphones: 95 cm
- Microphones: Beyerdynamic MCE 530 condenser microphones
- Camera: KOBI KF-31CD analog CCD surveillance camera

Metadata:

The ground truth data are provided together with the sequences in the corresponding .zip file, as a list of XML files representing the positions of the objects in the field of view.

Contextual info:

Comments:

Copyrights:

Requested citation acknowledgment: Courtesy of EPSRC funded MOTINAS project (EP/D033772/1)

Contact:

Xavier Desurmont, desurmont@multitel.be

ETISEO - Surveillance

Website:

Datasets are available here: (registration is needed)

http://www-sop.inria.fr/orion/ETISEO/

Dataset:

86 video clips. These sequences constitute a representative panel of different video surveillance areas.

They merge indoor and outdoor scenes, corridors, streets, building entries, subway station... They also mix different types of sensors and complexity levels.

Metadata:

5 different levels: Object Detection, Object Localization, Object Tracking, Object Classification.

Contextual info:

Zone of interest, calibration matrix

Comments:

Copyrights:

Free download, but registration and a user agreement are required.

Contact:

francois.bremond@sophia.inria.fr

SELCAT - Level Crossing

Website:

These datasets have been realized during the SELCAT project.

http://www.levelcrossing.net/

Datasets are available here:

http://www.multitel.be/~va/selcat

Dataset:

These datasets comprise 24 hours of real sequences showing a level crossing (LC) where some vehicles stop because of its particular configuration: an avenue runs parallel to the LC on its right side, so a traffic light is located just after the LC. Consequently, vehicles sometimes stop on the LC because of this traffic light. The total amount of data is about 7 GB.

Metadata:

For each video file, there is a corresponding ground truth file in XML that gives the timestamps of "stopped vehicles" events.

Contextual info:

Environment conditions (calibration, scene...)

Comments:

Copyrights:

Licence, Cost, etc.

Contact:

Caroline Machy, machy@multitel.be

BEHAVE - INTERACTION

Website:

http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/INTERACTIONS/

Dataset:

The dataset comprises two views of various scenarios of people acting out various interactions. Ten basic scenarios were acted out. These were called InGroup (IG), Approach (A), WalkTogether (WT), Split (S), Ignore (I), Following (FO), Chase (C), Fight (FI), RunTogether (RT), and Meet (M). The data is captured at 25 frames per second. The resolution is 640x480. The videos are available either as AVIs or as a numbered set of JPEG single image files.

Metadata:

Tracking, Event detection.

Contextual info:

3D coordinates of points for calibration purposes provided.

Comments:

The site will be updated when more of the ground truth becomes available.

Copyrights:

Free download from website.

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS - 2007 - REASON

Website:

Datasets are available here:

http://www.pets2007.net/

Dataset:

The datasets are multisensor sequences containing the following 3 scenarios, with increasing scene complexity: 1. loitering, 2. attended luggage removal (theft), 3. unattended luggage.

Metadata:

Event Detection

Contextual info:

Calibration provided

Comments:

Free download from website. The UK Information Commissioner has agreed that the PETS 2007 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright UK EPSRC REASON Project consortium and permission is hereby granted for free download for the purposes of the PETS 2007 workshop.

Copyrights:

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS - 2006 - ISCAPS

Website:

Datasets are available here:

http://www.pets2006.net/

Dataset:

Surveillance of public spaces, detection of left luggage events. Scenarios of increasing complexity, captured using multiple sensors.

Metadata:

All scenarios come with two XML files. The first of these files contains camera calibration parameters, these are given in the sub-directory 'calibration'. See the previous section (Calibration Data) for information on this XML file format. The second XML file (given in the sub-directory 'xml') contains both configuration and ground-truth information.

Contextual info:

Calibration provided.

Comments:

Copyrights:

Free download from website. The UK Information Commissioner has agreed that the PETS 2006 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright ISCAPS consortium and permission is hereby granted for free download for the purposes of the PETS 2006 workshop.

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS - 2005 - WAMOP

Website:

Datasets are available here: (registration is needed)

http://www.vast.uccs.edu/~tboult/PETS05/

Dataset:

Challenging detection/tracking scenes on water.

Metadata:

Object Detection/Tracking.

Contextual info:

Comments:

Copyrights:

Free download from website, but registration is required.

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS - ECCV'2004 - CAVIAR

Website:

http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/

or http://www-prima.inrialpes.fr/PETS04/caviar_data.html

Dataset:

A number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, fighting and passing out and last, but not least, leaving a package in a public place. All video clips were filmed with a wide angle camera lens. The resolution is half-resolution PAL standard (384 x 288 pixels, 25 frames per second) and compressed using MPEG2. The file sizes are mostly between 6 and 12 MB, a few up to 21 MB.

Metadata:

Person/Group Tracking, Person/Group Activity Recognition, Scenario/Situation Recognition

Contextual info:

3D coordinates of points for calibration purposes provided.

Comments:

Copyrights:

Free download from website. If you publish results using the data, please acknowledge the data as coming from the EC Funded CAVIAR project/IST 2001 37540, found at URL: http://www.dai.ed.ac.uk/homes/rbf/CAVIAR/

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS 2002

Website:

Datasets are available here:

http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html

Dataset:

Indoor people tracking (and counting). Two training and four testing sequences consist of people moving in front of a shop window. Sequences are provided as both MPEG movie format and as individual JPEG images.

Metadata:

People tracking, counting and activity recognition.

Contextual info:

No calibration provided

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS 2001

Website:

Datasets are available here:

http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html

http://www.cvg.cs.rdg.ac.uk/cgi-bin/PETSMETRICS/page.cgi?dataset

Dataset:

Outdoor people and vehicle tracking (two synchronised views; includes omnidirectional and moving camera). PETS'2001 consists of five separate sets of training and test sequences, i.e. each set consists of one training sequence and one test sequence. All the datasets are multi-view (2 cameras) and are significantly more challenging than for PETS'2000 in terms of significant lighting variation, occlusion, scene activity and use of multi-view data.

Metadata:

Tracking information on image plane and ground plane can be found at:

http://www.cvg.cs.rdg.ac.uk/PETS2001/ANNOTATION/

Contextual info:

Camera Calibration provided

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS 2000

Website:

ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/

Dataset:

Outdoor people and vehicle tracking (single camera).

Two sequences:

a) Training sequence of 3672 frames at 25 Hz (146.88 secs).

b) Test sequence of 1452 frames (58.08 secs).
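
The durations quoted above follow from dividing the frame count by the frame rate. A minimal sketch of that arithmetic is shown below; the function name is hypothetical, and the 25 Hz rate for the test sequence is an assumption (it is only stated for the training sequence, but it reproduces the quoted 58.08 secs).

```python
def sequence_duration_seconds(frame_count: int, frame_rate_hz: float) -> float:
    """Duration of a sequence in seconds: frames divided by frames per second."""
    return frame_count / frame_rate_hz


training = sequence_duration_seconds(3672, 25.0)
test = sequence_duration_seconds(1452, 25.0)  # assumption: also recorded at 25 Hz
print(f"training: {training:.2f} s, test: {test:.2f} s")
```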

The sequences are available in 2 formats:

a) QuickTime movie format with Motion JpegA compression (training.mov and test.mov).

b) Individual Jpeg files (training_images/*.jpg and test_9images/*.jpeg).

Metadata:

No Ground Truth provided.

Contextual info:

Camera Calibration provided.

Comments:

Copyrights:

Free download

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

PETS

Website:

http://www.cvg.rdg.ac.uk/slides/pets.html

Dataset:

Each year PETS runs an evaluation framework on specific datasets with a specific objective. 2000: 2001.... (more on duration and theme)

Metadata:

Ground truth depends on the theme of each year's workshop.

Contextual info:

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

I-LIDS - Surveillance

Website:

http://scienceandresearch.homeoffice.gov.uk/hosdb/cctv-imaging-technology/video-based-detection-systems/i-lids/

Dataset:

4 scenarios (Parked Vehicle, Abandoned Package, Doorway Surveillance and Sterile Zone) x 2 datasets (training, testing) each. Each dataset contains about 24 hours of footage in a few different scenes.

Metadata:

Event-based Ground truth.

Contextual info:

Images of a pedestrian model in different positions are given for calibration purposes

Comments:

7 free clips for 2 scenarios (Parked Vehicle, Abandoned Package) are available from: http://www.elec.qmul.ac.uk/staffinfo/andrea/avss2007_d.html

Copyrights:

A user agreement and a payment (£500-£650 per dataset) are required to obtain each dataset. Datasets are provided on hard disks.

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

MEDICAL

DDSM: Digital Database for Screening Mammography

Website:

Datasets are available here:

http://marathon.csee.usf.edu/Mammography/Database.html

Dataset:

The Digital Database for Screening Mammography (DDSM) is a resource for use by the mammographic image analysis research community. The database contains approximately 2620 cases available in 43 volumes (healthy and diseased).

Metadata:

Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions.

Contextual info:

Each study includes two images of each breast, along with some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, ACR keyword description of abnormalities) and image information (scanner, spatial resolution, ...). A case consists of between 6 and 10 files. These are an "ics" file, an overview "16-bit PGM" file, four image files that are compressed with lossless JPEG encoding and zero to four overlay files. Normal cases will not have any overlay files.

Comments:

Copyrights:

If you use data from DDSM in publications:

Please credit the DDSM project as the source of the data, and reference: "The Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001. ISBN 1-930524-00-5". "Current status of the Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, W. Philip Kegelmeyer, Richard Moore, Kyong Chang, and S. MunishKumaran, in Digital Mammography, 457-460, Kluwer Academic Publishers, 1998; Proceedings of the Fourth International Workshop on Digital Mammography". Also, please send a copy of your publication to Professor Kevin Bowyer / Computer Science and Engineering / University of Notre Dame / Notre Dame, Indiana 46530.

Contact:

Cedric Marchessoux, cedric.marchessoux@barco.com

The Volume Library

Website:

Datasets are available here:

http://www9.informatik.uni-erlangen.de/External/vollib/

Dataset:

Name of the set, Anatomy, resolution, number of bits

Metadata:

Contextual info:

Environment conditions (calibration, scene...): scanning parameters

Comments:

Mainly CT, PET, MRI. Additional comments are available. Not all of the datasets are medical content; for example, there is a scan of a bonsai. The raw data can be extracted easily using the PVM tools distributed with the V^3 volume rendering package available at http://www.stereofx.org/

Copyrights:

Commercial use is prohibited and no warranty whatsoever is expressed, credit should be given to the group who created the dataset.

Contact:

Stefan Roettger (roettger@cs.fau.de) or Cedric Marchessoux (cedric.marchessoux@barco.com)

DICOM sample image sets

Website:

http://pubimage.hcuge.ch:8080

http://pubimage.hcuge.ch/

Dataset:

DICOM sample image sets with alias name, the modality, the file size with a short description.

Metadata:

Contextual info:

Environment conditions (calibration, scene...)

Comments:

Mainly CT and MRI, more than 10 GB of data.

Copyrights:

Click on the thumbnail images to download the full set of corresponding DICOM images

Contact:

Cedric Marchessoux (cedric.marchessoux@barco.com)

MyPACS.net, reference case manager

Website:

Datasets are available here:

http://www.MyPACS.net

Dataset:

MyPACS.net is still free, and it now has over 16,500 teaching files contributed by 14,000 registered users. With 75,000 key images categorized by anatomy and pathology, you can quickly find examples of any disease. The web-based viewer has been improved with more PACS-like features, and it still works instantly in your browser, requiring nothing to download.

The datasets contain:

1. Cranium and Contents (1205)
2. Face and Neck (398)
3. Spine and Peripheral Nervous System (504)
4. Skeletal System (3433)
5. Heart (160)
6. Chest (894)
7. Gastrointestinal (1271)
8. Genitourinary (800)
9. Vascular/Lymphatic (416)
10. Breast (62)
11. Other (458)

Metadata:

Description of the pathology by medical doctors.

Contextual info:

Environment conditions (calibration, scene...): Medical modality described: Brand and acquisition conditions

Comments:

Copyrights:

MyPACS.net is still free, but registration is required.

Contact:

Cedric Marchessoux (cedric.marchessoux@barco.com)

The NCIA (National Cancer Imaging Archive from National Cancer Institute) data base

Website:

Datasets are available here:

https://imaging.nci.nih.gov/ncia/

Dataset:

Description of Dataset (Content, size, etc): CT scans with xml files for the ground truth, and also other modalities.

Metadata:

Groundtruth stored in xml

Contextual info:

Environment conditions (calibration, scene...): X-ray scanner system: Brand and acquisition conditions

Comments:

Copyrights:

The user should ask for a login. You may browse, download, and use the data for non-commercial, scientific and educational purposes. However, you may encounter documents or portions of documents contributed by private institutions or organizations. Other parties may retain all rights to publish or produce these documents. Commercial use of the documents on this site may be protected under United States and foreign copyright laws. In addition, some of the data may be the subject of patent applications or issued patents, and you may need to seek a license for its commercial use. NCI does not warrant or assume any legal liability or responsibility for the accuracy, completeness or usefulness of any information in this archive.

Contact:

Cedric Marchessoux (cedric.marchessoux@barco.com)

Conventional x-ray mammography data base

Website:

No official website, via Elizabeth Krupinski (krupinski@radiology.arizona.edu)

Dataset:

Real masses, micro calcifications, backgrounds, conventional x-ray mammography, bmp images with resolution of 256x256.

Metadata:

None. Signals can be extracted by subtraction between the backgrounds alone and the background+signals images at 100% density.
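
The subtraction mentioned above can be sketched as a per-pixel difference. This is only an illustration under assumptions: both images are taken to be already loaded as equally sized grids of integer gray levels (loading the bmp files is left out), and the function name extract_signal is hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of integer gray levels


def extract_signal(background_plus_signal: Image, background: Image) -> Image:
    """Return the per-pixel difference (background+signal) - (background alone)."""
    if len(background_plus_signal) != len(background) or any(
        len(row_a) != len(row_b)
        for row_a, row_b in zip(background_plus_signal, background)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [combined - base for combined, base in zip(row_a, row_b)]
        for row_a, row_b in zip(background_plus_signal, background)
    ]


# Tiny synthetic 2x2 example, not real mammography data.
background = [[10, 12], [11, 13]]
combined = [[10, 40], [11, 13]]
print(extract_signal(combined, background))
```

The difference can be negative where the signal darkens the background, so the result is kept signed rather than clamped.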

Contextual info:

Environment conditions (calibration, scene...): X-ray system

Comments:

See examples:
1. Backgrounds,
2. Signals: masses
3. Signals: micro calcifications

Copyrights:

Via Elizabeth Krupinski (krupinski@radiology.arizona.edu) free but credit should be given to them if publication.

Contact:

Elizabeth Krupinski (krupinski@radiology.arizona.edu) or Cedric Marchessoux (cedric.marchessoux@barco.com)

JSRT - Standard Digital Image Database (X-RAY)

Website:

Datasets are available here:

http://www.jsrt.or.jp/web_data/english03.html

Dataset:

Around 5 datasets of 250 images, x-ray chest healthy and diseased with nodules. 2048x2048, white is zero, big endian.
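
The "big endian" and "white is zero" remarks matter when decoding the raw pixel data. The sketch below shows one way to handle both. It rests on assumptions not stated above: pixels are stored as unsigned 16-bit integers with no file header, and the full-scale value is 4095 (12 bits). The function names are hypothetical.

```python
import struct
from typing import List

Image = List[List[int]]


def decode_raw_image(raw: bytes, width: int = 2048, height: int = 2048) -> Image:
    """Decode headerless big-endian unsigned 16-bit pixels into rows.

    Assumption: 2 bytes per pixel, row-major order, no header.
    """
    expected = width * height * 2
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # ">" selects big-endian byte order, "H" an unsigned 16-bit integer.
    pixels = struct.unpack(f">{width * height}H", raw)
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


def invert(image: Image, max_value: int = 4095) -> Image:
    """Flip gray levels so that zero is black instead of white.

    Assumption: max_value is the full-scale gray level (4095 for 12-bit data).
    """
    return [[max_value - value for value in row] for row in image]


# Synthetic 2x2 example: bytes 0x0F 0xFF decode to 4095 in big-endian order.
raw = bytes([0x00, 0x00, 0x0F, 0xFF, 0x00, 0x01, 0x01, 0x00])
image = decode_raw_image(raw, width=2, height=2)
print(image, invert(image))
```

Reading the same bytes as little-endian would turn 0x0F 0xFF into 65295, which is a quick way to notice a byte-order mistake.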

Metadata:

Clinical metadata is provided per image in a txt file with patient information (age, sex), together with images in itf giving nodule, cancer, and infection positions.

Contextual info:

Environment conditions (calibration, scene...): X-ray system

Comments:

The dataset should be ordered by email with a Visa card number. The dataset is delivered by post after one week. The price per dataset is more than reasonable.

Copyrights:

For publication credit should be given by citing in references the following article:
- J. Shiraishi et al. Development of a Digital Image Database for Chest Radiographs with and without a Lung Nodule: Receiver Operating Characteristic Analysis of Radiologists' Detection of Pulmonary Nodules. AJR, 174(1):71-74, 2000.

Contact:

Cedric Marchessoux (cedric.marchessoux@barco.com)

CONSUMER APPLICATIONS

ICCV 2007 - Optical Flow Performance Evaluation

Website:

Dataset can be found here: http://vision.middlebury.edu/flow/data/

Dataset:

Datasets are here composed of sets of images to evaluate optical flow.

Sets can be made of 2 or 8 images for the evaluation in color or graylevel format.

Metadata:

GT is not provided for all datasets

Contextual info:

Flow accuracy and interpolation evaluation

We report two measures of flow accuracy (angular and end-point error) and two measures of interpolation quality. For each of the 4 measures we report 8 error metrics, resulting in a total of 32 tables. Links to the 4 measures are included below, but the tables are also linked among each other. At this point we do not identify a "default" measure or metric, and thus we do not provide an overall ranking of methods.
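
For readers unfamiliar with the two flow accuracy measures, the sketch below computes them for a single pixel using the definitions commonly used in the optical flow literature. It is an assumption that these match the benchmark's exact evaluation code; the function names are hypothetical. Here (u, v) is the estimated flow and (u_gt, v_gt) the ground truth.

```python
import math


def endpoint_error(u: float, v: float, u_gt: float, v_gt: float) -> float:
    """Euclidean distance between the estimated and ground-truth flow vectors."""
    return math.hypot(u - u_gt, v - v_gt)


def angular_error_degrees(u: float, v: float, u_gt: float, v_gt: float) -> float:
    """Angle between the space-time vectors (u, v, 1) and (u_gt, v_gt, 1)."""
    dot = u * u_gt + v * v_gt + 1.0
    norm = math.sqrt(u * u + v * v + 1.0) * math.sqrt(u_gt * u_gt + v_gt * v_gt + 1.0)
    # Clamp to [-1, 1] to guard against rounding slightly outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


print(endpoint_error(3.0, 4.0, 0.0, 0.0))
print(angular_error_degrees(1.0, 0.0, 0.0, 0.0))
```

The extra 1 in the angular measure keeps it defined when either flow vector is zero, at the cost of weighting errors in small flows more heavily than errors in large ones.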

Comments:

The ground-truth flow is provided in a .flo format. Information and C++ code is provided in flow-code.zip, which contains the file README.txt. A Matlab version is also available in flow-code-matlab.zip.
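
As a rough illustration of what reading a .flo file involves, the sketch below parses the layout as commonly described for this format: a 4-byte tag that reads as the little-endian float 202021.25, a 32-bit width, a 32-bit height, then interleaved 32-bit float (u, v) pairs in row order. Treat this layout as an assumption and defer to README.txt in flow-code.zip; the function name parse_flo is hypothetical.

```python
import struct
from typing import List, Tuple

FLO_TAG = 202021.25  # assumption: sanity-check tag stored as a little-endian float

Flow = List[List[Tuple[float, float]]]


def parse_flo(data: bytes) -> Flow:
    """Parse the bytes of a .flo file into rows of (u, v) tuples."""
    tag, width, height = struct.unpack_from("<fii", data, 0)
    if tag != FLO_TAG:
        raise ValueError("unexpected tag; wrong format or byte order")
    if width <= 0 or height <= 0:
        raise ValueError("invalid dimensions")
    count = width * height * 2  # two floats (u, v) per pixel
    values = struct.unpack_from(f"<{count}f", data, 12)
    return [
        [(values[2 * (y * width + x)], values[2 * (y * width + x) + 1])
         for x in range(width)]
        for y in range(height)
    ]


# Build a synthetic 2x1 flow field in memory instead of reading a real file.
sample = struct.pack("<fii", FLO_TAG, 2, 1) + struct.pack("<4f", 1.0, -2.0, 0.5, 0.25)
print(parse_flo(sample))
```

For a file on disk, the bytes can be obtained with `open(path, "rb").read()`. Pixels with unknown flow may be marked with very large values, so check the README before averaging errors over a whole image.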

Copyrights:

Thanks to Brad Hiebert-Treuer and Alan Lim, who spent countless hours creating the hidden texture datasets.

Contact:

Basket-ball - APIDIS

Website:

Sequences are available here: http://www.apidis.org/Public/

This page gives access to the first acquisition campaign of basket ball data during the APIDIS European project.

Dataset:

The dataset is composed of a basket ball game.

Seven 2-Mpixels color cameras around and on top of a basket ball court

Note: Due to bandwidth limitations, only a part of the basket ball game is available from this web site. Please contact us (bottom of this page) for more data.

Metadata:

Time stamp for each frame (all cameras being captured by a unique server at ~22 fps)
Manually annotated basket ball events
Manually annotated objects positions
Calibration data

Metadata XML files
- Annotated events and salient-objects are recorded into two kinds of XML files.
  Users could find the syntax of tags of both kinds of metadata in the two following XML Schema Definition (xsd) files: apidis-annotation-ver23.xsd and apidis-salientobj-ver1.xsd.
  A simplified structural diagram of event xml files is: http://www.apidis.org/Public/all/metadata/event-xml-simple.png.
  You can also find a full view of all tags defined in apidis-annotation-ver23.xsd and their structures here.
  
  The following diagram shows the tags for describing the detected objects and their properties: http://www.apidis.org/Public/all/metadata/salient-obj-xml.png

Contextual info:

All cameras are Arecont Vision AV2100M IP cameras. The datasheets can be downloaded from the constructor site here and here.
Lenses: The fish-eye lenses used for the top view cameras are Fujinon FE185C086HA-1 lenses.

Comments:

Copyrights:

This dataset is available for non-commercial research in video signal processing only. We kindly ask you to mention the APIDIS project when using this dataset (in publications, video demonstrations...).

Contact:

christophe.devleeschouwer(at)uclouvain.be or Damien.Delannay(at)uclouvain.be

Freesound

Website:

Datasets are available here:

http://freesound.iua.upf.edu/

Dataset:

The Freesound Project is a collaborative database of Creative Commons licensed sounds. Freesound focusses only on sound, not songs.

Metadata:

Contextual info:

Comments:

Copyrights:

Creative Commons

Contact:

The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) Project

Website:

Datasets are available here:

http://www.music-ir.org/evaluation/

Dataset:

The objective of the International Music Information Retrieval Systems Evaluation Laboratory project (IMIRSEL) is the establishment of the necessary resources for the scientifically valid development and evaluation of emerging Music Information Retrieval (MIR) and Music Digital Library (MDL) techniques and technologies.

Metadata:

Contextual info:

Comments:

Copyrights:

Available on request

Contact:

Public domain

Website:

Datasets are available here:

http://www.publicdomaintorrents.com/ (BitTorrent link)

Dataset:

10 movies (from 1930-1950, some more recent), most are in color

Metadata:

The databases can be shared and are available on the internet. No annotation or ground-truth is currently available. It will be added when available.

Contextual info:

Comments:

Copyrights:

All are now in the public domain.

Contact:

Sabri Boughorbel

Phillips Internal dataset

Website:

none

Dataset:

Metadata:

we can provide the metadata such as shot, scene cuts, face, eye position, identity etc.

Contextual info:

Comments:

Copyrights:

Contact:

Sabri Boughorbel

RWC Music Database

Website:

Datasets are available here:

http://staff.aist.go.jp/m.goto/RWC-MDB/

Dataset:

The RWC (Real World Computing) Music Database is a copyright-cleared music database (DB) that is available to researchers as a common foundation for research.

Metadata:

MIDI files, genre, lyrics

Contextual info:

Comments:

Copyrights:

Users who have submitted the Pledge and received authorization may freely use the database for research purposes without facing the usual copyright restrictions, but all of the copyrights and neighboring rights connected with this database belong to the National Institute of Advanced Industrial Science and Technology and are managed by the RWC Music Database Administrator. Persons or organizations that have not submitted a Pledge and that have not received authorization may not use the database.

Contact:

CVBASE - 2006

Website:

Datasets are available here:

http://vision.fe.uni-lj.si/cvbase06/downloads.html

Dataset:

Video data (.avi, DivX compressed). Dataset includes three types of sports: European (team) handball (3 synchronized videos, 10 min, 25 FPS, 384x288, DivX 5 AVI), Squash (2 videos from 2 separate matches, 25 FPS, 384x288, DivX AVI), Basketball (videos only, 2 synchronized overhead videos in 2 quality modes: 368x288, 25 FPS, 5 minutes each and 720x576, 25 FPS, 2 minutes each).

Metadata:

Annotations (individual player actions, group activity). Suitable for use as a gold standard. Trajectories (player positions in court and camera coordinate systems). These are not intended to be used as a gold standard, since their accuracy is not particularly high.

Contextual info:

Comments:

Copyrights:

nothing defined from website

Contact:

Xavier Desurmont, desurmont@multitel.be

VSPETS - 2003 - INMOVE

Website:

Datasets are available here:

ftp://ftp.cs.rdg.ac.uk/pub/VS-PETS/

Dataset:

Outdoor people tracking - football data (three synchronised views). The dataset consists of football players moving around a pitch.

Metadata:

Tracking information on image plane for camera 3 can be downloaded. An AVI file of the ground truth for camera view 3 is also available.

Contextual info:

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

Trictrac

Website:

http://www.multitel.be/trictrac/?mod=3

Dataset:

HD progressive images in JPEG for synthetic video sequences of soccer.

Metadata:

XML (2D and 3D positions of objects and camera)

Contextual info:

Comments:

The dataset is fully described in "TRICTRAC Video Dataset: Public HDTV Synthetic Soccer Video Sequences With Ground Truth", X. Desurmont, J-B. Hayet, J-F. Delaigle, J. Piater, B. Macq, Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE), 2006.

Copyrights:

All data is publicly available and downloadable. If you publish results using the data, please acknowledge the data as coming from the TRICTRAC project, found at URL: http://www.multitel.be/trictrac. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND.

Contact:

Xavier Desurmont, desurmont@multitel.be

OTHERS

PETS - 2009

Website:

The datasets are available here:

http://www.cvg.rdg.ac.uk/PETS2009/

Dataset:

Pets 2009 : Eleventh IEEE International Workshop on Performance Evaluation of Tracking and Surveillance

One-day workshop organised in association with CVPR 2009, supported by the EU project SUBITO.

The datasets for PETS 2009 consider crowd image analysis and include crowd count and density estimation, tracking of individual(s) within a crowd, and detection of separate flows and specific crowd events.

The dataset is organised as follows:

Calibration Data
S0: Training Data
- contains sets background, city center, regular flow
S1: Person Count and Density Estimation
- contains sets L1,L2,L3
S2: People Tracking
- contains sets L1,L2,L3
S3: Flow Analysis and Event Recognition
- contains sets Event Recognition and Multiple Flow

Metadata:

Contextual info:

Comments:

Copyrights:

Please e-mail datasets@pets2009.net if you require assistance obtaining these datasets for the workshop.

Contact:

datasets@pets2009.net

IPPR : contest motion segmentation dataset

Website:

Datasets are available here:

http://media.ee.ntu.edu.tw/Archer_contest/

Dataset:

3 different contexts of walking persons.

Metadata:

Segmentation of person is provided.

Contextual info:

Comments:

Copyrights:

Contact:

GavabDB : 3D face database

Website:

Datasets are available here:

http://gavab.escet.urjc.es/recursos_en.html

Dataset:

GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female), with 9 images for each person. All of the individuals are Caucasian and their ages are between 18 and 40 years. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to the pose and the facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (±30°, looking up and looking down respectively) with neutral expression, 2 y-rotated views (±90°, left and right profiles respectively) with neutral expression and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).

Metadata:

Contextual info:

Comments:

Copyrights:

Publications that use this data must reference the following work: A.B. Moreno and A. Sanchez. GavabDB: A 3D Face Database. Proc. 2nd COST Workshop on Biometrics on the Internet: Fundamentals, Advances and Applications, C. Garcia et al. (eds), Ed. Univ. Vigo, pp. 77-82, 2004

Contact:

3D_RMA : 3D database

Website:

Datasets are available here:

http://www.sic.rma.ac.be/~beumier/DB/3d_rma.html

Dataset:

120 persons were asked to pose twice in front of the system: in Nov 97 (session1) and in January 98 (session2). For each session, 3 shots were recorded with different (but limited) orientations of the head: straight forward / Left or Right / Upward or downward.

Among the 120 people, two thirds consist of students from the same ethnic origins and with nearly the same age. The last third consists of people of the academy, all aged between 20 and 60.

Different problems encountered in the cooperative scenario were taken into account. People sometimes wore their spectacles, sometimes didn't. Beards and moustaches were represented. Some people smiled in some shots. Small up/down and left/right rotations of the head were requested. We regret that only a few (14) women were available.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

beumier@elec.rma.ac.be

Actions as Space-Time Shapes

Website:

Datasets are available here:

http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html

Dataset:

Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach by Gorelick et al. for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action and low quality video.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

lena.gorelick@weizmann.ac.il

KTH - Recognition of human actions

Website:

Datasets are available here:

http://www.nada.kth.se/cvap/actions/

Dataset:

The current video database contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors s1, outdoors with scale variation s2, outdoors with different clothes s3 and indoors s4. Currently the database contains 2391 sequences. All sequences were taken over homogeneous backgrounds with a static camera at a 25 fps frame rate. The sequences were downsampled to a spatial resolution of 160x120 pixels and have a length of four seconds on average.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

laptev(at)nada.kth.se

PLIA2

Website:

Datasets are available here:

http://architecture.mit.edu/house_n/data/PlaceLab/PLIA2.htm

Dataset:

The researcher was asked to perform a set of common household activities during the four-hour period using a set of instructions. Activities included the following: preparing a recipe, doing a load of dishes, cleaning the kitchen, doing laundry, making the bed, and light cleaning around the apartment. The volunteer determined the sequence, pace, and concurrency of these activities and also integrated additional household tasks. Our intent was to have a short test dataset of a manageable size that could be easily placed on the web without concerns about anonymity. We wanted this test dataset, however, to show a variety of activity types and activate as many sensors as possible, but in a natural way. In addition to the activities above, the researcher searches for items, uses appliances, talks on the phone, answers email, and performs other everyday tasks. The researcher wore five mobile accelerometers (one on each limb and one on the hip) and a Polar M32 wireless heart rate monitor. The researcher carried an SMT 5600 mobile phone that ran experience sampling software that beeped and presented a set of questions about her activities.

Metadata:

The dataset includes four hours of partially (and soon to be fully) annotated video. The annotation was done using custom annotation software written by Randy Rockinson and Leevar Williams of MIT House_n. This software (called HandLense) is available for researchers to use to study this dataset. [Overview of HandLense and executable]

The annotations include descriptors for body posture, type of activity, location, and social context.

Contextual info:

Comments:

Copyrights:

Contact:

MuHAVi: Multicamera Human Action Video Data

Website:

Datasets are available here:

http://dipersec.king.ac.uk/MuHAVi-MAS/

Dataset:

We have collected a large body of human action video (MuHAVi) data using 8 cameras. There are 17 action classes performed by 14 actors. So far we have processed the videos of 7 actors in order to split the actions and provide the JPG image frames. We have included some image frames before and after the actual action, for the purpose of background subtraction, tracking, etc. The longest pre-action frames correspond to the actor called Person1. Each actor performs each action several times in the action zone, which is marked with white tape on the scene floor. As the actors were amateurs, the leader had to interrupt them in some cases and ask them to redo the action for consistency. We used 8 CCTV Schwan cameras located at the 4 sides and 4 corners of a rectangular platform. Note that these cameras are not necessarily synchronised. We are working on improving the synchronisation between the images from different cameras.

Metadata:

Calibration information may be included here in the future. Meanwhile, one can use the patterns on the scene floor to calibrate the cameras of interest.

Contextual info:

Comments:

Copyrights:

Contact:

Sergio.Velastin@kingston.ac.uk

ViHASi: Virtual Human Action Silhouette Data

Website:

Datasets are available here:

http://dipersec.king.ac.uk/VIHASI/

Dataset:

This dataset provides a large body of synthetic video data generated for the purpose of evaluating silhouette-based human action recognition algorithms. The data consist of 20 action classes, 9 actors and up to 40 synchronised perspective camera views. It is well known that for action recognition algorithms based purely on human body masks, where other image properties such as colour and intensity are not used, it is important to obtain accurate silhouette data from video frames. This problem is not usually considered part of action recognition, but a lower-level problem in motion tracking and change detection. Hence, for researchers working on the recognition side, access to reliable Virtual Human Action Silhouette (ViHASi) data seems to be both a necessity and a relief. The reason is that such data provide a way of comprehensively experimenting with and evaluating the methods under study, which might even lead to their improvement.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

Sergio.Velastin@kingston.ac.uk

Daimler - Pedestrian Dataset

Website:

Datasets are available here:

http://www.gavrila.net/Computer_Vision/Research/Pedestrian_Detection/DC_Pedestrian_Class__Benchmark/dc_pedestrian_class__benchmark.html

Dataset:

The dataset contains a collection of pedestrian and non-pedestrian images. It is made available for download on this site for benchmarking purposes, in order to advance research on pedestrian classification.

The dataset consists of two parts:

1) Base data set. The base data set contains a total of 4000 pedestrian and 5000 non-pedestrian samples cut out from video images and scaled to a common size of 18x36 pixels. This data set has been used in Section VII-A of the paper referenced above.

Pedestrian images were obtained by manually labeling and extracting the rectangular positions of pedestrians in video images. Video images were recorded at various (day) times and locations with no particular constraints on pedestrian pose or clothing, except that pedestrians are standing in an upright position and are fully visible. As non-pedestrian images, we extracted patterns representative of typical preprocessing steps within a pedestrian classification application, from video images known not to contain any pedestrians. We chose to use a shape-based pedestrian detector that matches a given set of pedestrian shape templates to distance-transformed edge images (i.e. comparatively relaxed matching threshold).

2) Additional non-pedestrian images. An additional collection of 1200 video images NOT containing any pedestrians, intended for the extraction of additional negative training examples. Section V of the paper referenced above describes two methods for increasing the training sample size from these images, and Section VII-B lists experimental results.

Metadata:

Contextual info:

Comments:

Copyrights:

This dataset is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given.

Contact:

gavrila(at)science.uva.nl

TERRASCOPE

Website:

Datasets are available here:

http://www.metaverselab.org/datasets/terrascope/

Dataset:

The dataset was recorded with nine different cameras, deployed over several different rooms and a hallway in a "laboratory/office" setting. Several different scenarios were collected from the cameras. A two-minute sequence was captured of researchers/staff/visitors going about their daily activities. In addition, three different scenarios were scripted so that particular behaviors were exhibited in the data.

During data collection, all cameras wrote raw (uncompressed) data at a resolution of 640x480. All machine clocks were synchronized via NTP. In addition to each frame, a timestamp was recorded so that frames can be associated with one another across cameras.
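One way to use such per-frame timestamps is nearest-timestamp matching between two cameras. The Python sketch below is only an illustration under stated assumptions: timestamps are taken to be sorted lists of seconds, and the names `nearest_frame`, `associate` and `max_gap` are hypothetical, not part of the dataset or its tools.

```python
from bisect import bisect_left


def nearest_frame(timestamps, t):
    """Return the index of the timestamp closest to t in a sorted list."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose whichever neighbour lies closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def associate(reference, other, max_gap):
    """Pair each reference frame with the nearest frame of another camera.

    Pairs whose timestamps differ by more than max_gap seconds are dropped.
    """
    pairs = []
    for ref_idx, t in enumerate(reference):
        j = nearest_frame(other, t)
        if abs(other[j] - t) <= max_gap:
            pairs.append((ref_idx, j))
    return pairs


# Made-up timestamps (seconds) for two cameras, for illustration only.
cam_a = [0.00, 0.10, 0.20, 0.30]
cam_b = [0.02, 0.13, 0.19, 0.45]
print(associate(cam_a, cam_b, max_gap=0.05))  # [(0, 0), (1, 1), (2, 2)]
```

The last frame of `cam_a` has no partner because the closest `cam_b` frame is more than `max_gap` away; a suitable threshold depends on the frame rate and the clock accuracy actually achieved.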

Selected Ground Truth (102 MB) - frames with hand-marked labels of individuals and objects

Scenario 1 (11.8 GB) - “Group Meeting”

Scenario 2 (11.2 GB) - “Group Exit and Intruder”

Scenario 3 (17.4 GB) - “Suspicious Behavior/Theft”

Unscripted Activities (59.6 GB) - natural behavior and activities

Subject Face/Gait Database (101 MB) - face pictures and video of subjects walking in front of the camera

Metadata:

Extensive ground truth is also provided. Entrance and exit times for individuals in each camera, foreground segmentation, and activity labeling are all part of the dataset.

Contextual info:

Comments:

Copyrights:

Public datasets

Contact:

OTCBVS Benchmark Dataset Collection

Website:

Datasets are available here:

http://www.cse.ohio-state.edu/otcbvs-bench/

Dataset:

This is a publicly available benchmark dataset for testing and evaluating novel and state-of-the-art computer vision algorithms. Several researchers and students have requested a benchmark of non-visible (e.g., infrared) images and videos. The benchmark contains videos and images recorded in and beyond the visible spectrum and is available for free to all researchers in the international computer vision communities. Also it will allow a large spectrum of IEEE and SPIE vision conference and workshop participants to explore the benefits of the non-visible spectrum in real-world applications, contribute to the OTCBVS workshop series, and boost this research field significantly.

There are 7 datasets:

1) Dataset 01: OSU Thermal Pedestrian Database

2) Dataset 02: IRIS Thermal/Visible Face Database

3) Dataset 03: OSU Color-Thermal Database

4) Dataset 04: Terravic Facial IR Database

5) Dataset 05: Terravic Motion IR Database

6) Dataset 06: Terravic Weapon IR Database

7) Dataset 07: CBSR NIR Face Dataset

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

otcbvs-bench@cse.ohio-state.edu

Eyes and faces dataset

Website:

Datasets are available here:

http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html

http://www.multitel.be/~va/cantata/EyesAndFaces/index.html

Dataset:

Eye ground truth in Viper format for the Yale Face Database B (YaleB), which contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses x 64 illumination conditions), plus 650 Viper files. The ground truth was developed by BARCO in the context of the CANTATA project.

Metadata:

All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.xml” annotation file with the same name, containing the iris positions. The position corresponds to crosses. The path of the bmp image should be changed in the viper file.
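The same-name convention described above makes it easy to pair each image with its annotation file. The following Python sketch assumes only that convention; the function name `pair_images_with_annotations` and the throwaway demo directory are illustrative, and the sketch deliberately does not parse the Viper XML itself.

```python
import tempfile
from pathlib import Path


def pair_images_with_annotations(root):
    """Return (image, annotation) path pairs found under root.

    A ".bmp" image is paired with the ".xml" file of the same name;
    images without such a file are skipped.
    """
    pairs = []
    for image in sorted(Path(root).rglob("*.bmp")):
        annotation = image.with_suffix(".xml")
        if annotation.is_file():
            pairs.append((image, annotation))
    return pairs


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("face01.bmp", "face01.xml", "face02.bmp"):
        Path(tmp, name).touch()
    demo_pairs = [(img.name, ann.name)
                  for img, ann in pair_images_with_annotations(tmp)]

print(demo_pairs)  # [('face01.bmp', 'face01.xml')]
```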

Contextual info:

For every subject in a particular pose, an image with ambient (background) illumination was also captured. Hence, the total number of images is in fact 5760+90=5850. The total size of the compressed database is about 1GB.
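The image counts quoted for this database follow directly from the number of subjects, poses and illumination conditions. A short Python check, with variable names chosen only for illustration:

```python
SUBJECTS = 10
POSES = 9
ILLUMINATIONS = 64

conditions_per_subject = POSES * ILLUMINATIONS             # viewing conditions
single_source_images = SUBJECTS * conditions_per_subject   # one light source each
ambient_images = SUBJECTS * POSES                          # one per subject and pose
total_images = single_source_images + ambient_images

print(conditions_per_subject, single_source_images, ambient_images, total_images)
# 576 5760 90 5850
```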

Comments:

The dataset already exists without the ground truth in Viper format. The ground truth was either generated or converted in Viper format in the context of Cantata project. The metadata were generated by Arnaud Joubel.

Copyrights:

Dataset YaleB: You are free to use the Yale Face Database B for research purposes. If experimental results are obtained that use images from within the database, all publications of these results should acknowledge the use of the "Yale Face Database B" and reference: Georghiades, A.S., Belhumeur, P.N. and Kriegman, D.J., "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", IEEE Trans. Pattern Anal. Mach. Intelligence, 2001, 23, 643-660.

Ground truth in Viper: Requested citation acknowledgment about the ground truth:
Courtesy of ITEA2 funded Cantata project

Contact:

Quentin Besnehard, quentin.besnehard@barco.com or Cedric Marchessoux, cedric.marchessoux@barco.com

Anti Aliased Text Dataset

Website:

Datasets are available here:

http://www.multitel.be/~va/cantata/AntiAliased/index.html

Dataset:

Set of bitmap images containing anti-aliased text, developed by BARCO in the context of the CANTATA project. 2400 images are available in the archive.

Metadata:

All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.grid.xml” annotation file with the same name. The annotation takes the form of a grid of 32x32 pixels bounding boxes. The path of the bmp image should be changed in the viper file if you want to open it in viper-gt.
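To make the grid annotation concrete, the Python sketch below tiles an image with 32x32 pixel boxes. It is an illustration only: the function name `grid_boxes` is hypothetical, and clipping the cells at the right and bottom edges is an assumption, since the description does not say how partial cells are handled.

```python
def grid_boxes(width, height, cell=32):
    """Yield (x, y, w, h) boxes that tile a width x height image.

    Cells on the right and bottom edges are clipped to the image bounds
    (an assumption, not documented behaviour of the dataset).
    """
    for y in range(0, height, cell):
        for x in range(0, width, cell):
            yield (x, y, min(cell, width - x), min(cell, height - y))


boxes = list(grid_boxes(100, 64))
print(len(boxes))  # 8 boxes: 4 columns x 2 rows
print(boxes[0])    # (0, 0, 32, 32)
print(boxes[3])    # (96, 0, 4, 32), clipped at the right edge
```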

Contextual info:

The text is represented in different colors: black on white, white on black, random dark color on white, white on random dark color, black on random light color, random light color on white, random dark color on random light color and, finally, random light color on random dark color.

Comments:

The dataset and the ground truth were generated by Quentin Besnehard and Arnaud Joubel. To obtain the complete dataset, send an e-mail to the contact person.

Copyrights:

The fonts used are available under the GNU General Public License version 2.0. These fonts are free clones of the original fonts provided by URW typeface foundry.

Requested citation acknowledgment about the dataset and the ground truth : Courtesy of ITEA2 funded Cantata project.

Contact:

Quentin Besnehard, quentin.besnehard@barco.com or Cedric Marchessoux, cedric.marchessoux@barco.com

Aliased Text Dataset

Website:

Datasets are available here:

http://www.multitel.be/~va/cantata/Aliased

Dataset:

Set of bitmap images containing aliased text (2 colors), developed by BARCO in the context of the CANTATA project. 1250 images are available in the archive.

Metadata:

Contextual info:

Helvetica
Optima
AvantGarde
Times
Palatino
Courier
Century

Comments:

The dataset and the ground truth were generated by Quentin Besnehard and Cédric Marchessoux.

Copyrights:

The fonts used are available under the GNU General Public License version 2.0. These fonts are free clones of the original fonts provided by URW typeface foundry. Requested citation acknowledgment about the data set and the ground truth: Courtesy of ITEA2 funded Cantata project

Contact:

Quentin Besnehard, quentin.besnehard@barco.com; Cédric Marchessoux, cedric.marchessoux@barco.com

PETS - ICVS - 2003 - FGnet

Website:

Datasets are available here:

http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html

Dataset:

Smart meeting data that include facial expressions, gaze and gesture/action. The environment consists of three cameras: one mounted on each of two opposing walls, and an omnidirectional camera positioned at the centre of the room. The dataset consists of four scenarios.

Metadata:

a) Eye positions of people in Scenarios A, B and D. (every 10th frame is annotated).

b) Facial expression and gaze estimation for Scenarios A and D, Cameras 1-2.

c) Gesture/action annotations for Scenarios B and D, Cameras 1-2.

Contextual info:

Camera Calibration provided.

Comments:

Copyrights:

Free download

Contact:

Dimitrios Makris, d.makris@kingston.ac.uk

RESOURCES AND LINKS

Medical datasets

Datasets are available here:

http://gdcm.sourceforge.net/wiki/index.php/Sample_DataSet#DataSet

This website contains multiple links to medical datasets.

TRECVID

The TRECVID conference series is sponsored by the National Institute of Standards and Technology (NIST) with additional support from other U.S. government agencies. The goal of the conference series is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. In 2001 and 2002 the TREC series sponsored a video "track" devoted to research in automatic segmentation, indexing, and content-based retrieval of digital video. Beginning in 2003, this track became an independent evaluation (TRECVID) with a 2-day workshop taking place just before TREC.

Datasets are described here.

Image Datasets

Datasets are available here:

http://www.cs.bu.edu/groups/ivc/data.php

It contains various datasets like:

Image database used in shape-based retrieval experiments

Image databases used in deformable shape-based segmentation and retrieval experiments

Over 70 video sequences and ground truth used in evaluation of 3D head tracking

Labeled video sequences used as ground truth in skin color segmentation experiments

Hand image database with ground truth

Dynamic background sequences

Half-Life 2 mods

www.hl2mods.co.uk

More mods for the game engine.

Scenario game

A mod created by students in Toronto. It is a complete game, but maps can be used with the OVVV.

www.torontoconflict.com

The USC-SIPI Image Database

The USC-SIPI image database is a collection of digitized images. It is maintained primarily to support research in image processing, image analysis, and machine vision. The first edition of the USC-SIPI image database was distributed in 1977 and many new images have been added since then.

The database is divided into volumes based on the basic character of the pictures. Images in each volume are of various sizes such as 256x256 pixels, 512x512 pixels, or 1024x1024 pixels. All images are 8 bits/pixel for black and white images, 24 bits/pixel for color images. The following volumes are currently available:

	Textures	Brodatz textures, texture mosaics, etc.
	Aerials	High altitude aerial images
	Miscellaneous	Lena, the mandrill, and other favorites
	Sequences	Moving head, fly-overs, moving vehicle