Learning and Understanding the COCO Dataset

关关教你学编程 · Published 2019-05-07 15:47:49

1. Official website:

http://cocodataset.org

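2. Downloads

Only the 2017 release is covered here: http://cocodataset.org/#download

The images section contains four folders (training, validation, test, and unlabeled images). The downloaded images look like this: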

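3. COCO API

https://github.com/cocodataset/cocoapi

One annoying thing about this API is that, in my setup, it only ran cleanly under Python 2 (Python 3 produced all sorts of errors), so I set up a dedicated virtual environment for it (see my blog post: configuring a virtual environment). The API's demo is also visualized in Jupyter, so you need Jupyter as well (see my blog post: installing, configuring, and using Jupyter).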

4. Images

COCO has 91 thing classes (1-91), 91 stuff classes (92-182) and 1 class "unlabeled" (183).

The difference between thing and stuff:

The CVPR 2018 paper "COCO-Stuff: Thing and Stuff Classes in Context" puts it this way:

Defining things and stuff. The literature provides definitions for several aspects of stuff and things, including: (1) Shape: Things have characteristic shapes (car, cat, phone), whereas stuff is amorphous (sky, grass, water) [21, 59, 28, 51, 55, 39, 17, 14]. (2) Size: Things occur at characteristic sizes with little variance, whereas stuff regions are highly variable in size [21, 2, 27]. (3) Parts: Thing classes have identifiable parts [56, 19], whereas stuff classes do not (e.g. a piece of grass is still grass, but a wheel is not a car). (4) Instances: Stuff classes are typically not countable [2] and have no clearly defined instances [14, 25, 53]. (5) Texture: Stuff classes are typically highly textured [21, 27, 51, 14]. Finally, a few classes can be interpreted as both stuff and things, depending on the image conditions (e.g. a large number of people is sometimes considered a crowd).

Composition of the COCO-Stuff labels:

COCO-Stuff contains 172 classes: 80 thing, 91 stuff, and 1 class unlabeled. The 80 thing classes are the same as in COCO [35]. The 91 stuff classes are curated by an expert annotator. The class unlabeled is used in two situations: if a label does not belong to any of the 171 predefined classes, or if the annotator cannot infer the label of a pixel.

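So why do many online sources say there are 80 thing classes (172 classes in total), while many others say there are 91 thing classes (183 classes in total)?

I could only find the label table for the 183-class version online. I tried to find the label table for the 172-class version and came up empty.

So I dug into the categories field of the JSON files to see exactly where those 11 classes went. It turns out that those 11 classes were removed from COCO, as shown in the "Labels in COCO-Stuff" listing: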
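The comparison described above can be sketched in a few lines. This is only a minimal sketch, not the script I actually used: `missing_thing_ids` and `load_categories` are hypothetical helper names, and the code assumes that `categories` is a list of dicts with an `id` key and that thing IDs span 1-91, as stated earlier.

```python
import json


def missing_thing_ids(categories, max_thing_id=91):
    """Return the IDs in 1..max_thing_id that have no entry in `categories`."""
    present = {cat["id"] for cat in categories}
    return [i for i in range(1, max_thing_id + 1) if i not in present]


def load_categories(annotation_path):
    """Read only the `categories` field from a COCO-style annotation file."""
    with open(annotation_path, "r") as f:
        return json.load(f)["categories"]


# Toy input (not real COCO data): IDs 1, 2 and 4 exist, so 3 and 5 are missing.
toy_categories = [{"id": 1}, {"id": 2}, {"id": 4}]
print(missing_thing_ids(toy_categories, max_thing_id=5))  # [3, 5]
```

Against a real file, a call such as `missing_thing_ids(load_categories('annotations/instances_val2017.json'))` (the path is an assumption about your local layout) should list the thing IDs that are absent from the 80-class file.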

5. Annotations:

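This dataset can be used for object detection, semantic segmentation, instance segmentation, panoptic segmentation (see my blog post: the relationship and differences between semantic, instance, and panoptic segmentation), person keypoint detection, and image captioning. Every task uses the same images: training images, validation images, test images, and unlabeled images. What distinguishes the tasks is the annotation files: the form of the annotation determines the task it serves.

The article "COCO数据集的标注格式" (on the COCO annotation format) already explains this well, but:

(1) It explains only part of the annotation formats; the coverage is incomplete.

(2) It gives no visualization for the two concrete segmentation formats, polygon and RLE. At first I naively parsed the JSON by hand, and only later found out that the official COCO API can be called to do this.

(3) It does not map the annotation files to concrete tasks, which caused a lot of confusion. Here I visualize each annotation file with the official COCO API and pair the resulting image with its task, which clears the confusion up.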
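To make point (2) concrete: a polygon segmentation is a flat list of coordinates `[x1, y1, x2, y2, ...]`, while an uncompressed RLE segmentation is a dict with `size` (`[height, width]`) and `counts` (run lengths that alternate between 0s and 1s, starting with 0s, in column-major order). The following is a minimal pure-Python sketch of decoding the uncompressed form. `decode_uncompressed_rle` is a hypothetical helper written for illustration; in practice the `pycocotools.mask` module or `coco.annToMask(ann)` does this for you.

```python
def decode_uncompressed_rle(rle):
    """Decode a COCO-style uncompressed RLE dict into a row-major 0/1 mask.

    `rle["size"]` is [height, width]; `rle["counts"]` alternates runs of
    0s and 1s (starting with 0s), laid out column by column.
    """
    height, width = rle["size"]
    flat = []
    value = 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between background and foreground
    if len(flat) != height * width:
        raise ValueError("run lengths do not add up to height * width")
    # Convert the column-major flat list into a list of rows.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


# Toy example (not real COCO data): a 2 x 3 mask.
toy_rle = {"size": [2, 3], "counts": [1, 2, 3]}
for row in decode_uncompressed_rle(toy_rle):
    print(row)
# [0, 1, 0]
# [1, 0, 0]
```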

Below, I visualize the annotation files one by one and note the task each one serves:

5.1 annotations_trainval2017 contains six annotation files:

captions_train2017.json instances_train2017.json person_keypoints_train2017.json
captions_val2017.json instances_val2017.json person_keypoints_val2017.json

5.1.1 instances_train2017.json and instances_val2017.json are used for instance segmentation; visualization:

These files contain the 80 thing classes:

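from pycocotools.coco import COCO

# Assumed local path to the annotation file; adjust to your own layout.
annFile = 'annotations/instances_val2017.json'
coco = COCO(annFile)

# Display COCO categories and supercategories
cats = coco.loadCats(coco.getCatIds())
nms = [cat['name'] for cat in cats]
print("sum categories:", len(nms))
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))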

Output (run under Python 2, hence the tuple-style first line):

('sum categories:', 80)
COCO categories: 
person bicycle car motorcycle airplane bus train truck boat traffic light fire hydrant stop sign parking meter bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard sports ball kite baseball bat baseball glove skateboard surfboard tennis racket bottle wine glass cup fork knife spoon bowl banana apple sandwich orange broccoli carrot hot dog pizza donut cake chair couch potted plant bed dining table toilet tv laptop mouse remote keyboard cell phone microwave oven toaster sink refrigerator book clock vase scissors teddy bear hair drier toothbrush

COCO supercategories: 
outdoor food indoor appliance sports person animal vehicle furniture accessory electronic kitchen
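Beyond listing the categories, the same JSON can be inspected without the API. The sketch below counts annotated instances per category name, assuming the usual top-level `categories` and `annotations` lists with `id`, `name`, and `category_id` fields. `instances_per_category` is a hypothetical helper and the toy data is made up for illustration.

```python
from collections import Counter


def instances_per_category(dataset):
    """Count annotations per category name in a COCO-style instances dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    counts = Counter(id_to_name[ann["category_id"]]
                     for ann in dataset["annotations"])
    return dict(counts)


# Toy data (not real COCO annotations); bbox is [x, y, width, height].
toy_instances = {
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
    "annotations": [
        {"id": 101, "image_id": 7, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"id": 102, "image_id": 7, "category_id": 18, "bbox": [2, 2, 3, 3]},
        {"id": 103, "image_id": 8, "category_id": 18, "bbox": [1, 1, 4, 4]},
    ],
}
print(instances_per_category(toy_instances))
```

For a real file, load it with `json.load` first and pass the resulting dict in.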

5.1.2 person_keypoints_train2017.json and person_keypoints_val2017.json are used for human keypoint detection. Because keypoints are annotated only for people, there is just one category; visualization:
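In these files each annotation stores its keypoints as one flat list of `(x, y, v)` triples, where `v` is a visibility flag: 0 means not labeled, 1 means labeled but not visible, and 2 means labeled and visible. The keypoint names come from the `keypoints` field of the person category. The following is a minimal sketch of unpacking that list; `split_keypoints` and `visible_keypoints` are hypothetical helpers, and the toy input uses only three keypoints rather than the full set.

```python
def split_keypoints(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def visible_keypoints(flat, names):
    """Map keypoint name -> (x, y) for keypoints labeled and visible (v == 2)."""
    return {name: (x, y)
            for name, (x, y, v) in zip(names, split_keypoints(flat))
            if v == 2}


# Toy data (not a real annotation): three keypoints only.
toy_names = ["nose", "left_eye", "right_eye"]
toy_keypoints = [10, 20, 2, 0, 0, 0, 14, 18, 1]
print(split_keypoints(toy_keypoints))
print(visible_keypoints(toy_keypoints, toy_names))
```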

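5.1.3 captions_train2017.json and captions_val2017.json are used for image captioning. They cover the same images as the 80-class instance files, although the caption files themselves carry no categories field; visualization: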

Two men being drug on buggies by dogs.
Men on bikes are getting pulled by a group of dogs.
Men race on wheeled vehicles towed by a group of husky dogs.
The man is riding a bike led by several dogs.
Men race bicycles on grass pulled by sleigh dogs.
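Each caption is a separate annotation that points back to its image through `image_id`, so collecting the captions for one image is a simple grouping step. A minimal sketch, assuming annotations with `image_id` and `caption` fields; `captions_by_image` is a hypothetical helper and the image IDs in the toy data are invented.

```python
from collections import defaultdict


def captions_by_image(dataset):
    """Group caption strings by image_id for a COCO-style captions dict."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"].strip())
    return dict(grouped)


# Toy data: captions taken from the example above, image IDs made up.
toy_captions = {
    "annotations": [
        {"id": 1, "image_id": 1, "caption": "The man is riding a bike led by several dogs."},
        {"id": 2, "image_id": 1, "caption": "Men race bicycles on grass pulled by sleigh dogs."},
        {"id": 3, "image_id": 2, "caption": "A placeholder caption for another image."},
    ]
}
for image_id, captions in captions_by_image(toy_captions).items():
    print(image_id, captions)
```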

5.2 stuff_annotations_trainval2017 contains two annotation files:

stuff_val2017.json stuff_train2017.json

These files contain 92 categories: the 91 stuff classes plus a catch-all "other" category:

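from pycocotools.coco import COCO

# Assumed local path to the annotation file; adjust to your own layout.
annFile = 'annotations/stuff_val2017.json'
coco = COCO(annFile)

# Display COCO categories and supercategories
cats = coco.loadCats(coco.getCatIds())
nms = [cat['name'] for cat in cats]
print("sum categories:", len(nms))
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))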

Output (run under Python 2, hence the tuple-style first line):

('sum categories:', 92)
COCO categories: 
banner blanket branch bridge building-other bush cabinet cage cardboard carpet ceiling-other ceiling-tile cloth clothes clouds counter cupboard curtain desk-stuff dirt door-stuff fence floor-marble floor-other floor-stone floor-tile floor-wood flower fog food-other fruit furniture-other grass gravel ground-other hill house leaves light mat metal mirror-stuff moss mountain mud napkin net paper pavement pillow plant-other plastic platform playingfield railing railroad river road rock roof rug salad sand sea shelf sky-other skyscraper snow solid-other stairs stone straw structural-other table tent textile-other towel tree vegetable wall-brick wall-concrete wall-other wall-panel wall-stone wall-tile wall-wood water-other waterdrops window-blind window-other wood other

COCO supercategories: 
building water plant floor raw-material sky ceiling textile solid window food-stuff furniture-stuff ground other wall structural

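These files are used for semantic segmentation (separate objects of the same class are not distinguished from one another); visualization: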
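The point about objects of the same class not being distinguished can be illustrated with a small sketch: when per-region masks are painted into a single class map, two regions with the same category ID become indistinguishable. `semantic_map` is a hypothetical helper, the category IDs below are arbitrary, and using 0 for unlabeled pixels is an assumption made for this example.

```python
def semantic_map(height, width, regions, unlabeled=0):
    """Paint (category_id, mask) regions into one per-pixel class map.

    `mask` is a height x width list of 0/1 rows. Later regions overwrite
    earlier ones where they overlap. `unlabeled=0` is an assumed default.
    """
    label_map = [[unlabeled] * width for _ in range(height)]
    for category_id, mask in regions:
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    label_map[r][c] = category_id
    return label_map


# Toy data: two separate regions share ID 100, a third has ID 150.
toy_regions = [
    (100, [[1, 0, 0], [0, 0, 0]]),
    (100, [[0, 0, 1], [0, 0, 0]]),
    (150, [[0, 0, 0], [1, 1, 1]]),
]
for row in semantic_map(2, 3, toy_regions):
    print(row)
# [100, 0, 100]
# [150, 150, 150]
```

In the result, nothing records that the two pixels labeled 100 came from different regions, which is exactly what separates semantic segmentation from instance segmentation.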