COCO数据集介绍

文章目录1、COCO数据集的介绍2、COCO数据集结构及组成介绍本文主要是为了熟悉COCO数据集。1、COCO数据集的介绍首先上两个链接，第一个，第二个有以上两个链接足够了解COCO2、COCO数据集结构及组成介绍本部分主要是参考https://zhuanlan.zhihu.com/p/70878433，这个链接进行同步整理的，直接看原文也可以，只是觉的原文有点乱，不便于整体掌据该数据集。...

牛andmore牛

26567人浏览 · 2021-07-12 13:51:48

牛andmore牛 · 2021-07-12 13:51:48 发布

文章目录

本文主要是为了熟悉COCO数据集。

1、COCO数据集的介绍

首先上两个链接，第一个，第二个
有以上两个链接足够了解COCO
整个数据集的分布如下

#step1: 下载数据集
2017 Train images [118K/18GB]
2017 Val images [5K/1GB]
2017 Test images [41K/6GB]
2017 Train/Val annotations [241MB]

#step2: 按照下面结构存放文件夹
coco
  ├── annotations
  │   ├── instances_train2014.json
  │   ├── instances_train2017.json
  │   ├── instances_val2014.json
  │   ├── instances_val2017.json
  │   |   ...
  ├── train2017
  │   ├── 000000000009.jpg
  │   ├── 000000580008.jpg
  │   |   ...
  ├── val2017
  │   ├── 000000000139.jpg
  │   ├── 000000000285.jpg
  │   |   ...
  |   ...

2、COCO数据集标注格式

本部分主要是参考https://zhuanlan.zhihu.com/p/70878433，这个链接进行同步整理的，直接看原文也可以，只是觉的原文有点乱，不便于整体掌据该数据集。
COCO数据集大量使用Amazon Mechanical Turk来收集数据。COCO数据集现主要有三种标注类型：

object instance 目标实例
object keypoints 目标关键点
image captions 看图说话。
标注文件使用JSON文件进行存储。如下为COCO2017数据集中train,val的标注文件：

原文件是annotations_trainval2017.zip,解压后是annotations文件夹。可以看到一共有三种类型，每种类型包含训练和验证，共有6个JSON文件。

２.1实例分割Object Instance文件格式

以instance_val2017.json为例（验证集文件软小，打开较快），总体格式如下：

{
    "info": info,
    "licenses": [license],
    "images":[image],
    "annotations":[annotation],
    "categories":[category]
}

在这里插入图片描述

images字段下是一个列表，列表长度等同于划入训练集（或验证集）的图片数量
annotatons字段下也是一个列表，列表长度等同地训练集（或验证集）中bounding box 的数量
categories字段下也是一个列表，列表长度等同于数据集类别的数，coco2017分类数是80,用VScode打开看：

可以看到整个JSON文件是一个大的数字典。
通过jupyterlab打开看：

在这里插入图片描述

2.1.1 info中的内容

"info": {
        "description": "COCO 2017 Dataset",
        "url": "http://cocodataset.org",
        "version": "1.0",
        "year": 2017,
        "contributor": "COCO Consortium",
        "date_created": "2017/09/01"
    },

在这里插入图片描述
info中包括一些基本信息，时间，版本，贡献者等，没什么太大价值，可以忽略。

2.1.2 licenses中的内容

内容较少，这里全部列出：

"licenses": [
        {
            "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
            "id": 1,
            "name": "Attribution-NonCommercial-ShareAlike License"
        },
        {
            "url": "http://creativecommons.org/licenses/by-nc/2.0/",
            "id": 2,
            "name": "Attribution-NonCommercial License"
        },
        {
            "url": "http://creativecommons.org/licenses/by-nc-nd/2.0/",
            "id": 3,
            "name": "Attribution-NonCommercial-NoDerivs License"
        },
        {
            "url": "http://creativecommons.org/licenses/by/2.0/",
            "id": 4,
            "name": "Attribution License"
        },
        {
            "url": "http://creativecommons.org/licenses/by-sa/2.0/",
            "id": 5,
            "name": "Attribution-ShareAlike License"
        },
        {
            "url": "http://creativecommons.org/licenses/by-nd/2.0/",
            "id": 6,
            "name": "Attribution-NoDerivs License"
        },
        {
            "url": "http://flickr.com/commons/usage/",
            "id": 7,
            "name": "No known copyright restrictions"
        },
        {
            "url": "http://www.usa.gov/copyright.shtml",
            "id": 8,
            "name": "United States Government Work"
        }
    ],

在这里插入图片描述
一共有8条，也没什么价值，可以忽略。

2.1.3 images中的内容

内容较多，列几条：

"images": [
        {
            "license": 4,
            "file_name": "000000397133.jpg",
            "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
            "height": 427,
            "width": 640,
            "date_captured": "2013-11-14 17:02:52",
            "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
            "id": 397133
        },
        {
            "license": 1,
            "file_name": "000000037777.jpg",
            "coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
            "height": 230,
            "width": 352,
            "date_captured": "2013-11-14 20:55:31",
            "flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
            "id": 37777
        },
        {
            "license": 4,
            "file_name": "000000252219.jpg",
            "coco_url": "http://images.cocodataset.org/val2017/000000252219.jpg",
            "height": 428,
            "width": 640,
            "date_captured": "2013-11-14 22:32:02",
            "flickr_url": "http://farm4.staticflickr.com/3446/3232237447_13d84bd0a1_z.jpg",
            "id": 252219
        },

在这里插入图片描述
jupyter中看的效果：

images是一个列表，列表中每一个元素是一个字典，存储一张图片中的信息。分别就图片信息做出说明：

license: 没用
file_name:图片文件名
coco_url:没用
height:图片高
width:图片宽
date_captured:没用
flickr_url没用
id:图片的身份ID，每个图片特有的
在以上信息中，height,width,file_name,id这四个值非常重要。

2.1.4 annotations中的内容

该内容较多，列几条：

"annotations": [
        {
            "segmentation": [
                [
                    510.66,
                    423.01,
                    511.72,
                    ...
                    423.01,
                    510.45,
                    423.01
                ]
            ],
            "area": 702.1057499999998,
            "iscrowd": 0,
            "image_id": 289343,
            "bbox": [
                473.07,
                395.93,
                38.65,
                28.67
            ],
            "category_id": 18,
            "id": 1768
        },
        {
            "segmentation": [
                [
                    289.74,
                    443.39,
                    302.29,
                   ...
                    444.27,
                    291.88,
                    443.74
                ]
            ],
            "area": 27718.476299999995,
            "iscrowd": 0,
            "image_id": 61471,
            "bbox": [
                272.1,
                200.23,
                151.97,
                279.77
            ],
            "category_id": 18,
            "id": 1773
        },
        ......
                    "segmentation": {
                "counts": [
                    272,
                    2,
                    4,
                    4,
                   ...
                    16,
                    228,
                    8,
                    10250
                ],
                "size": [
                    240,
                    320
                ]
            },
            "area": 18419,
            "iscrowd": 1,
            "image_id": 448263,
            "bbox": [
                1,
                0,
                276,
                122
            ],
            "category_id": 1,
            "id": 900100448263
        },

在这里插入图片描述
jupyter中效果：

annotations是该JSON文件中最重要的。annotations是包含多个annotation实例的数组，annotation类型本身又包含一系列的字段：

segmentation:分割标签
area:面积
iscrowd: 是否多个目标
image_id:与images中的id对应
bbox:目标框
category_id:类别
id:标注框的一个序号
整体来说annotation的格式如下：

annotation{
    "segmentation": RLE or [polygon],
    "area" :float,
    "iscrowd": 0 or 1,
    "imgae_id": int,
    "bbox": [x,y,width,height],
    "category_id": int,
    "id": int

注意，单个对像（iscrowd=0)可能需要多个polygon来表示，比如这个对像在图像中被挡住;而iscrow=1时（将标注一组对像，比如一群人），segmentation的格式是RLE格式。也就是说，只要iscrowd=0，那么segmentation格式就是polygon; 而iscrowd=1,则segmentation格式是RLE。另外不论iscrowd是0还是1，每个对像都会有一个矩型框bbox,提供框的左上角坐标以及矩形框的高和宽。
segmentation polygon格式，可以看到，是一个二维的列表，里面的一堆数字是像素级分割得到的物体边缘坐标，从上文中也能看到，坐标是成对出现的；RLE格式如下：

segmentation : 
{
    'counts': [272, 2, 4, 4, 4, 4, 2, 9, 1, 2, 16, 43, 143, 24......], 
    'size': [240, 320]
}

COCO数据集的RLE都是uncompressed RLE格式（与之相对的是compact RLE）。 RLE所占字节的大小和边界上的像素数量是正相关的。RLE格式带来的好处就是当基于RLE去计算目标区域的面积以及两个目标之间的unoin和intersection时会非常有效率。上面的segmentation中的counts数组和size数组共同组成了这幅图片中的分割 mask。其中size是这幅图片的宽高，然后在这幅图像中，每一个像素点要么在被分割（标注）的目标区域中，要么在背景中。很明显这是一个bool量：如果该像素在目标区域中为true那么在背景中就是False；如果该像素在目标区域中为1那么在背景中就是0。对于一个240x320的图片来说，一共有76800个像素点，根据每一个像素点在不在目标区域中，我们就有了76800个bit，比如像这样（随便写的例子，和上文的数组没关系）：00000111100111110…；但是这样写很明显浪费空间，我们直接写上0或者1的个数不就行了嘛（Run-length encoding)，于是就成了54251…，这就是上文中的counts数组。

area指向该segmentation的面积，iscrowd=0表示没有重叠，iscrowd=1表示有重叠；image_id就是前面images中存储的id.bbox指向的是物体的标注框；category_id指向的数字代表分类，共有80个分类；id不同于images中的id,这里的id只是每个框的身份编号。

2.1.4 categories中的内容

如下:

"categories": [
        {
            "supercategory": "person",
            "id": 1,
            "name": "person"
        },
        {
            "supercategory": "vehicle",
            "id": 2,
            "name": "bicycle"
        },
        {
            "supercategory": "vehicle",
            "id": 3,
            "name": "car"
        },
        {
            "supercategory": "vehicle",
            "id": 4,
            "name": "motorcycle"
        },
        {
            "supercategory": "vehicle",
            "id": 5,
            "name": "airplane"
        },
        {
            "supercategory": "vehicle",
            "id": 6,
            "name": "bus"
        },
        {
            "supercategory": "vehicle",
            "id": 7,
            "name": "train"
        },
        {
            "supercategory": "vehicle",
            "id": 8,
            "name": "truck"
        },
        {
            "supercategory": "vehicle",
            "id": 9,
            "name": "boat"
        },
        {
            "supercategory": "outdoor",
            "id": 10,
            "name": "traffic light"
        },
        {
            "supercategory": "outdoor",
            "id": 11,
            "name": "fire hydrant"
        },
        {
            "supercategory": "outdoor",
            "id": 13,
            "name": "stop sign"
        },
        {
            "supercategory": "outdoor",
            "id": 14,
            "name": "parking meter"
        },
        {
            "supercategory": "outdoor",
            "id": 15,
            "name": "bench"
        },
        {
            "supercategory": "animal",
            "id": 16,
            "name": "bird"
        },
        {
            "supercategory": "animal",
            "id": 17,
            "name": "cat"
        },
        {
            "supercategory": "animal",
            "id": 18,
            "name": "dog"
        },
        {
            "supercategory": "animal",
            "id": 19,
            "name": "horse"
        },
        {
            "supercategory": "animal",
            "id": 20,
            "name": "sheep"
        },
        {
            "supercategory": "animal",
            "id": 21,
            "name": "cow"
        },
        {
            "supercategory": "animal",
            "id": 22,
            "name": "elephant"
        },
        {
            "supercategory": "animal",
            "id": 23,
            "name": "bear"
        },
        {
            "supercategory": "animal",
            "id": 24,
            "name": "zebra"
        },
        {
            "supercategory": "animal",
            "id": 25,
            "name": "giraffe"
        },
        {
            "supercategory": "accessory",
            "id": 27,
            "name": "backpack"
        },
        {
            "supercategory": "accessory",
            "id": 28,
            "name": "umbrella"
        },
        {
            "supercategory": "accessory",
            "id": 31,
            "name": "handbag"
        },
        {
            "supercategory": "accessory",
            "id": 32,
            "name": "tie"
        },
        {
            "supercategory": "accessory",
            "id": 33,
            "name": "suitcase"
        },
        {
            "supercategory": "sports",
            "id": 34,
            "name": "frisbee"
        },
        {
            "supercategory": "sports",
            "id": 35,
            "name": "skis"
        },
        {
            "supercategory": "sports",
            "id": 36,
            "name": "snowboard"
        },
        {
            "supercategory": "sports",
            "id": 37,
            "name": "sports ball"
        },
        {
            "supercategory": "sports",
            "id": 38,
            "name": "kite"
        },
        {
            "supercategory": "sports",
            "id": 39,
            "name": "baseball bat"
        },
        {
            "supercategory": "sports",
            "id": 40,
            "name": "baseball glove"
        },
        {
            "supercategory": "sports",
            "id": 41,
            "name": "skateboard"
        },
        {
            "supercategory": "sports",
            "id": 42,
            "name": "surfboard"
        },
        {
            "supercategory": "sports",
            "id": 43,
            "name": "tennis racket"
        },
        {
            "supercategory": "kitchen",
            "id": 44,
            "name": "bottle"
        },
        {
            "supercategory": "kitchen",
            "id": 46,
            "name": "wine glass"
        },
        {
            "supercategory": "kitchen",
            "id": 47,
            "name": "cup"
        },
        {
            "supercategory": "kitchen",
            "id": 48,
            "name": "fork"
        },
        {
            "supercategory": "kitchen",
            "id": 49,
            "name": "knife"
        },
        {
            "supercategory": "kitchen",
            "id": 50,
            "name": "spoon"
        },
        {
            "supercategory": "kitchen",
            "id": 51,
            "name": "bowl"
        },
        {
            "supercategory": "food",
            "id": 52,
            "name": "banana"
        },
        {
            "supercategory": "food",
            "id": 53,
            "name": "apple"
        },
        {
            "supercategory": "food",
            "id": 54,
            "name": "sandwich"
        },
        {
            "supercategory": "food",
            "id": 55,
            "name": "orange"
        },
        {
            "supercategory": "food",
            "id": 56,
            "name": "broccoli"
        },
        {
            "supercategory": "food",
            "id": 57,
            "name": "carrot"
        },
        {
            "supercategory": "food",
            "id": 58,
            "name": "hot dog"
        },
        {
            "supercategory": "food",
            "id": 59,
            "name": "pizza"
        },
        {
            "supercategory": "food",
            "id": 60,
            "name": "donut"
        },
        {
            "supercategory": "food",
            "id": 61,
            "name": "cake"
        },
        {
            "supercategory": "furniture",
            "id": 62,
            "name": "chair"
        },
        {
            "supercategory": "furniture",
            "id": 63,
            "name": "couch"
        },
        {
            "supercategory": "furniture",
            "id": 64,
            "name": "potted plant"
        },
        {
            "supercategory": "furniture",
            "id": 65,
            "name": "bed"
        },
        {
            "supercategory": "furniture",
            "id": 67,
            "name": "dining table"
        },
        {
            "supercategory": "furniture",
            "id": 70,
            "name": "toilet"
        },
        {
            "supercategory": "electronic",
            "id": 72,
            "name": "tv"
        },
        {
            "supercategory": "electronic",
            "id": 73,
            "name": "laptop"
        },
        {
            "supercategory": "electronic",
            "id": 74,
            "name": "mouse"
        },
        {
            "supercategory": "electronic",
            "id": 75,
            "name": "remote"
        },
        {
            "supercategory": "electronic",
            "id": 76,
            "name": "keyboard"
        },
        {
            "supercategory": "electronic",
            "id": 77,
            "name": "cell phone"
        },
        {
            "supercategory": "appliance",
            "id": 78,
            "name": "microwave"
        },
        {
            "supercategory": "appliance",
            "id": 79,
            "name": "oven"
        },
        {
            "supercategory": "appliance",
            "id": 80,
            "name": "toaster"
        },
        {
            "supercategory": "appliance",
            "id": 81,
            "name": "sink"
        },
        {
            "supercategory": "appliance",
            "id": 82,
            "name": "refrigerator"
        },
        {
            "supercategory": "indoor",
            "id": 84,
            "name": "book"
        },
        {
            "supercategory": "indoor",
            "id": 85,
            "name": "clock"
        },
        {
            "supercategory": "indoor",
            "id": 86,
            "name": "vase"
        },
        {
            "supercategory": "indoor",
            "id": 87,
            "name": "scissors"
        },
        {
            "supercategory": "indoor",
            "id": 88,
            "name": "teddy bear"
        },
        {
            "supercategory": "indoor",
            "id": 89,
            "name": "hair drier"
        },
        {
            "supercategory": "indoor",
            "id": 90,
            "name": "toothbrush"
        }
    ]

分类从1到90，但有些数字是跳过的，所以只有80个分类。
在这里插入图片描述

2.2 关键点检测Object Keypoint文件格式

COCO数据集中person_keypoints_train2017.json、person_keypoints_val2017.json这两个文件就是这种格式。文件整体格式是：

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

在这里插入图片描述

与instance_val2017.json相同。其中，info、licenses、images这三部分在不同的JSON文件中是相同的，定义是共享的，不共享的是annotations和category这两种在不同类型的JSON文件中是不一样的。

images字段下是一个列表，列表长度等同于划入训练集（或验证集）的图片数量
annotatons字段下也是一个列表，列表长度等同地训练集（或验证集）中bounding box 的数量，这里只有人这个类别的bounding box
categories字段下也是一个列表，列表长度等同于数据集类别的数，这里是1，只有person这一个类。
相同内容这里就不再列了，只列不同的。

2.2.1 annotations中的内容

这个类型中的annotation结构中包含 object instance中annotation所有的字段，再加上两个额外的字段。新增的keypoints是一个长度为3*k的数组，第一个和第二个元素分别是x和y坐标值，第三个是标志位v,v为0时表示这个关键点没有标注（这种情况下x=y=v=0），v为1时表示这个关键点标注了但是不可见（被遮挡了），v为2时表示这个关键点标注了同时也可见。num_keypoints表示这个目标上被标注的关键点的数量（v>0），比较小的目标上可能就无法标注关键点。

annotation{
   "segmentation": RLE or [polygon],
   "num_keypoints": int,    
   "area": float,
   "iscrowd": 0 or 1,
   "keypoints": [x1,y1,v1,...],
   "image_id": int,
   "bbox": [x,y,width,height],
   "category_id": int,
   "id": int 
}

列举一个：

{
      "segmentation": [
          [
              492.38,
              238.33,
              491.91,
              234.15,
              494.47,
              227.65,
              495.17,
              215.1,
              497.02,
              199.54,
              503.53,
              197.22,
              503.3,
              194.43,
              503.3,
              190.95,
              506.08,
              183.51,
              511.89,
              185.84,
              514.21,
              187,
              514.21,
              196.29,
              521.88,
              200.7,
              526.76,
              216.03,
              520.25,
              227.65,
              519.56,
              234.38,
              519.09,
              239.49,
              519.09,
              244.84,
              519.56,
              246.93,
              518.16,
              248.32,
              516.3,
              256.91,
              510.03,
              256.45,
              513.28,
              240.89
          ]
      ],
      "num_keypoints": 13,
      "area": 1394.7431,
      "iscrowd": 0,
      "keypoints": [
          508,
          192,
          2,
          510,
          191,
          2,
          506,
          191,
          2,
          512,
          192,
          2,
          503,
          192,
          1,
          515,
          202,
          2,
          499,
          202,
          2,
          524,
          214,
          2,
          497,
          215,
          2,
          516,
          226,
          2,
          496,
          224,
          2,
          511,
          232,
          2,
          497,
          230,
          2,
          0,
          0,
          0,
          0,
          0,
          0,
          0,
          0,
          0,
          0,
          0,
          0
      ],
      "image_id": 440475,
      "bbox": [
          491.91,
          183.51,
          34.85,
          73.4
      ],
      "category_id": 1,
      "id": 183302
  }

可以看到一共有17个关键点。
在这里插入图片描述

2.2.2 categories中的内容

对于category,相比object instance中的category,新增了两个字段，keypoints是一个长度为k的数组，包含每个关键点的名称；skeleton定义各关键点的连接性（比如人的左手腕和左肘就是连接的，但是左手腕和右手腕就不是）。目前，COCO的keypoints只标注了person category （分类为人）。定义如下：

{
    "supercategory": str,
    "id": int,
    "name": str,    
    "keypoints": [str],
    "skeleton": [edge]
}

具体的：

"categories": [
    {
        "supercategory": "person",
        "id": 1,
        "name": "person",
        "keypoints": [
            "nose",
            "left_eye",
            "right_eye",
            "left_ear",
            "right_ear",
            "left_shoulder",
            "right_shoulder",
            "left_elbow",
            "right_elbow",
            "left_wrist",
            "right_wrist",
            "left_hip",
            "right_hip",
            "left_knee",
            "right_knee",
            "left_ankle",
            "right_ankle"
        ],
        "skeleton": [
            [
                16,
                14
            ],
            [
                14,
                12
            ],
            [
                17,
                15
            ],
            [
                15,
                13
            ],
            [
                12,
                13
            ],
            [
                6,
                12
            ],
            [
                7,
                13
            ],
            [
                6,
                7
            ],
            [
                6,
                8
            ],
            [
                7,
                9
            ],
            [
                8,
                10
            ],
            [
                9,
                11
            ],
            [
                2,
                3
            ],
            [
                1,
                2
            ],
            [
                1,
                3
            ],
            [
                2,
                4
            ],
            [
                3,
                5
            ],
            [
                4,
                6
            ],
            [
                5,
                7
            ]
        ]
    }
]

在这里插入图片描述

2.3 看图说话Image Caption文件格式

captions_train2017.json、captions_val2017.json这两个文件就是这种格式。Image Caption这种格式的文件从头至尾按照顺序分为以下段落，看起来和Object Instance一样，不过没有最后的categories字段：

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation]
}

其中，info、licenses、images这三个结构体/类型，在不同的JSON文件中这三个类型是一样的，定义是共享的。不共享的是annotations这种结构体，它在不同类型的JSON文件中是不一样的。

annotations: 数量要多于图片的数量，这是因为一个图片可以有多个场景描述；

2.3.1 annotation中的内容

这个类型中的annotation用来存储描述图片的语句。每个语句描述了对应图片的内容，而每个图片至少有5个描述语句（有的图片更多）。annotation定义如下：

annotation{
    "image_id": int,
    "id": int,
    "caption": str
}

取一个具体片段：

{
        "image_id": 546219,
        "id": 396378,
        "caption": "A large group is sitting together and eating at a restaurant."
    },
    {
        "image_id": 546219,
        "id": 397413,
        "caption": "The people are gathered at the table for dinner."
    },
    {
        "image_id": 146155,
        "id": 397604,
        "caption": "Two  men standing near a bar drinking together"
    },
    {
        "image_id": 546219,
        "id": 399732,
        "caption": "A large group of people pose for a photo at dinner."
    },
    {
        "image_id": 546219,
        "id": 400023,
        "caption": "The diners are enjoying their various beverages with their meals.."
    }