Code:
https://github.com/RuoyiDu/PMG-Progressive-Multi-Granularity-Training
Motivation:
Less effort has been devoted to identifying which granularities are the most discriminative and how to fuse information across multiple granularities.
Key techniques:
(i) a progressive training strategy that effectively fuses features from different granularities,
(ii) a random jigsaw patch generator that encourages the network to learn features at specific granularities.
Contributions:
We propose a novel progressive training strategy to solve for fine-grained classification. It operates in different training steps, and at each step fuses data from previous levels of granularity, ultimately cultivating the inherent complementary properties across different granularities for fine-grained feature learning.
We adapt a simple yet effective jigsaw puzzle generator to form different levels of granularity. This allows the network to focus on different "scales" of features as per prior work.
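Progressive training
The idea comes from GANs: training starts at low resolution, and layers are added gradually to raise the resolution.
The method in this paper is as follows: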
Fig. 2. The training procedure of progressive training, which consists of S + 1 steps at each iteration (here S = 3 for explanation). The Conv Block represents the combination of two convolution layers and a max pooling layer, and Classifier represents two fully connected layers with a softmax layer at the end. At each iteration, the training data are augmented by the jigsaw generator and sequentially input into the network in S + 1 steps. In our training process, the hyper-parameter n is 2^(L−l+1) for the l-th stage. At each step, the output from the corresponding classifier is used for loss computation and parameter updating.
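First, the input image is split into small patches to train the lower stages of the model; the patch size is then increased step by step, and the higher stages are brought into training as well.
The jigsaw generator mainly serves as a data augmentation technique.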
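The per-step granularity in the Fig. 2 caption can be tabulated with a few lines of Python. This is only a sketch: it reads the caption's n as 2 raised to the power L − l + 1, assumes a backbone with L = 5 stages, and assumes the final (S + 1)-th step trains the concatenated classifier on the unshuffled image (n = 1). The function name and the step labels are hypothetical.

```python
def jigsaw_schedule(num_stages, num_steps):
    """Return (classifier label, n) pairs for one training iteration.

    num_stages is L and num_steps is S. The first S steps train stages
    l = L-S+1 .. L on an n x n jigsaw with n = 2 ** (L - l + 1).
    The last step is assumed to use the original image (n = 1).
    """
    plan = []
    for l in range(num_stages - num_steps + 1, num_stages + 1):
        plan.append((f"stage_{l}", 2 ** (num_stages - l + 1)))
    plan.append(("concat", 1))  # assumption: no shuffling in the final step
    return plan


print(jigsaw_schedule(num_stages=5, num_steps=3))
# [('stage_3', 8), ('stage_4', 4), ('stage_5', 2), ('concat', 1)]
```

With these assumed values, the shallowest trained stage sees the finest patches (8 × 8 grid) and the deepest stage the coarsest (2 × 2 grid).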
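Network architecture:
Any popular backbone network F with L stages can be used.
HLconv reduces the feature dimension; only the last S stages (S = 3) are considered, and their features are then concatenated.
Lower stages have limited receptive fields and representational power, so the network has to locate discriminative regions from local details.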
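As a rough illustration of reducing the last S stage outputs and concatenating them, here is a NumPy sketch. It is not the paper's conv block: the two convolution layers are replaced by a single 1×1 projection with ReLU, followed by global max pooling. The channel counts, spatial sizes, and the reduced dimension D are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)


def reduce_stage(feature_map, weight):
    """Simplified stand-in for the conv block.

    feature_map has shape (C, H, W) and weight has shape (D, C).
    Applies a 1x1 projection, ReLU, then global max pooling,
    and returns a vector of length D.
    """
    projected = np.einsum("dc,chw->dhw", weight, feature_map)
    projected = np.maximum(projected, 0.0)
    return projected.max(axis=(1, 2))


# Hypothetical (channels, height, width) for the last S = 3 stages.
stage_shapes = [(32, 16, 16), (64, 8, 8), (128, 4, 4)]
D = 64  # hypothetical reduced dimension per stage

vectors = []
for c, h, w in stage_shapes:
    fmap = rng.standard_normal((c, h, w))
    weight = rng.standard_normal((D, c)) * 0.01
    vectors.append(reduce_stage(fmap, weight))

v_concat = np.concatenate(vectors)
print(v_concat.shape)  # (192,)
```

Each per-stage vector would feed its own classifier, and the concatenated vector would feed the additional classifier used in the last training step.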
Benefits:
[Image]

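Jigsaw puzzle generator:
The following paper showed that the jigsaw puzzle is well suited to self-supervised tasks:
Iterative reorganization with weak spatial constraints: Solving arbitrary jigsaw puzzles for unsupervised representation learning
This paper uses it to create regions of different granularities and to force the model to learn information specific to the granularity of each training step.
Given an image, we split it evenly into n×n patches; the patches are then shuffled and merged into a new image, so the granularity is controlled by n.
Two conditions must hold: (1) the patch size should be smaller than the receptive field of the corresponding stage; (2) the patch size should grow as the receptive field grows.
The main benefit: the model can find the most salient discriminative regions at different granularities.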
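The split, shuffle, and merge procedure described above can be sketched in NumPy as follows. This is a minimal illustration, not the code from the linked repository. It assumes an (H, W, C) array whose height and width are divisible by n, and the explicit `rng` argument is an addition for reproducibility.

```python
import numpy as np


def jigsaw_generator(image, n, rng):
    """Split an (H, W, C) image into n x n patches, shuffle them, and merge."""
    h, w = image.shape[:2]
    if h % n or w % n:
        raise ValueError("image height and width must be divisible by n")
    ph, pw = h // n, w // n
    # (H, W, C) -> (n, ph, n, pw, C) -> (n, n, ph, pw, C) -> (n*n, ph, pw, C)
    patches = image.reshape(n, ph, n, pw, -1).swapaxes(1, 2)
    patches = patches.reshape(n * n, ph, pw, -1)
    # Randomly reorder the patches.
    patches = patches[rng.permutation(n * n)]
    # Undo the reshaping to place the shuffled patches back on the grid.
    merged = patches.reshape(n, n, ph, pw, -1).swapaxes(1, 2)
    return merged.reshape(h, w, -1)


rng = np.random.default_rng(0)
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
shuffled = jigsaw_generator(image, n=4, rng=rng)
print(shuffled.shape)  # (8, 8, 3)
```

A larger n gives smaller patches and therefore a finer granularity; n = 1 returns the image unchanged.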
Inference
The jigsaw generator is not needed at this stage; prediction uses the outputs of multiple stages:
Hence, both the prediction of yconcat and the multi-output combined prediction can be obtained in our model. In addition, although all predictions are complementary for the final result, yconcat is enough for those objects whose shapes are relatively smooth, for example, cars.
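The two prediction modes can be sketched as follows. The sketch assumes the combined prediction is the argmax of an equally weighted sum of the per-stage outputs and yconcat. The function name and the toy logits are made up for illustration.

```python
import numpy as np


def predict(stage_logits, concat_logits):
    """Return (yconcat-only class, combined class) as integer indices.

    Assumption: the combined score is an unweighted sum of all outputs.
    """
    combined = concat_logits + np.sum(stage_logits, axis=0)
    return int(np.argmax(concat_logits)), int(np.argmax(combined))


# Toy logits for 3 classes from S = 3 stage classifiers and the concat classifier.
stage_logits = [
    np.array([0.1, 2.0, 0.3]),
    np.array([0.2, 1.5, 0.1]),
    np.array([0.0, 1.0, 0.4]),
]
concat_logits = np.array([1.2, 1.0, 0.1])

print(predict(stage_logits, concat_logits))  # (0, 1)
```

In this toy case the two modes disagree: yconcat alone picks class 0, while the stage outputs pull the combined prediction to class 1.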
Experiments
Experimental settings:
[Image]
Results:
Experiments are run on bird, car, aircraft, and dog datasets.
[Image]
Ablation results:
[Image]
[Image]

Visualization results:
[Image]

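Takeaways:
The best idea here is the jigsaw generator; it should be useful for food images as well!
Background notes:
saliency localization: locating the salient regions of an image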
