[PaddlePaddle + OpenVINO] PP-OCRv2 Deployment

Reposted from AI Studio. Original article: [PaddlePaddle + OpenVINO] PP-OCRv2 Deployment - PaddlePaddle AI Studio


Published by AI Studio · 2022-04-28 23:04:52



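The OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions for a wide range of tasks, including emulation of human vision, automatic speech recognition, natural language processing, and recommender systems. It is built on the latest generations of artificial neural networks, including convolutional neural networks (CNNs), recurrent networks, and attention-based networks. It scales computer vision and non-vision workloads across Intel® hardware to maximize performance, and it accelerates applications with high-performance AI and deep learning inference deployed from edge to cloud.

This project uses OpenVINO to deploy PaddleOCR's PP-OCRv2 models (detection + direction classification + recognition). It gives a quick, hands-on look at the workflow for deploying PaddlePaddle models with OpenVINO.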

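First, a recognition example. The original image (the text is excerpted from the novel Jian Lai 【剑来】):

Detection result for the image above:

Recognition result for the image above: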

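1. Introduction to PaddleOCR

PaddleOCR is a very popular, easy-to-use OCR toolkit with many features:
- PP-OCR series of high-quality pretrained models with accurate recognition
  - Ultra-lightweight PP-OCRv2 series: detection (3.1 MB) + direction classifier (1.4 MB) + recognition (8.5 MB) = 13.0 MB
  - Ultra-lightweight PP-OCR mobile series: detection (3.0 MB) + direction classifier (1.4 MB) + recognition (5.0 MB) = 9.4 MB
  - General-purpose PP-OCR server series: detection (47.1 MB) + direction classifier (1.4 MB) + recognition (94.9 MB) = 143.4 MB
  - Recognition of mixed Chinese, English, and digits, vertical text, and long text
  - Multilingual recognition of about 80 languages, including Korean, Japanese, German, and French
- PP-Structure document structuring system
  - Layout analysis and table recognition (including export to Excel)
  - Key information extraction
  - DocVQA tasks
- Rich, easy-to-use OCR tool components
  - PPOCRLabel, a semi-automatic annotation tool for fast, efficient data labeling
  - Style-Text, a data synthesis tool that generates large batches of images resembling the target scene
- Custom training, plus a wide range of inference and deployment options
- Quick installation via pip
- Runs on Linux, Windows, macOS, and other systems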

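2. Text Detection Model Deployment

For deploying the text detection model, see [PaddlePaddle + OpenVINO] PaddleOCR DB Detection Deployment. It is not covered again here.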

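3. Direction Classification Model

Text may be rotated by 180°, so a classification model decides whether a text line is upside down and, if it is, corrects it. In the image below, the text is upside down:

After correction: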
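
The correction rule itself is small. The classifier outputs two scores, one for the label '0' and one for '180'. The crop is rotated by 180° only when '180' wins with a score above a threshold, which is what `ClsPostProcess` in the script below does with `cv2.rotate`. Here is a NumPy-only sketch of the same rule. The function name `correct_direction` is hypothetical. `np.rot90(image, 2)` stands in for `cv2.rotate(image, cv2.ROTATE_180)` and gives the same result.

```python
import numpy as np


def correct_direction(image, scores, threshold=0.9, labels=("0", "180")):
    """Rotate a text-line image by 180 degrees if the classifier says so.

    image:  H x W x C array (one cropped text line)
    scores: array of shape [1, 2] holding the scores for labels ('0', '180')
    Returns (image, rotated), where rotated tells whether a rotation was applied.
    """
    idx = int(np.argmax(scores[0]))
    if labels[idx] == "180" and scores[0, idx] > threshold:
        # A 180-degree rotation flips both the vertical and horizontal axes
        return np.rot90(image, 2), True
    return image, False


# A tiny 2 x 3 single-channel "image" makes the rotation easy to see
img = np.arange(6).reshape(2, 3, 1)
fixed, rotated = correct_direction(img, np.array([[0.05, 0.95]]))
# rotated is True and fixed[..., 0] is [[5, 4, 3], [2, 1, 0]]
same, rotated = correct_direction(img, np.array([[0.40, 0.60]]))
# '180' wins, but 0.60 < 0.9, so the image is returned unchanged
```

The threshold trades off wrongly flipped lines against missed flips. The script below defaults to 0.9 and passes 0.7 in its `__main__` block.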

Now for the actual steps. First, download the official classification model:

!wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
!tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar

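Extracting the archive gives a static-graph inference model. View its structure with Netron:

Note the part marked with a red line in the figure above. It is the model's input shape, and images must later be preprocessed to match it.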
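
In concrete terms, "preprocessing to the input shape" takes five steps:
- resize the crop to the width and height the model expects;
- scale pixels to [0, 1];
- normalize each channel;
- reorder OpenCV's H×W×C layout to C×H×W;
- add a batch dimension, which gives [1, 3, H, W].

Below is a NumPy-only sketch of these steps. The nearest-neighbour `nearest_resize` helper is a hypothetical stand-in for `cv2.resize`. The 100×32 size and the ImageNet mean/std simply mirror the values the scripts below use.

```python
import numpy as np


def nearest_resize(image, width, height):
    """Nearest-neighbour resize of an H x W x C array (stand-in for cv2.resize)."""
    src_h, src_w = image.shape[:2]
    rows = (np.arange(height) * src_h / height).astype(np.int64)
    cols = (np.arange(width) * src_w / width).astype(np.int64)
    return image[rows[:, None], cols[None, :]]


def to_model_input(image, width=100, height=32,
                   mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Turn an H x W x 3 uint8 image into a float32 tensor of shape [1, 3, height, width]."""
    resized = nearest_resize(image, width, height).astype(np.float32) / 255.0
    resized = (resized - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    chw = resized.transpose(2, 0, 1)   # HWC -> CHW
    return np.expand_dims(chw, 0)      # add the batch dimension -> NCHW


crop = np.random.randint(0, 256, size=(48, 180, 3), dtype=np.uint8)
tensor = to_model_input(crop)
print(tensor.shape, tensor.dtype)  # (1, 3, 32, 100) float32
```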


# Deployment code for the direction classification model (do not run it on AI Studio; run it locally).
# Save it as ppocr_cls.py, the module name the pipeline script in Section 5 imports.
# Usage: python ppocr_cls.py --image_path {path to image} --model_path {path to model}

import argparse

import cv2
import numpy as np
from openvino.runtime import Core


def normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im
 
 
def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
    if isinstance(target_size, list) or isinstance(target_size, tuple):
        w = target_size[0]
        h = target_size[1]
    else:
        w = target_size
        h = target_size
    im = cv2.resize(im, (w, h), interpolation=interp)
    return im
    

class ClsPostProcess(object):
    """Rotate the text image by 180 degrees when the classifier predicts '180' confidently."""

    def __init__(self, label_list=['0', '180'], threshold=0.9):
        super(ClsPostProcess, self).__init__()
        self.label_list = label_list
        self.threshold = threshold

    def __call__(self, preds, image=None):
        # preds has shape [batch, 2]: scores for the '0' and '180' labels
        pred_idxs = preds.argmax(axis=1)
        assert pred_idxs.shape[0] == 1, "batch size must be 1, but got {}.".format(pred_idxs.shape[0])
        direction = self.label_list[pred_idxs[0]]
        if direction == '180' and preds[0, 1] > self.threshold:
            image = cv2.rotate(image, cv2.ROTATE_180)
        return image
        

class ClsPredictor:
    def __init__(self, model_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], threshold=0.9):
        self.target_size = target_size  # (width, height) passed to cv2.resize
        self.mean = mean
        self.std = std
        self.model_path = model_path
        self.post_process = ClsPostProcess(threshold=threshold)
        # Read and compile the model once here rather than on every predict() call;
        # OpenVINO reads the Paddle inference.pdmodel file directly
        ie = Core()
        model = ie.read_model(model=self.model_path)
        self.compiled_model = ie.compile_model(model=model, device_name="CPU")
        self.output_layer = next(iter(self.compiled_model.outputs))

    def preprocess(self, image):
        image = resize(image, target_size=self.target_size)
        image = normalize(image, mean=self.mean, std=self.std)
        return image

    def predict(self, image):
        if isinstance(image, str):
            image = cv2.imread(image)
        inputs = self.preprocess(image)
        # HWC -> CHW, then add the batch dimension: [1, 3, H, W]
        input_image = np.expand_dims(inputs.transpose(2, 0, 1), 0)
        # preds has shape [1, 2]: scores for the '0' and '180' labels
        preds = self.compiled_model([input_image])[self.output_layer]
        # Returns the (possibly rotated) original image, not the resized one
        image = self.post_process(preds, image)
        return image
        

def parse_args():
    parser = argparse.ArgumentParser(description='Text direction classification with OpenVINO.')
    parser.add_argument(
        '--model_path',
        dest='model_path',
        help='The path of the classification model (inference.pdmodel).',
        type=str,
        default="ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of the image to predict.',
        type=str,
        default=None)
    return parser.parse_args()
        

if __name__ == "__main__":
    args = parse_args()
    model_path = args.model_path
    image_path = args.image_path

    cls_predictor = ClsPredictor(model_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], threshold=0.7)
    image = cls_predictor.predict(image_path)
    cv2.imwrite('cls_result.png', image)

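4. Text Recognition Model

Text recognition uses the CRNN algorithm, from the paper "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition". This article does not go into the theory.

First, download the official recognition model: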

!wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
!tar -xvf ch_PP-OCRv2_rec_infer.tar

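Extracting the archive gives a static-graph inference model. View its structure with Netron:

As the figure shows, the recognition model takes the same input shape as the direction classifier, because each crop goes straight from direction classification into recognition. The output shape is [?, 25, 6625]:
- ? is the batch size;
- 25 is the number of output time steps, i.e. the maximum length of the recognized character sequence;
- 6625 is the number of character classes.

The recognition model comes with a matching dictionary. The dictionary size, plus the CTC blank (and the space character if use_space_char is enabled), must equal the number of classes.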
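
A toy example illustrates two things: how the dictionary lines up with the output classes, and how a [batch, time_steps, classes] output becomes text.

Like `CTCLabelDecode` in the script below, the sketch puts a 'blank' at index 0 and optionally appends a space. Decoding is greedy CTC decoding:
- take the arg-max class at every time step;
- skip a step that repeats the previous step's class;
- drop blanks.

The four-character dictionary and the scores are made up for illustration. `build_charset` and `ctc_greedy_decode` are hypothetical names. For the real model, the same rule means the 6625 classes should equal len(ppocr_keys_v1.txt) + 1 blank + 1 space.

```python
import numpy as np


def build_charset(dict_chars, use_space_char=True):
    """Index 0 is the CTC blank, then the dictionary characters, then (optionally) a space."""
    chars = ["blank"] + list(dict_chars)
    if use_space_char:
        chars.append(" ")
    return chars


def ctc_greedy_decode(preds, charset):
    """preds: [batch, time_steps, num_classes] scores -> list of (text, mean_confidence)."""
    assert preds.shape[2] == len(charset), "number of classes must match the dictionary"
    results = []
    for seq_scores in preds:
        idx = seq_scores.argmax(axis=1)   # best class at each time step
        prob = seq_scores.max(axis=1)
        chars, confs = [], []
        for t, c in enumerate(idx):
            if c == 0:                     # CTC blank
                continue
            if t > 0 and idx[t - 1] == c:  # repeat of the previous step
                continue
            chars.append(charset[c])
            confs.append(prob[t])
        results.append(("".join(chars), float(np.mean(confs)) if confs else 0.0))
    return results


charset = build_charset(["春", "风", "不", "语"])  # 1 blank + 4 chars + 1 space = 6 classes
steps = [1, 1, 0, 2, 2, 3, 0, 4]                  # 春 春 blank 风 风 不 blank 语
preds = np.full((1, len(steps), len(charset)), 0.01)
for t, c in enumerate(steps):
    preds[0, t, c] = 0.96
text, conf = ctc_greedy_decode(preds, charset)[0]
print(text)  # 春风不语
```

A genuinely doubled character (e.g. 春春) survives decoding only if the model emits a blank between the two steps. That is why the blank class exists.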


# Deployment code for the recognition model (do not run it on AI Studio; run it locally).
# Save it as ppocr_rec.py, the module name the pipeline script in Section 5 imports.
# Usage: python ppocr_rec.py --image_path {image path} --model_path {model_path} --character_dict_path {dict path}

import argparse

import cv2
import numpy as np
from openvino.runtime import Core


def normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im
 
 
def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
    if isinstance(target_size, list) or isinstance(target_size, tuple):
        w = target_size[0]
        h = target_size[1]
    else:
        w = target_size
        h = target_size
    im = cv2.resize(im, (w, h), interpolation=interp)
    return im
    

class BaseRecLabelDecode(object):
    """ Convert between text-label and text-index """

    def __init__(self, character_dict_path=None, use_space_char=False):
        self.beg_str = "sos"
        self.end_str = "eos"

        self.character_str = []
        if character_dict_path is None:
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)
        else:
            with open(character_dict_path, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    self.character_str.append(line)
            if use_space_char:
                self.character_str.append(" ")
            dict_character = list(self.character_str)

        dict_character = self.add_special_char(dict_character)
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i
        self.character = dict_character

    def add_special_char(self, dict_character):
        return dict_character

    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
        """ convert text-index into text-label. """
        result_list = []
        ignored_tokens = self.get_ignored_tokens()
        batch_size = len(text_index)
        for batch_idx in range(batch_size):
            char_list = []
            conf_list = []
            for idx in range(len(text_index[batch_idx])):
                if text_index[batch_idx][idx] in ignored_tokens:
                    continue
                if is_remove_duplicate:
                    # only for predict
                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
                            batch_idx][idx]:
                        continue

                char_list.append(self.character[int(text_index[batch_idx][
                    idx])])
                if text_prob is not None:
                    conf_list.append(text_prob[batch_idx][idx])
                else:
                    conf_list.append(1)
            text = ''.join(char_list)
            result_list.append((text, np.mean(conf_list)))
        return result_list

    def get_ignored_tokens(self):
        return [0]  # for ctc blank


class CTCLabelDecode(BaseRecLabelDecode):
    """ Convert between text-label and text-index """

    def __init__(self, character_dict_path=None, use_space_char=False,
                 **kwargs):
        super(CTCLabelDecode, self).__init__(character_dict_path,
                                             use_space_char)

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, (tuple, list)):
            preds = preds[-1]
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
        if label is None:
            return text
        label = self.decode(label)
        return text, label

    def add_special_char(self, dict_character):
        dict_character = ['blank'] + dict_character
        return dict_character


class RecPredictor:
    def __init__(self, model_path, character_dict_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], use_space_char=False):
        self.target_size = target_size  # (width, height) passed to cv2.resize
        self.mean = mean
        self.std = std
        self.model_path = model_path
        self.post_process = CTCLabelDecode(character_dict_path=character_dict_path, use_space_char=use_space_char)
        # Read and compile the model once here rather than on every predict() call
        ie = Core()
        model = ie.read_model(model=self.model_path)
        self.compiled_model = ie.compile_model(model=model, device_name="CPU")
        self.output_layer = next(iter(self.compiled_model.outputs))

    def preprocess(self, image):
        image = resize(image, target_size=self.target_size)
        image = normalize(image, mean=self.mean, std=self.std)
        return image

    def predict(self, image):
        if isinstance(image, str):
            image = cv2.imread(image)
        inputs = self.preprocess(image)
        # HWC -> CHW, then add the batch dimension: [1, 3, H, W]
        input_image = np.expand_dims(inputs.transpose(2, 0, 1), 0)
        # preds has shape [1, time_steps, num_classes]; CTC decoding turns it into text
        preds = self.compiled_model([input_image])[self.output_layer]
        text = self.post_process(preds)
        return text
        

def parse_args():
    parser = argparse.ArgumentParser(description='Text recognition with OpenVINO.')
    parser.add_argument(
        '--model_path',
        dest='model_path',
        help='The path of the recognition model (inference.pdmodel).',
        type=str,
        default="ch_PP-OCRv2_rec_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of the image to predict.',
        type=str,
        default=None)
    # type=bool would turn any non-empty string, including "False", into True,
    # so the flag value is parsed explicitly
    parser.add_argument(
        '--use_space_char',
        dest='use_space_char',
        help='Whether to add the space character to the dictionary (true/false).',
        type=lambda s: s.lower() in ('true', '1', 'yes'),
        default=True)
    parser.add_argument(
        '--character_dict_path',
        dest='character_dict_path',
        help='The path of the character dictionary (ppocr_keys_v1.txt from the PaddleOCR repo).',
        type=str,
        default="ppocr_keys_v1.txt")
    return parser.parse_args()
        

if __name__ == "__main__":
    args = parse_args()
    model_path = args.model_path
    image_path = args.image_path
    use_space_char = args.use_space_char
    character_dict_path = args.character_dict_path
    
    rec_predictor = RecPredictor(model_path, character_dict_path=character_dict_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], use_space_char=use_space_char)
    text = rec_predictor.predict(image_path)
    print(text)
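
The scripts above stretch every crop to exactly 100×32 and normalize it with the ImageNet mean/std. PaddleOCR's own Python inference code (`resize_norm_img` in `tools/infer/predict_rec.py`) preprocesses differently:
- it keeps the aspect ratio and resizes to the target height;
- it scales pixels to [-1, 1] with `(x / 255 - 0.5) / 0.5`;
- it zero-pads the right-hand side up to the target width.

If recognition quality is poor on very long or very short crops, a step closer to that may help. The NumPy-only sketch below assumes a 32×100 target to match the model input used in this article. It uses a hypothetical nearest-neighbour resize in place of `cv2.resize`, and `resize_norm_pad` is likewise a hypothetical name.

```python
import math

import numpy as np


def nearest_resize(image, width, height):
    """Nearest-neighbour resize of an H x W x C array (stand-in for cv2.resize)."""
    src_h, src_w = image.shape[:2]
    rows = (np.arange(height) * src_h / height).astype(np.int64)
    cols = (np.arange(width) * src_w / width).astype(np.int64)
    return image[rows[:, None], cols[None, :]]


def resize_norm_pad(image, target_h=32, target_w=100):
    """Keep the aspect ratio, scale to [-1, 1], zero-pad on the right -> [1, 3, H, W]."""
    h, w = image.shape[:2]
    # Width after scaling to target_h, capped at the model's input width
    resized_w = min(target_w, int(math.ceil(target_h * w / float(h))))
    resized = nearest_resize(image, resized_w, target_h).astype(np.float32)
    resized = resized.transpose(2, 0, 1) / 255.0  # HWC -> CHW, values in [0, 1]
    resized = (resized - 0.5) / 0.5               # [0, 1] -> [-1, 1]
    padded = np.zeros((3, target_h, target_w), dtype=np.float32)
    padded[:, :, :resized_w] = resized            # the rest stays 0 (mid-gray after normalization)
    return np.expand_dims(padded, 0)


crop = np.full((40, 60, 3), 255, dtype=np.uint8)  # a 60-px-wide, 40-px-high white crop
tensor = resize_norm_pad(crop)                     # shape (1, 3, 32, 100); columns 48..99 are padding
```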

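5. Chaining the Models into a Pipeline

With deployment code for detection, direction classification, and recognition in hand, chaining them together gives a complete text detection and recognition system. If you cannot get the code running, download the provided archive ocr.zip from the /home/aistudio directory and test with it.

Unzip ocr.zip, enter the ocr directory, and run: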

 python .\ppocr_system.py --image_path test.png

result.png contains the detection result, and the recognition results are printed to the command line, as follows:

[[('即随本心', 0.97602725)], [('春风不语', 0.97723883)], [('可问春风', 0.95982796)], [('遇事不决', 0.97903967)]]
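
The four phrases above come out in reverse reading order: the passage reads 遇事不决, 可问春风, 春风不语, 即随本心. This is most likely because `PaddleOCR.predict` below processes boxes in whatever order the detector returns them. PaddleOCR's own pipeline (`sorted_boxes` in `tools/infer/predict_system.py`) sorts boxes top to bottom, then left to right, before recognition.

A sketch of that idea follows. It assumes each box is a 4×2 array of points whose first point is the top-left corner. The 10-pixel same-line tolerance follows PaddleOCR's choice but is an adjustable assumption, and `sort_boxes_reading_order` is a hypothetical name.

```python
import numpy as np


def sort_boxes_reading_order(boxes, line_tol=10):
    """Sort 4x2 quadrilaterals top-to-bottom, then left-to-right.

    Boxes whose top-left y-coordinates differ by less than `line_tol` pixels
    are treated as lying on the same line and ordered by x.
    """
    ordered = sorted(boxes, key=lambda b: (b[0][1], b[0][0]))
    # One pass fixes neighbours on the same line whose x-order is reversed
    for i in range(len(ordered) - 1):
        cur, nxt = ordered[i], ordered[i + 1]
        if abs(nxt[0][1] - cur[0][1]) < line_tol and nxt[0][0] < cur[0][0]:
            ordered[i], ordered[i + 1] = nxt, cur
    return ordered


def quad(x, y, w=80, h=20):
    """Hypothetical helper: an axis-aligned box as 4 clockwise points from top-left."""
    return np.array([[x, y], [x + w, y], [x + w, y + h], [x, y + h]], dtype=np.float32)


boxes = [quad(10, 150), quad(10, 100), quad(10, 50), quad(10, 0)]  # bottom-to-top, as detected
print([int(b[0][1]) for b in sort_boxes_reading_order(boxes)])    # [0, 50, 100, 150]
```

To apply it, `PaddleOCR.predict` would iterate over `sort_boxes_reading_order(boxes_batch[0]['points'])` instead of `boxes_batch[0]['points']`. This assumes `DetPredictor` returns points in clockwise order starting at the top-left, which this article does not show.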


# Example code for the full pipeline (do not run it on AI Studio).
# Command: python ppocr_system.py --image_path {path to your image}  # add other arguments as needed
# DetPredictor comes from the detection code referenced in Section 2;
# ClsPredictor and RecPredictor come from the two scripts above.

import argparse

import cv2
import numpy as np
from ppocr_cls import ClsPredictor
from ppocr_det import DetPredictor
from ppocr_rec import RecPredictor


class PaddleOCR:
    def __init__(self, det_model_path, rec_model_path, character_dict_path, cls_model_path=None, use_space_char=False, det_image_size=[960, 960], rec_image_size=[100, 32], mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
        self.det_predictor = DetPredictor(det_model_path, target_size=det_image_size, mean=mean, std=std)
        self.cls_predictor = ClsPredictor(cls_model_path, target_size=rec_image_size, mean=mean, std=std) if cls_model_path else None
        self.rec_predictor = RecPredictor(rec_model_path, character_dict_path=character_dict_path, target_size=rec_image_size, mean=mean, std=std, use_space_char=use_space_char)

    def predict(self, image_path):
        image = cv2.imread(image_path)
        raw_image = image.copy()
        boxes_batch = self.det_predictor.predict(image_path)
        draw_image = self.det_predictor.draw_det(image, boxes_batch[0]['points'])
        texts = []
        for box in boxes_batch[0]['points']:
            box = box.astype(np.int32)
            left, top = box[0, 0], box[0, 1]
            right, bottom = box[2, 0], box[2, 1]
            sub_image = raw_image[top:bottom, left:right, :]
            if self.cls_predictor is not None:
                sub_image = self.cls_predictor.predict(sub_image)
            text = self.rec_predictor.predict(sub_image)
            texts.append(text)
        return draw_image, texts
            

def parse_args():
    parser = argparse.ArgumentParser(description='PP-OCRv2 detection + direction classification + recognition with OpenVINO.')
    parser.add_argument(
        '--det_model_path',
        dest='det_model_path',
        help='The path of the detection model (inference.pdmodel).',
        type=str,
        default='ch_PP-OCRv2_det_infer/inference.pdmodel')
    parser.add_argument(
        '--rec_model_path',
        dest='rec_model_path',
        help='The path of the recognition model (inference.pdmodel).',
        type=str,
        default="ch_PP-OCRv2_rec_infer/inference.pdmodel")
    parser.add_argument(
        '--cls_model_path',
        dest='cls_model_path',
        help='The path of the direction classification model (inference.pdmodel); pass "" to skip classification.',
        type=str,
        default="ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of the image to predict.',
        type=str,
        default=None)
    parser.add_argument(
        '--save_path',
        dest='save_path',
        help='Where to save the image with detection boxes drawn.',
        type=str,
        default="result.png")
    # type=bool would turn any non-empty string, including "False", into True,
    # so the flag value is parsed explicitly
    parser.add_argument(
        '--use_space_char',
        dest='use_space_char',
        help='Whether to add the space character to the dictionary (true/false).',
        type=lambda s: s.lower() in ('true', '1', 'yes'),
        default=True)
    parser.add_argument(
        '--character_dict_path',
        dest='character_dict_path',
        help='The path of the character dictionary.',
        type=str,
        default="ppocr_keys_v1.txt")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    
    predictor = PaddleOCR(det_model_path=args.det_model_path, rec_model_path=args.rec_model_path, character_dict_path=args.character_dict_path, cls_model_path=args.cls_model_path, use_space_char=args.use_space_char)
    draw_image, texts = predictor.predict(args.image_path)
    cv2.imwrite(args.save_path, draw_image)
    print(texts)
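
The loop in `PaddleOCR.predict` crops each region with the axis-aligned rectangle spanned by points 0 and 2. That assumes the detector returns clockwise points starting at the top-left, and that they lie inside the image. A slightly more defensive variant, sketched below, takes the min/max over all four points, clips to the image bounds, and signals empty crops by returning `None`. It is still an axis-aligned crop. PaddleOCR's own pipeline instead rectifies each quadrilateral with a perspective transform (`get_rotate_crop_image`), which handles tilted text better. `crop_box` is a hypothetical helper name.

```python
import numpy as np


def crop_box(image, box):
    """Axis-aligned crop around a 4x2 quadrilateral, clipped to the image.

    Returns None when the clipped region is empty.
    """
    h, w = image.shape[:2]
    pts = np.asarray(box, dtype=np.float32)
    left = int(np.clip(np.floor(pts[:, 0].min()), 0, w))
    right = int(np.clip(np.ceil(pts[:, 0].max()), 0, w))
    top = int(np.clip(np.floor(pts[:, 1].min()), 0, h))
    bottom = int(np.clip(np.ceil(pts[:, 1].max()), 0, h))
    if right <= left or bottom <= top:
        return None
    return image[top:bottom, left:right]


image = np.zeros((100, 200, 3), dtype=np.uint8)
sub = crop_box(image, [[10, 20], [60, 22], [58, 40], [8, 38]])  # slightly skewed box
# sub.shape == (20, 52, 3)
```

In the loop, `sub_image = crop_box(raw_image, box)` followed by `if sub_image is None: continue` would replace the lines that compute `left`, `top`, `right`, and `bottom`.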