[PaddlePaddle + OpenVINO] Deploying PP-OCRv2
Reposted from AI Studio. Original post: [PaddlePaddle + OpenVINO] Deploying PP-OCRv2 - PaddlePaddle AI Studio
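The OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions that address a variety of tasks, including emulation of human vision, automatic speech recognition, natural language processing, and recommendation systems. Built on the latest generations of artificial neural networks, including convolutional neural networks (CNNs), recurrent networks, and attention-based networks, it scales computer vision and non-vision workloads across Intel® hardware to maximize performance. It accelerates applications with high-performance AI and deep learning inference deployed from edge to cloud.
This project uses OpenVINO to deploy PaddleOCR's PP-OCRv2 models (detection + direction classification + recognition), as a quick walkthrough of how PaddlePaddle models are deployed with OpenVINO.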
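Here is a recognition example first. Original image (text taken from the novel 剑来):
Detection result for the image above:
Recognition result for the image above: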
1. Introduction to PaddleOCR
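PaddleOCR is a very popular, easy-to-use OCR toolkit with the following features:
High-quality pretrained PP-OCR models with accurate recognition
- Ultra-lightweight PP-OCRv2 series: detection (3.1M) + direction classifier (1.4M) + recognition (8.5M) = 13.0M
- Ultra-lightweight PP-OCR mobile series: detection (3.0M) + direction classifier (1.4M) + recognition (5.0M) = 9.4M
- General-purpose PP-OCR server series: detection (47.1M) + direction classifier (1.4M) + recognition (94.9M) = 143.4M
- Supports recognition of mixed Chinese, English, and digits, vertical text, and long text
- Supports multilingual recognition: about 80 languages, including Korean, Japanese, German, and French
PP-Structure document structuring system
- Supports layout analysis and table recognition (with Excel export)
- Supports key information extraction
- Supports DocVQA
Rich, easy-to-use OCR tooling
- PPOCRLabel, a semi-automatic annotation tool: supports fast, efficient data labeling
- Style-Text, a data synthesis tool: batch-synthesizes large numbers of images that resemble the target scene
Supports custom training and offers a wide range of inference and deployment options
Installable via pip
Runs on Linux, Windows, macOS, and other systems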
2. Deploying the text detection model
For the text detection model, see [PaddlePaddle + OpenVINO] PaddleOCR DB Detection Deployment; it is not covered again here.
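3. Direction classification model
Because text may be rotated by 180°, a classification model is needed to decide whether the text is upside down so that it can be corrected. In the image below, the text is reversed:
After correction:
Now for the concrete steps. First download the official classification model: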
!wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
!tar -xvf ch_ppocr_mobile_v2.0_cls_infer.tar
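Extracting the archive yields a static-graph model. Inspect its structure with Netron:
Note the part marked in red in the figure above: it is the model's input shape, and images must later be preprocessed to match it.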
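As a rough illustration of what preprocessing to the input shape involves, the sketch below shows only the layout change: an image in OpenCV's height x width x channel order becomes a batch in N x C x H x W order. It uses plain Python lists instead of NumPy so it runs anywhere, and it assumes the (width, height) = (100, 32) target size used by the code in this article. The helper names `hwc_to_nchw` and `shape_of` are hypothetical and not part of PaddleOCR or OpenVINO.

```python
def hwc_to_nchw(image):
    """Reorder a nested-list image from [H][W][C] to [1][C][H][W]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    chw = [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # leading batch dimension of size 1


def shape_of(nested):
    """Return the shape of a regular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# A dummy image of height 32 and width 100 with 3 channels per pixel.
dummy = [[[0.0, 0.5, 1.0] for _ in range(100)] for _ in range(32)]
batch = hwc_to_nchw(dummy)
print(shape_of(batch))  # (1, 3, 32, 100)
```

With NumPy the same reordering is `np.expand_dims(image.transpose(2, 0, 1), 0)`, which is what the deployment code does.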
In [ ]
# Deployment code for the classification model (do not run it on AI Studio; run it locally)
# Save as ppocr_cls.py, the module name imported by the chained script.
# Usage: python ppocr_cls.py --image_path {path to image} --model_path {path to model}
import argparse

import cv2
import numpy as np
from openvino.runtime import Core


def normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    # Scale pixels to [0, 1], then standardize each channel.
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im


def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
    # target_size is either a single int or a (width, height) pair.
    if isinstance(target_size, list) or isinstance(target_size, tuple):
        w = target_size[0]
        h = target_size[1]
    else:
        w = target_size
        h = target_size
    im = cv2.resize(im, (w, h), interpolation=interp)
    return im


class ClsPostProcess(object):
    """ Rotate the image by 180 degrees when the classifier is confident it is upside down """

    def __init__(self, label_list=['0', '180'], threshold=0.9):
        super(ClsPostProcess, self).__init__()
        self.label_list = label_list
        self.threshold = threshold

    def __call__(self, preds, image=None):
        pred_idxs = preds.argmax(axis=1)
        print(preds)  # debug output: raw class scores
        assert pred_idxs.shape[0] == 1, "batch size must be 1, but got {}.".format(pred_idxs.shape[0])
        direction = self.label_list[pred_idxs[0]]
        if direction == '180' and preds[0, 1] > self.threshold:
            image = cv2.rotate(image, cv2.ROTATE_180)
        return image


class ClsPredictor:
    def __init__(self, model_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], threshold=0.9):
        self.target_size = target_size
        self.mean = mean
        self.std = std
        self.model_path = model_path
        self.post_process = ClsPostProcess(threshold=threshold)
        # Read and compile the model once so that repeated predict() calls reuse it.
        ie = Core()
        model = ie.read_model(model=self.model_path)
        self.compiled_model = ie.compile_model(model=model, device_name="CPU")
        self.output_layer = next(iter(self.compiled_model.outputs))

    def preprocess(self, image):
        image = resize(image, target_size=self.target_size)
        image = normalize(image, mean=self.mean, std=self.std)
        return image

    def predict(self, image):
        # Accept either an image path or an already loaded BGR array.
        if isinstance(image, str):
            image = cv2.imread(image)
        inputs = self.preprocess(image)
        # HWC -> CHW, then add the batch dimension: (1, 3, H, W).
        input_image = np.expand_dims(inputs.transpose(2, 0, 1), 0)
        preds = self.compiled_model([input_image])[self.output_layer]
        image = self.post_process(preds, image)
        return image


def parse_args():
    parser = argparse.ArgumentParser(description='Text direction classification with OpenVINO.')
    parser.add_argument(
        '--model_path',
        dest='model_path',
        help='The path of the pdmodel file.',
        type=str,
        default="ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of image to predict.',
        type=str,
        required=True)
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    model_path = args.model_path
    image_path = args.image_path
    cls_predictor = ClsPredictor(model_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], threshold=0.7)
    image = cls_predictor.predict(image_path)
    cv2.imwrite('cls_result.png', image)
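The decision rule in ClsPostProcess can be illustrated without OpenCV or a model. The sketch below is a simplified stand-in and not the deployed code: `rotate_180` and `correct_direction` are hypothetical helpers, the scores are a plain list [score for '0', score for '180'], and the image is a nested list of rows. The image is flipped only when class '180' wins and its score exceeds the threshold.

```python
def rotate_180(image):
    """Rotate a nested-list image by 180 degrees: reverse rows and columns."""
    return [list(reversed(row)) for row in reversed(image)]


def correct_direction(scores, image, labels=("0", "180"), threshold=0.9):
    """Flip the image only if '180' is the top class and is confident enough."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if labels[best] == "180" and scores[best] > threshold:
        return rotate_180(image)
    return image


image = [[1, 2, 3],
         [4, 5, 6]]
print(correct_direction([0.05, 0.95], image))  # confident '180': flipped
print(correct_direction([0.20, 0.80], image))  # '180' but below threshold: unchanged
print(correct_direction([0.99, 0.01], image))  # '0': unchanged
```

A higher threshold makes the correction more conservative; the script above builds its predictor with threshold=0.7.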
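4. Text recognition model
Text recognition uses the CRNN algorithm (paper: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition). This article does not cover the theory.
First download the official recognition model: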
!wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar
!tar -xvf ch_PP-OCRv2_rec_infer.tar
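Extracting the archive yields a static-graph model. Inspect its structure with Netron:
As the figure shows, the recognition model takes the same input as the direction classifier (because recognition runs directly after direction classification). The output shape is [?, 25, 6625]: ? is the batch size, 25 is the sequence length (the number of positions at which a character is predicted), and 6625 is the number of character classes. The recognition model has a matching dictionary, and the number of dictionary entries should agree with the number of classes, counting the CTC blank token and, when use_space_char is enabled, the space character that the decoding code adds on top of the dictionary entries.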
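To make the [?, 25, 6625] output more concrete, here is a minimal sketch of greedy CTC decoding on a toy output with 6 time steps and 4 classes instead of 25 and 6625. It illustrates the idea only: index 0 is assumed to be the CTC blank, the function name `ctc_greedy_decode` and the toy alphabet are hypothetical, and the probabilities are made up rather than taken from a real model.

```python
def ctc_greedy_decode(probs, charset, blank=0):
    """Greedy CTC decoding for one sample.

    probs: list of time steps, each a list of class probabilities.
    charset: list mapping class index -> character; charset[blank] is unused.
    Returns (text, mean confidence of the kept characters).
    """
    chars, confs = [], []
    prev = None
    for step in probs:
        idx = max(range(len(step)), key=lambda i: step[i])
        # Keep a prediction only if it is neither blank nor a repeat
        # of the previous time step.
        if idx != blank and idx != prev:
            chars.append(charset[idx])
            confs.append(step[idx])
        prev = idx
    mean_conf = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), mean_conf


charset = ["<blank>", "a", "b", "c"]
probs = [
    [0.1, 0.8, 0.05, 0.05],   # a
    [0.1, 0.7, 0.1, 0.1],     # a again (repeat, collapsed)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.6, 0.2, 0.1],     # a (kept: separated from the first by a blank)
    [0.1, 0.1, 0.7, 0.1],     # b
    [0.9, 0.05, 0.03, 0.02],  # blank
]
text, conf = ctc_greedy_decode(probs, charset)
print(text)  # aab
```

The blank between the two runs of "a" is what allows a doubled character to survive the duplicate removal.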
In [ ]
# Deployment code for the recognition model (do not run it on AI Studio; run it locally)
# Save as ppocr_rec.py, the module name imported by the chained script.
# Usage: python ppocr_rec.py --image_path {image path} --model_path {model_path} --character_dict_path {dict path}
import argparse

import cv2
import numpy as np
from openvino.runtime import Core


def normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    # Scale pixels to [0, 1], then standardize each channel.
    im = im.astype(np.float32, copy=False) / 255.0
    im -= mean
    im /= std
    return im


def resize(im, target_size=608, interp=cv2.INTER_LINEAR):
    # target_size is either a single int or a (width, height) pair.
    if isinstance(target_size, list) or isinstance(target_size, tuple):
        w = target_size[0]
        h = target_size[1]
    else:
        w = target_size
        h = target_size
    im = cv2.resize(im, (w, h), interpolation=interp)
    return im


class BaseRecLabelDecode(object):
    """ Convert between text-label and text-index """

    def __init__(self, character_dict_path=None, use_space_char=False):
        self.beg_str = "sos"
        self.end_str = "eos"
        self.character_str = []
        if character_dict_path is None:
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)
        else:
            # One character per line in the dictionary file.
            with open(character_dict_path, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    self.character_str.append(line)
            if use_space_char:
                self.character_str.append(" ")
            dict_character = list(self.character_str)
        dict_character = self.add_special_char(dict_character)
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i
        self.character = dict_character

    def add_special_char(self, dict_character):
        return dict_character

    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
        """ convert text-index into text-label. """
        result_list = []
        ignored_tokens = self.get_ignored_tokens()
        batch_size = len(text_index)
        for batch_idx in range(batch_size):
            char_list = []
            conf_list = []
            for idx in range(len(text_index[batch_idx])):
                if text_index[batch_idx][idx] in ignored_tokens:
                    continue
                if is_remove_duplicate:
                    # only for predict
                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[batch_idx][idx]:
                        continue
                char_list.append(self.character[int(text_index[batch_idx][idx])])
                if text_prob is not None:
                    conf_list.append(text_prob[batch_idx][idx])
                else:
                    conf_list.append(1)
            text = ''.join(char_list)
            result_list.append((text, np.mean(conf_list)))
        return result_list

    def get_ignored_tokens(self):
        return [0]  # for ctc blank


class CTCLabelDecode(BaseRecLabelDecode):
    """ Convert between text-label and text-index """

    def __init__(self, character_dict_path=None, use_space_char=False,
                 **kwargs):
        super(CTCLabelDecode, self).__init__(character_dict_path,
                                             use_space_char)

    def __call__(self, preds, label=None, *args, **kwargs):
        if isinstance(preds, (tuple, list)):
            preds = preds[-1]
        # preds has shape (batch, time steps, classes).
        preds_idx = preds.argmax(axis=2)
        preds_prob = preds.max(axis=2)
        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
        if label is None:
            return text
        label = self.decode(label)
        return text, label

    def add_special_char(self, dict_character):
        # Index 0 is reserved for the CTC blank.
        dict_character = ['blank'] + dict_character
        return dict_character


class RecPredictor:
    def __init__(self, model_path, character_dict_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], use_space_char=False):
        self.target_size = target_size
        self.mean = mean
        self.std = std
        self.model_path = model_path
        self.post_process = CTCLabelDecode(character_dict_path=character_dict_path, use_space_char=use_space_char)
        # Read and compile the model once so that repeated predict() calls reuse it.
        ie = Core()
        model = ie.read_model(model=self.model_path)
        self.compiled_model = ie.compile_model(model=model, device_name="CPU")
        self.output_layer = next(iter(self.compiled_model.outputs))

    def preprocess(self, image):
        image = resize(image, target_size=self.target_size)
        image = normalize(image, mean=self.mean, std=self.std)
        return image

    def predict(self, image):
        # Accept either an image path or an already loaded BGR array.
        if isinstance(image, str):
            image = cv2.imread(image)
        inputs = self.preprocess(image)
        # HWC -> CHW, then add the batch dimension: (1, 3, H, W).
        input_image = np.expand_dims(inputs.transpose(2, 0, 1), 0)
        preds = self.compiled_model([input_image])[self.output_layer]
        text = self.post_process(preds)
        return text


def str2bool(value):
    # argparse's type=bool treats any non-empty string (even "False") as True,
    # so the flag is parsed explicitly instead.
    return str(value).lower() in ("true", "t", "1", "yes")


def parse_args():
    parser = argparse.ArgumentParser(description='Text recognition with OpenVINO.')
    parser.add_argument(
        '--model_path',
        dest='model_path',
        help='The path of the pdmodel file.',
        type=str,
        default="ch_PP-OCRv2_rec_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of image to predict.',
        type=str,
        required=True)
    parser.add_argument(
        '--use_space_char',
        dest='use_space_char',
        help='Whether use space char.',
        type=str2bool,
        default=True)
    parser.add_argument(
        '--character_dict_path',
        dest='character_dict_path',
        help='The path of character dict.',
        type=str,
        default="ppocr_keys_v1.txt")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    model_path = args.model_path
    image_path = args.image_path
    use_space_char = args.use_space_char
    character_dict_path = args.character_dict_path
    rec_predictor = RecPredictor(model_path, character_dict_path=character_dict_path, target_size=(100, 32), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], use_space_char=use_space_char)
    text = rec_predictor.predict(image_path)
    print(text)
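The link between the dictionary file and the number of output classes follows from how the label table is built: a blank token is placed at index 0 and, optionally, a space is appended at the end. The sketch below reproduces that construction on a made-up three-character dictionary; `build_label_table` is a hypothetical helper, not a PaddleOCR function.

```python
def build_label_table(dict_lines, use_space_char=False):
    """Build the index -> character table used by a CTC decoder.

    Index 0 is reserved for the CTC blank; a space is appended on request.
    """
    characters = [line.rstrip("\r\n") for line in dict_lines]
    if use_space_char:
        characters.append(" ")
    return ["blank"] + characters


# Stands in for the lines read from a dictionary file.
toy_dict = ["春\n", "风\n", "语\n"]
table = build_label_table(toy_dict, use_space_char=True)
print(len(table))  # 5: three characters + blank + space
print(table[1])    # 春
```

Comparing `len(table)` with the last dimension of the model output before running inference is a cheap sanity check; a mismatch suggests the wrong dictionary file or a different use_space_char setting.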
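5. Chained deployment
With deployment code for detection, direction classification, and recognition in place, chaining them together gives end-to-end text recognition. In case the code does not run for you, a prepared archive (ocr.zip) can be downloaded from the /home/aistudio directory for testing.
Unzip ocr.zip, enter the ocr directory, and run: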
python .\ppocr_system.py --image_path test.png
result.png holds the detection result, and the recognition result is printed to the command line, as follows:
[[('即随本心', 0.97602725)], [('春风不语', 0.97723883)], [('可问春风', 0.95982796)], [('遇事不决', 0.97903967)]]
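The printed result is a list with one entry per detected box, and each entry is a list of (text, confidence) tuples. A minimal sketch of post-processing it, using the values printed above; the helper `confident_texts` and the 0.97 cut-off are illustrative assumptions, not part of the scripts in this article.

```python
results = [
    [("即随本心", 0.97602725)],
    [("春风不语", 0.97723883)],
    [("可问春风", 0.95982796)],
    [("遇事不决", 0.97903967)],
]


def confident_texts(results, min_conf=0.5):
    """Flatten per-box results and keep texts at or above min_conf."""
    return [
        text
        for box_result in results
        for text, conf in box_result
        if conf >= min_conf
    ]


print(confident_texts(results))                 # all four lines
print(confident_texts(results, min_conf=0.97))  # drops the 0.9598 line
```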
In [ ]
# Example code for chained deployment (do not run it on AI Studio)
# Usage: python ppocr_system.py --image_path {path to your image}  # (add other arguments as needed)
import argparse

import cv2
import numpy as np

from ppocr_cls import ClsPredictor
from ppocr_det import DetPredictor
from ppocr_rec import RecPredictor


class PaddleOCR:
    def __init__(self, det_model_path, rec_model_path, character_dict_path, cls_model_path=None, use_space_char=False, det_image_size=[960, 960], rec_image_size=[100, 32], mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
        self.det_predictor = DetPredictor(det_model_path, target_size=det_image_size, mean=mean, std=std)
        # The direction classifier is optional.
        self.cls_predictor = ClsPredictor(cls_model_path, target_size=rec_image_size, mean=mean, std=std) if cls_model_path else None
        self.rec_predictor = RecPredictor(rec_model_path, character_dict_path=character_dict_path, target_size=rec_image_size, mean=mean, std=std, use_space_char=use_space_char)

    def predict(self, image_path):
        image = cv2.imread(image_path)
        raw_image = image.copy()
        # Step 1: detect text boxes and draw them on a copy for visualization.
        boxes_batch = self.det_predictor.predict(image_path)
        draw_image = self.det_predictor.draw_det(image, boxes_batch[0]['points'])
        texts = []
        for box in boxes_batch[0]['points']:
            # Step 2: crop each box, using points 0 and 2 as opposite corners.
            box = box.astype(np.int32)
            left, top = box[0, 0], box[0, 1]
            right, bottom = box[2, 0], box[2, 1]
            sub_image = raw_image[top:bottom, left:right, :]
            if sub_image.size == 0:
                # Skip degenerate boxes that produce an empty crop.
                continue
            # Step 3: fix the direction if needed, then recognize the text.
            if self.cls_predictor is not None:
                sub_image = self.cls_predictor.predict(sub_image)
            text = self.rec_predictor.predict(sub_image)
            texts.append(text)
        return draw_image, texts


def str2bool(value):
    # argparse's type=bool treats any non-empty string (even "False") as True,
    # so the flag is parsed explicitly instead.
    return str(value).lower() in ("true", "t", "1", "yes")


def parse_args():
    parser = argparse.ArgumentParser(description='Chained OCR (detection + direction classification + recognition) with OpenVINO.')
    parser.add_argument(
        '--det_model_path',
        dest='det_model_path',
        help='The path of the detection pdmodel file.',
        type=str,
        default='ch_PP-OCRv2_det_infer/inference.pdmodel')
    parser.add_argument(
        '--rec_model_path',
        dest='rec_model_path',
        help='The path of the recognition pdmodel file.',
        type=str,
        default="ch_PP-OCRv2_rec_infer/inference.pdmodel")
    parser.add_argument(
        '--cls_model_path',
        dest='cls_model_path',
        help='The path of the direction classification pdmodel file.',
        type=str,
        default="ch_ppocr_mobile_v2.0_cls_infer/inference.pdmodel")
    parser.add_argument(
        '--image_path',
        dest='image_path',
        help='The path of image to predict.',
        type=str,
        required=True)
    parser.add_argument(
        '--save_path',
        dest='save_path',
        help='The image save path.',
        type=str,
        default="result.png")
    parser.add_argument(
        '--use_space_char',
        dest='use_space_char',
        help='Whether use space char.',
        type=str2bool,
        default=True)
    parser.add_argument(
        '--character_dict_path',
        dest='character_dict_path',
        help='The path of character dict.',
        type=str,
        default="ppocr_keys_v1.txt")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    predictor = PaddleOCR(det_model_path=args.det_model_path, rec_model_path=args.rec_model_path, character_dict_path=args.character_dict_path, cls_model_path=args.cls_model_path, use_space_char=args.use_space_char)
    draw_image, texts = predictor.predict(args.image_path)
    cv2.imwrite(args.save_path, draw_image)
    print(texts)
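The crop in PaddleOCR.predict treats points 0 and 2 of each detected box as the top-left and bottom-right corners. When a box is slightly tilted or extends past the image border, taking the minimum and maximum over all four points and clamping to the image size is a more forgiving option. The sketch below shows one such variant under those assumptions; it is not the code shipped in ocr.zip, and `bounding_rect` is a hypothetical helper.

```python
def bounding_rect(points, image_width, image_height):
    """Axis-aligned rectangle (left, top, right, bottom) around a quadrilateral.

    points: four (x, y) pairs in any order. The result is clamped to the
    image so it can be used directly as slice bounds.
    """
    xs = [int(x) for x, _ in points]
    ys = [int(y) for _, y in points]
    left = max(min(xs), 0)
    top = max(min(ys), 0)
    right = min(max(xs), image_width)
    bottom = min(max(ys), image_height)
    return left, top, right, bottom


# A slightly tilted box that sticks out of the left edge of a 200 x 100 image.
box = [(-3, 12), (120, 8), (122, 40), (-1, 44)]
left, top, right, bottom = bounding_rect(box, image_width=200, image_height=100)
print(left, top, right, bottom)  # 0 8 122 44
```

With a NumPy image the crop is then `raw_image[top:bottom, left:right, :]`, the same slicing used in the loop above.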
6. Reference links