itextsharp: Chinese text comes out garbled (mojibake) when reading PDFs
https://www.bullzip.com/products/ext/info.php

https://github.com/search?o=desc&q=ai&s=stars&type=Repositories
https://github.com/fighting41love/funNLP

pdftotext.exe -layout -enc GBK -cfg add-to-xpdfrc <path of the PDF to read> <path of the txt file to write>

pdftotext.exe -layout -enc GBK -cfg add-to-xpdfrc a.pdf x.txt

pdftotext.exe -layout -enc GBK -cfg xpdfrc a.pdf x.txt

-enc GBK -cfg xpdfrc

pdftotext.exe -layout -enc EUC-CN a.pdf x.txt
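With -enc GBK (or -enc EUC-CN, i.e. GB2312), pdftotext writes the .txt file in that encoding, so any editor or program that assumes UTF-8 shows garbage. Below is a minimal Python sketch that re-encodes such a file as UTF-8. It assumes the input really is GBK or GB2312; because GBK is a superset of GB2312, the one "gbk" codec decodes both. The function name gbk_to_utf8 is hypothetical.

```python
from pathlib import Path


def gbk_to_utf8(src: str, dst: str) -> str:
    """Re-encode a GBK or EUC-CN (GB2312) text file as UTF-8 and return the text.

    Assumes the input really is GBK/GB2312, e.g. pdftotext output written with
    -enc GBK or -enc EUC-CN; GBK is a superset of GB2312, so one codec covers both.
    """
    text = Path(src).read_bytes().decode("gbk")
    # Write bytes rather than text so Windows "\r\n" line endings are kept as-is.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Usage: after `pdftotext.exe -layout -enc GBK a.pdf x.txt`, call `gbk_to_utf8("x.txt", "x.utf8.txt")`.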


pdftotext.exe -layout -enc GBK -nopgbrk a.pdf x.txt 
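Without -nopgbrk, pdftotext typically ends each page with a form feed character (\f), and that character can be used to split the output back into pages. The following is a small Python sketch under that assumption; split_pages is a hypothetical helper name. If the file was produced with -nopgbrk, it has no \f characters and comes back as a single page.

```python
def split_pages(text: str) -> list[str]:
    """Split pdftotext output into pages on form-feed (\\f) page breaks.

    Only meaningful for output produced WITHOUT -nopgbrk; with -nopgbrk there
    are no form feeds and the whole document is returned as one page.
    """
    pages = text.split("\f")
    # The last page is usually terminated by \f too, leaving an empty tail.
    if pages and pages[-1] == "":
        pages.pop()
    return pages
```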

Spire.PDF
https://blog.csdn.net/hong0220/article/details/46503701
http://www.xpdfreader.com/opensource.html
https://bbs.csdn.net/forums/J2SE?category=2
http://www.verysource.com/cate_assembly-language/
Compression and decompression

tomcat

Code editor
https://codemirror.net/
https://blog.csdn.net/qq_28537277/article/details/89705629
https://blog.csdn.net/admans/article/details/81584742
https://www.cnblogs.com/HIT-cyz/p/RichTextBox_LineNum_CYZ.html
Hit rate

Threshold

BouncyCastle.Crypto.dll

java -jar pdfbox-app-1.3.1.jar ExtractText a.pdf a.txt
java -jar pdfbox-app-3.0.0-RC1.jar export:text -i a.pdf -o a.txt
java -jar pdfbox-app-2.0.24.jar PDFToImage L71-1.PDF test.png -imageType jpg -startPage 3 -endPage 3
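The text-extraction command in the commands above differs by PDFBox version. The 1.x and 2.x app jars take `ExtractText <pdf> <txt>`, while 3.0 takes `export:text -i <pdf> -o <txt>`. The Python sketch below picks the right form by reading the major version from the jar file name. It assumes the jar keeps its `pdfbox-app-X...` name, and the helper names are hypothetical.

```python
import re
import subprocess


def pdfbox_major(jar: str) -> int:
    """Read the major version from a jar name such as pdfbox-app-3.0.0-RC1.jar."""
    m = re.search(r"pdfbox-app-(\d+)\.", jar)
    if m is None:
        raise ValueError(f"cannot read a PDFBox version from {jar!r}")
    return int(m.group(1))


def pdfbox_extract_text_cmd(jar: str, pdf: str, txt: str) -> list[str]:
    """Build the PDF -> text command line for the given pdfbox-app jar.

    1.x and 2.x: ExtractText <pdf> <txt>
    3.x:         export:text -i <pdf> -o <txt>
    """
    if pdfbox_major(jar) >= 3:
        return ["java", "-jar", jar, "export:text", "-i", pdf, "-o", txt]
    return ["java", "-jar", jar, "ExtractText", pdf, txt]


def extract_text(jar: str, pdf: str, txt: str) -> None:
    # Needs java on PATH; raises CalledProcessError if PDFBox exits non-zero.
    subprocess.run(pdfbox_extract_text_cmd(jar, pdf, txt), check=True)
```

Passing an argument list to subprocess.run means file names containing spaces need no extra quoting.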

"C:\Program Files (x86)\Tesseract-OCR\tesseract" a.jpg output_1 -l eng


tesseract a.jpg output_1 -l chi_sim_vert

https://www.cnblogs.com/insus/p/4323683.html

C:\Program Files (x86)\Tesseract-OCR
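Tesseract is invoked as `tesseract <image> <output base> -l <lang>`. It writes `<output base>.txt` as UTF-8, and several languages can be combined with "+" (for example chi_sim_vert+eng). The Python sketch below builds that command and reads the result back. The install directory is the one in these notes and may differ on your machine. TESSERACT, tesseract_cmd and ocr_to_text are hypothetical names.

```python
import subprocess
from pathlib import PureWindowsPath

# Install directory taken from these notes; adjust if Tesseract lives elsewhere.
TESSERACT = str(PureWindowsPath(r"C:\Program Files (x86)\Tesseract-OCR") / "tesseract.exe")


def tesseract_cmd(image: str, out_base: str, langs: list[str], exe: str = TESSERACT) -> list[str]:
    """Build a tesseract command line; multiple languages are joined with '+'."""
    return [exe, image, out_base, "-l", "+".join(langs)]


def ocr_to_text(image: str, out_base: str, langs: list[str]) -> str:
    # An argv list means the spaces in "Program Files (x86)" need no quoting.
    subprocess.run(tesseract_cmd(image, out_base, langs), check=True)
    # tesseract appends ".txt" to the output base name and writes UTF-8.
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read()
```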

java -jar pdfbox-app-2.y.z.jar ExtractImages a.pdf
