
Chinese Text Recognition with tesseract

Python Crawlers  silianpan

Download the Chinese language data

https://github.com/tesseract-ocr/tessdata

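Place the downloaded chi_sim.traineddata file under the share directory of the tesseract installation. tesseract looks for language files in the tessdata folder inside that directory.

macOS: /usr/local/Cellar/tesseract/4.0.0_1/share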
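
Before running any OCR, it can help to confirm that the language file actually landed where tesseract will look for it. The snippet below is a minimal sketch using only the standard library. The helper name has_language_data is hypothetical, and the tessdata path is an assumption derived from the macOS share directory above; adjust it to your own installation.

```python
from pathlib import Path


def has_language_data(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata exists in tessdata_dir.

    Hypothetical helper for illustration; not part of tesseract or pytesseract.
    """
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


# Assumed location: the tessdata folder under the share directory listed above.
tessdata_dir = "/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata"

if has_language_data(tessdata_dir):
    print("chi_sim.traineddata found")
else:
    print(f"chi_sim.traineddata not found in {tessdata_dir}")
```

If the file is reported missing, the lang='chi_sim' option will most likely fail when tesseract is invoked.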

Install pytesseract

pip install pytesseract

Test

import pytesseract
from PIL import Image

# Load the image and convert it to grayscale
image = Image.open('code5.png')
image = image.convert('L')

# Binarize: gray levels below the threshold become 0 (black), the rest 1 (white)
threshold = 127
table = [0 if i < threshold else 1 for i in range(256)]
image = image.point(table, '1')
image.show()

# Run OCR on the preprocessed image with the Simplified Chinese language data
# Alternative (requires `import tesserocr`): result = tesserocr.image_to_text(image)
result = pytesseract.image_to_string(image, lang='chi_sim')
print(result)
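
The binarization step in the test script relies on a 256-entry lookup table, one entry per gray level. As a rough illustration of what that table does, here is a pure-Python sketch that needs no image library. The function names build_threshold_table and binarize are made up for this example, and the sample values are arbitrary.

```python
def build_threshold_table(threshold=127):
    """Map each of the 256 gray levels to 0 (black) or 1 (white)."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(gray_levels, threshold=127):
    """Apply the lookup table to a sequence of gray levels (0-255)."""
    table = build_threshold_table(threshold)
    return [table[level] for level in gray_levels]


# Arbitrary sample row of gray levels, chosen to straddle the threshold.
sample_row = [12, 126, 127, 200, 255]
print(binarize(sample_row))  # prints [0, 0, 1, 1, 1]
```

A threshold of 127 is only a starting point. For images with a light or noisy background, trying a few different values and inspecting the result with image.show() is usually worthwhile.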

When reposting, please credit: 溜爸 » Chinese Text Recognition with tesseract
