[JAVA] Tesseract-OCR: Capturing a Specified Screen Region and Recognizing Its Text, 0.2.2
- Steps
- 1. Tesseract-OCR download page
- 2. Add the Maven dependency
- 3. Download the Chinese language pack
- 4. The code
2024-12-20 19:53:57 ---- 0.2.2
2024-12-23 11:59:16 ---- 0.3.2
Steps
1. Tesseract-OCR download page
https://tesseract-ocr.github.io/tessdoc/Downloads.html
2. Add the Maven dependency
<dependencies>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.5.4</version>
    </dependency>
</dependencies>
3. Download the Chinese language pack
https://gitcode.com/open-source-toolkit/99b98/blob/main/tessdata.rar
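After downloading, extract the rar archive into a directory such as D:\Program Files\Tesseract-OCR\tessdata
If the language pack is missing but the code calls instance.setLanguage("chi_sim"), the program fails with the following error: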
Warning: Invalid resolution 0 dpi. Using 70 instead.
Exception in thread "main" java.lang.Error: Invalid memory access
    at com.sun.jna.Native.invokePointer(Native Method)
    at com.sun.jna.Function.invokePointer(Function.java:497)
    at com.sun.jna.Function.invoke(Function.java:441)
    at com.sun.jna.Function.invoke(Function.java:361)
    at com.sun.jna.Library$Handler.invoke(Library.java:265)
    at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:517)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:359)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:228)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195)
    at Main.main(Main.java:23)
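Since a missing language file only shows up as the native crash above, it can help to check for the file before running OCR. The following is a minimal sketch and not part of the original program: the class name TessdataCheck and its two methods are hypothetical helpers, and the check relies on Tesseract's convention of naming language files <language>.traineddata (so chi_sim needs chi_sim.traineddata). The default directory is the example path from this article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: verifies that a Tesseract language file is present.
final class TessdataCheck {

    /** Builds the expected path of the language file, e.g. tessdata/chi_sim.traineddata. */
    static Path languageFile(Path tessdataDir, String language) {
        return tessdataDir.resolve(language + ".traineddata");
    }

    /** Returns true only if the language file exists and is a regular file. */
    static boolean hasLanguageData(Path tessdataDir, String language) {
        return Files.isRegularFile(languageFile(tessdataDir, language));
    }

    public static void main(String[] args) {
        // Assumption: the tessdata directory used in this article; pass another path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "D:\\Program Files\\Tesseract-OCR\\tessdata");
        String language = "chi_sim";
        if (hasLanguageData(dir, language)) {
            System.out.println("Found " + languageFile(dir, language));
        } else {
            System.err.println("Missing " + languageFile(dir, language)
                    + ": download the language pack before calling setLanguage(\"" + language + "\")");
        }
    }
}
```

Calling hasLanguageData before setLanguage turns the "Invalid memory access" crash into a readable message about the missing file.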
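4. The code
The code below captures a specified screen region, recognizes the text in it, and then extracts a number (the last run of digits) from the recognized text.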
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import javax.imageio.ImageIO;
import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    // Screenshot file shared by clipPic() and getFinalNumber()
    private static final String IMAGE_PATH = "D:\\日志\\123.png";

    public static void main(String[] args) {
        clipPic();
        System.out.println(getFinalNumber());
    }

    /** Returns the last run of digits in the input, or "" if there is none. */
    public static String findNumber(String input) {
        String findResult = "";
        // Regular expression that matches a run of digits
        Pattern pattern = Pattern.compile("\\d+");
        // Matcher that scans the input string
        Matcher matcher = pattern.matcher(input);
        // Walk through every match; each one overwrites the previous, so the last match wins
        while (matcher.find()) {
            findResult = matcher.group();
        }
        return findResult;
    }

    /** Captures the configured screen region and saves it as a PNG file. */
    public static void clipPic() {
        try {
            // Create a Robot object
            Robot robot = new Robot();
            // Region to capture: x, y, width, height
            Rectangle screenRect = new Rectangle(1920, 0, 150, 43);
            // Take the screenshot
            BufferedImage screenFullImage = robot.createScreenCapture(screenRect);
            // Save the image
            ImageIO.write(screenFullImage, "png", new File(IMAGE_PATH));
            System.out.println("Screenshot saved to " + IMAGE_PATH);
        } catch (AWTException | IOException ex) {
            System.err.println("Screenshot failed: " + ex);
        }
    }

    /** Runs OCR on the saved screenshot and returns the number found in the text. */
    public static String getFinalNumber() {
        String finalResult = "";
        // Initialize the Tesseract instance
        ITesseract instance = new Tesseract();
        try {
            // Path to the Tesseract language data (adjust to your installation)
            instance.setDatapath("D:\\Program Files\\Tesseract-OCR\\tessdata");
            // Recognition language; the default is English, Simplified Chinese is "chi_sim"
            instance.setLanguage("chi_sim");
            // tessedit_unrej_any_wd: set to true here as an example, to accept any word that was not rejected
            instance.setTessVariable("tessedit_unrej_any_wd", "true");
            // textord_force_make_prop_words: force creation of proportional words, intended to help combine Chinese words
            instance.setTessVariable("textord_force_make_prop_words", "true");
            // tessedit_create_hocr: request hOCR output, useful for analyzing text position and layout
            instance.setTessVariable("tessedit_create_hocr", "true");
            // language_model_ngram_score: example value only; tune it based on your own tests
            instance.setTessVariable("language_model_ngram_score", "0.6");
            // Image file to recognize
            File imageFile = new File(IMAGE_PATH);
            // Run OCR and extract the number from the recognized text
            finalResult = findNumber(instance.doOCR(imageFile));
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
        return finalResult;
    }
}
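Note that findNumber overwrites its result on every match, so it returns only the last run of digits in the OCR text. If the captured region can contain several numbers, a variant that collects every match may be easier to work with. This is a minimal sketch using only the standard library; the class name NumberExtractor, the method findAllNumbers, and the sample string are hypothetical and do not come from the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every run of digits instead of only the last one.
final class NumberExtractor {
    // Compiled once and reused; \d+ matches one or more consecutive ASCII digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Returns all digit runs in the input, in order of appearance (empty list if none). */
    static List<String> findAllNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the text returned by doOCR
        String ocrText = "Level 12, gold 3450\n";
        System.out.println(findAllNumbers(ocrText)); // prints [12, 3450]
    }
}
```

The caller can then pick the entry it needs, for example the first element or the longest one, rather than depending on where the number happens to appear in the text.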