tess4j: Bad performance compared with direct use of Tesseract

Hi, I’m getting worse performance with Tess4J than with a direct call to Tesseract on the same machine with the same resources.

I’m using Tess4j-5.1.1 and I have Tesseract v5.0.0-alpha.20210811 installed on my PC (Windows 10 - i7-7600U CPU @ 2.80GHz 2.90 GHz - 16GB RAM).

Here is what I’m doing:

import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;

// The class name is illustrative; the original post showed only main().
public class Tess4jTiming {

    public static void main(String[] args) throws Exception {
        long t0, tf;

        // Input image (a single scanned page)
        File scannedPage = new File("C:\\Users\\francesc.sola\\Desktop\\work_tesseract\\1_page.jpg");
        ITesseract instance = new Tesseract();  // JNA Interface Mapping
        //ITesseract instance = new Tesseract1(); // JNA Direct Mapping

        // Time one OCR call through Tess4J
        System.out.println("Using Tess4j-5.1.1");
        t0 = System.currentTimeMillis();
        instance.doOCR(scannedPage);
        tf = System.currentTimeMillis();
        System.out.println("Process time: " + (tf - t0) + " ms.");

        // Time the same image through the tesseract command-line executable
        System.out.println("Direct call to tesseract v5.0.0-alpha.20210811");
        String command = "tesseract.exe C:\\Users\\francesc.sola\\Desktop\\work_tesseract\\1_page.jpg C:\\Users\\francesc.sola\\Desktop\\work_tesseract\\out";

        // Run the command and wait for it to finish
        Runtime run = Runtime.getRuntime();
        t0 = System.currentTimeMillis();
        Process proc = run.exec(command);
        proc.waitFor();
        tf = System.currentTimeMillis();

        System.out.println("Process time: " + (tf - t0) + " ms.");
        run.exit(0);
    }
}

And here is the output:

Using Tess4j-5.1.1
Process time: 11010 ms.
Direct call to tesseract v5.0.0-alpha.20210811
Process time: 6589 ms.

As you can see, using Tess4J takes considerably longer than the direct call to Tesseract.

Any ideas about this behaviour? I have attached the image I’m testing with.

[Attached image: 1_page]

Thanks!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Would the Tesseract1 API provide higher speeds? Either way, going through JNA incurs some overhead.

But it is probably mainly because the DLL shipped with Tess4J was compiled without the Enhanced Instruction Set option. This was done to maintain maximum compatibility across several generations of CPUs. You can build a DLL with the Enhanced Instruction Set enabled to match your CPU’s capabilities and set the jna.library.path system property so that your build is loaded instead.
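As a rough sketch of the jna.library.path suggestion: the property can be passed on the command line (-Djna.library.path=...) or set in code before the first Tesseract instance is created. The class name, the helper name and the folder C:\tesseract-avx2 below are assumptions for illustration only; the folder stands for wherever a custom-built DLL would live, and the DLL there would need the same file name as the one Tess4J expects.

```java
import java.io.File;

public class CustomTessLibraryPath {

    /**
     * Puts dir at the front of the jna.library.path system property,
     * keeping any value that was already set, and returns the new value.
     */
    public static String prependJnaLibraryPath(String dir) {
        String existing = System.getProperty("jna.library.path");
        String updated = (existing == null || existing.isEmpty())
                ? dir
                : dir + File.pathSeparator + existing;
        System.setProperty("jna.library.path", updated);
        return updated;
    }

    public static void main(String[] args) {
        // Hypothetical folder holding a Tesseract DLL built for this CPU.
        String customDir = args.length > 0 ? args[0] : "C:\\tesseract-avx2";
        System.out.println("jna.library.path = " + prependJnaLibraryPath(customDir));
        // Create the Tesseract instance only after this point, so that JNA
        // searches the custom folder before any other location.
    }
}
```

The property has to be set before the native library is first loaded; changing it afterwards has no effect on a library that is already in memory.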

This is my output on a Windows system with a Ryzen 7 5800X and 32GB RAM.

Using Tess4j-5.1.2
Process time: 2775 ms.
Direct call to tesseract v5.1.0
Process time: 4162 ms.
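Since each figure in this thread comes from a single timed call, one-time start-up work (which may include loading the native library on the first Tess4J call) is mixed in with the OCR itself. A minimal sketch of a fairer measurement, assuming a few untimed warm-up runs followed by several timed ones, could look like this. OcrTimer and medianMillis are hypothetical names, and the sleep is only a stand-in workload.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

public class OcrTimer {

    /**
     * Calls task `warmups` times without timing it, then `runs` times with
     * timing, and returns the median elapsed time in milliseconds.
     */
    public static long medianMillis(Callable<?> task, int warmups, int runs) throws Exception {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.call(); // untimed: absorbs one-time start-up cost
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; with Tess4J this would be something like
        // () -> instance.doOCR(imageFile)
        long ms = medianMillis(() -> { Thread.sleep(20); return null; }, 1, 5);
        System.out.println("Median time: " + ms + " ms.");
    }
}
```

The same helper can wrap the command-line call (start the process and wait for it inside the Callable), so both paths are measured the same way.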