Introducing Tesseract 4 to Xojo
That brings us to version 4.11 of Tesseract. We had plans to use the new version and looked for a way to make the transition easy for our plugin users. Because the newer tesseract library exports a C interface, we can load it dynamically at runtime. We have a new TessEngineMBS class for you, and it has a LoadLibrary() function to load the leptonica and tesseract libraries.
macOS with Homebrew
One way on macOS is to use the Homebrew project to install the tesseract library with data files on a Mac. After installing the Homebrew package manager via Terminal, you would use a command like this to install the packages:
brew install tesseract-lang
And then you would do two TessEngineMBS.LoadLibrary function calls to load the libraries, first the leptonica image library and then the actual OCR library on top:
Dim r1 as Boolean = TessEngineMBS.LoadLibrary( "/opt/homebrew/lib/liblept.5.dylib" )
Dim r2 as Boolean = TessEngineMBS.LoadLibrary( "/opt/homebrew/lib/libtesseract.4.dylib" )
If both return true, you are good to go.
macOS with our download
Or you go to our website, where we have a disk image for you. This one is a bit special, as we provide you with one dylib for both libraries (leptonica and tesseract) and for both architectures: Intel and ARM. Put this dylib somewhere together with the data files and then load it:
Dim r as Boolean = TessEngineMBS.LoadLibrary( "/Users/Tesseract/tesseract.dylib")
Once loaded, you are ready to go.
Linux
On Linux with Ubuntu you can install the tesseract files via Terminal using the apt-get command:
sudo apt-get install libtesseract4
This should install all the dependencies and the tesseract package. Once that is done, you can simply load it:
Dim r1 as Boolean = TessEngineMBS.LoadLibrary( "liblept.so.5" )
Dim r2 as Boolean = TessEngineMBS.LoadLibrary( "libtesseract.so.4" )
Please notice that we don't pass a path, since the libraries are installed in the default location for Linux and the loader will find them automatically.
If both return true, you are good to go.
Windows
On Windows you may use an installer for tesseract to get the data files and the DLLs into place. We have an installer for you from the University of Mannheim in our Download Libs folder. Once installed, you can load it:
Dim r1 as Boolean = TessEngineMBS.SetCurrentWorkingDirectory( "C:\Program Files\Tesseract-OCR")
Dim r2 as Boolean = TessEngineMBS.LoadLibrary( "liblept-5.dll" )
Dim r3 as Boolean = TessEngineMBS.LoadLibrary( "libtesseract-4.dll" )
As you see, we first have to switch the current working directory to the right folder. Then we load the leptonica library and after it the tesseract library. Since the tesseract DLL depends on the leptonica one, we load leptonica first so the DLL loader can find it. If all three functions returned true, you are good to go.
Initialize it
Once the library is loaded, you can use the Initialize function to initialize the engine. It's not part of the constructor, since you can create an object, set a few variables and then initialize. Pass the path to the language files, except on Linux or with Homebrew, where it may use the default location instead. Once initialized, you can call the other functions. Basically you can just move all of this to the app start code or to the first time you need the OCR functions.
New tricks
With version 4 we added a few new functions. First, SetImageData and SetImageFile allow you to pass in-memory image data and image files directly, without going through a picture object. And the newer engine can be initialized for multiple languages, e.g. "eng+deu" for loading both English and German (Deutsch).
Before loading the library, you can use the TessEngineMBS.LibraryLoaded function to check whether it is already loaded. That way you only load it on the first try.
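To tie the platform sections together, here is a minimal sketch of a load-once helper. It uses only the library names and paths shown above. LoadTesseractOnce is a hypothetical helper name, and the sketch assumes that TessEngineMBS.LibraryLoaded returns a Boolean; please check the plugin documentation before relying on it.

```xojo
// Sketch only: LoadTesseractOnce is a hypothetical helper, not part of the plugin.
// Assumption: TessEngineMBS.LibraryLoaded returns a Boolean.
Function LoadTesseractOnce() As Boolean
  // Nothing to do if the libraries are already loaded
  If TessEngineMBS.LibraryLoaded Then Return True
  
  #If TargetMacOS Then
    // Homebrew paths from the macOS section; adjust for your install
    Dim r1 As Boolean = TessEngineMBS.LoadLibrary( "/opt/homebrew/lib/liblept.5.dylib" )
    Dim r2 As Boolean = TessEngineMBS.LoadLibrary( "/opt/homebrew/lib/libtesseract.4.dylib" )
    Return r1 And r2
  #ElseIf TargetLinux Then
    // No path needed, the loader searches the default locations
    Dim r1 As Boolean = TessEngineMBS.LoadLibrary( "liblept.so.5" )
    Dim r2 As Boolean = TessEngineMBS.LoadLibrary( "libtesseract.so.4" )
    Return r1 And r2
  #ElseIf TargetWindows Then
    // Switch to the install folder first, then load leptonica before tesseract
    Dim r0 As Boolean = TessEngineMBS.SetCurrentWorkingDirectory( "C:\Program Files\Tesseract-OCR" )
    Dim r1 As Boolean = TessEngineMBS.LoadLibrary( "liblept-5.dll" )
    Dim r2 As Boolean = TessEngineMBS.LoadLibrary( "libtesseract-4.dll" )
    Return r0 And r1 And r2
  #Else
    // Platform not covered in this article
    Return False
  #EndIf
End Function
```

You could call such a helper from the app start code or right before the first OCR call, and only go on to Initialize when it returns true.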