
Improving our OCR functions for FileMaker Server

When we built our OCR functions back in September 2012, we designed them without reference numbers, so you can only have one OCR job at a time. Since FileMaker Pro always runs one script at a time (others wait), this is not a problem. But what happens on a server?

On a FileMaker Server, scripts may run in parallel. For many customers this is no problem, because a script that uses OCR runs only occasionally. But our power users tend to do more and run scripts in parallel on multiple CPU cores. That is where the mess starts: one script clears the results of another, since we hold the OCR state in a few global variables. You can mitigate this yourself with our Mutex functions, which let a script wait until another script has finished accessing a shared resource. While they are intended for synchronizing memory access with our SharedMemory functions, they work fine for controlling access to the OCR functions, too.
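The idea of guarding one shared state with a mutex can be sketched in a few lines. This is a conceptual analogy in Python, not the plugin's code or its Mutex API; `shared_state`, `ocr_lock` and `run_ocr_job` are hypothetical names, and the "recognition" is a placeholder string.

```python
import threading

# Hypothetical stand-in for a single, process-wide OCR state
# (the situation described above, before per-thread variables).
shared_state = {"image": None, "text": None}
ocr_lock = threading.Lock()


def run_ocr_job(image_name, results):
    # Hold the lock for the whole sequence (set image, recognize, read result)
    # so no other thread can overwrite the shared state in between.
    with ocr_lock:
        shared_state["image"] = image_name
        # Placeholder for the real recognition step.
        shared_state["text"] = "text of " + shared_state["image"]
        results[image_name] = shared_state["text"]


results = {}
threads = [
    threading.Thread(target=run_ocr_job, args=(f"scan{i}.png", results))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The price of this approach is that the jobs run one after another while the lock is held, so it protects the results but gains no parallelism.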

To fix the issue, we are moving to per-thread variables in version 11.2 of the plugin. That means you can now have independent OCR setups on a FileMaker Server. Since FileMaker Server uses different threads to run scripts in parallel, each script can do its own initialization, use OCR and clean up. If one script initializes OCR on one thread and another script later tries to use it on a second thread, the second script may not see that initialization. You can detect this state with the OCR.IsInitialized function.
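Why a second thread does not see the first thread's setup can be shown with thread-local storage. The following is a minimal Python sketch of the concept, not the plugin's implementation; `initialize` and `is_initialized` are hypothetical helpers that only mimic the behavior described above.

```python
import threading

# Hypothetical stand-in for per-thread OCR state: every thread gets
# its own independent set of attributes on this object.
per_thread = threading.local()


def initialize(language):
    # Stores the setting for the calling thread only.
    per_thread.language = language


def is_initialized():
    # Returns 1 if the calling thread has initialized, otherwise 0.
    return 1 if hasattr(per_thread, "language") else 0


def check_from_other_thread(out):
    out.append(is_initialized())


initialize("eng")
seen_in_main = is_initialized()

seen_elsewhere = []
worker = threading.Thread(target=check_from_other_thread, args=(seen_elsewhere,))
worker.start()
worker.join()
```

In this sketch the main thread reports that it is initialized, while the worker thread reports that it is not, because it never ran `initialize` itself.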

You may split initialization into a start script and then just use OCR when needed. But the recommended usage is now to simply call OCR.IsInitialized whenever you need OCR functions. If it returns 1, just continue; otherwise do the initialization. This way multiple scripts can run one after another on a thread, and initialization happens only the first time. When the thread ends, the plugin does the cleanup, so not calling OCR.Cleanup should not cause an issue.
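The check-then-initialize pattern recommended above looks roughly like this when several jobs share a small pool of threads. Again this is a conceptual Python sketch under assumptions, not FileMaker script code: `ensure_initialized` and `recognize` are hypothetical, and `init_count` exists only to show how often the setup actually runs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

thread_state = threading.local()  # hypothetical per-thread OCR state
init_count = 0                    # counts how often setup really ran
count_lock = threading.Lock()


def ensure_initialized():
    global init_count
    # Equivalent of asking "is OCR initialized on this thread?"
    if getattr(thread_state, "ready", False):
        return
    # First job on this thread: do the (placeholder) expensive setup once.
    thread_state.ready = True
    with count_lock:
        init_count += 1


def recognize(image_name):
    ensure_initialized()
    # Placeholder for the real recognition step.
    return "text of " + image_name


images = [f"scan{i}.png" for i in range(20)]
with ThreadPoolExecutor(max_workers=4) as pool:
    texts = list(pool.map(recognize, images))
```

Twenty jobs run here, but the setup happens at most once per worker thread, which is the saving the per-thread design is meant to give.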

The benefit of all this is that you can now use multiple CPU cores to run multiple scripts and do OCR in parallel. This can easily improve throughput by a factor of 4 when processing lots of images. You can try it with version 11.2pr7 or newer.

For FileMaker Pro this change should not have any effect, since FileMaker Pro always uses one thread to run scripts. And it is unlikely that you use OCR in a calculation that may run in a threaded environment, such as portal loading.

In the future we may use per-thread variables for more things to separate scripts on the server. And we may do more automatic cleanup at thread end to avoid accidental memory leaks.

We will try to move to a newer Tesseract version later this summer. Since we upgraded our compilers, this should be possible now. But that will break your scripts, especially since we need new data files in the tessdata folder. We are not sure yet how to handle the transition, but we may introduce new function names to intentionally break your scripts, so that the new library does not try to load old files and crash. Let us know what you think about this.
11 05 21 - 14:07