Sometimes you need to loop over the characters of a string in Xojo, whether to count something, search for patterns or replace some characters. The reasons are diverse, but performance may matter.
Let's try four different ways and measure how much time each one needs.
My test file is about 300,000 characters long, stored as UTF-8, and contains various German umlauts, so a number of characters take two bytes. All tests run with the DisableBackgroundTasks pragma set to reduce background activity. For the timing, we run each block 10 times and take the average duration.
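The timing setup described above can be sketched like this. This is a minimal sketch, not the original test code: the function name AverageMilliseconds is hypothetical, and the loop under test is left as a placeholder comment so the harness runs on its own.

```xojo
' Hypothetical timing harness: runs a block 10 times and returns
' the average duration in milliseconds.
Function AverageMilliseconds(text As String) As Double
  ' Reduce background activity while measuring, as in the tests above.
  #Pragma DisableBackgroundTasks

  Const kRuns = 10
  Dim total As Double = 0

  For run As Integer = 1 To kRuns
    Dim startTime As Double = System.Microseconds
    ' Placeholder: put the loop under test here, working on text.
    total = total + (System.Microseconds - startTime)
  Next

  Return total / kRuns / 1000.0 ' microseconds -> milliseconds
End Function
```

Averaging over several runs smooths out one-off effects such as the first run paying for memory allocation.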
The first way uses String.Characters, which creates an iterator over the characters. Internally, it creates an Iterable object and converts the String to Text. It then creates an iterator object on which the for each loop calls the MoveNext and Value functions, and each character string gets wrapped into a Variant along the way.
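The loop over String.Characters can be sketched as follows. As an example workload it counts spaces; both the function name CountSpacesViaCharacters and the space-counting task are assumptions for illustration, not the original benchmark code.

```xojo
' Hypothetical example: count spaces by iterating String.Characters.
Function CountSpacesViaCharacters(text As String) As Integer
  Dim count As Integer = 0
  ' Each iteration yields one character as a String.
  For Each c As String In text.Characters
    If c = " " Then count = count + 1
  Next
  Return count
End Function
```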
In my test this takes about 550 ms to run over 300,000 characters of text. Let's see if we can do better.
The second way calls String.Split with an empty string as the delimiter, which splits the text into characters. The function walks over the text, finds where each character begins and ends, copies each one into a new string and adds it to an array. We then traverse that array with a for each loop.
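A sketch of the Split approach, using the same hypothetical space-counting workload; the function name CountSpacesViaSplit is illustrative only. Note that the whole array of one-character strings is built before the loop starts.

```xojo
' Hypothetical example: count spaces by splitting into characters first.
Function CountSpacesViaSplit(text As String) As Integer
  Dim count As Integer = 0
  ' An empty delimiter splits the text into one array entry per character.
  Dim chars() As String = text.Split("")
  For Each c As String In chars
    If c = " " Then count = count + 1
  Next
  Return count
End Function
```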
In our test this takes about 110 ms on the same text.
For the third way, we added StringCodePointsMBS to version 21.3 of the MBS Xojo DataTypes Plugin. This function returns an array of UInt32 values, one per code point. Skipping the creation of string objects saves some time here, and because the values are 32 bit, Unicode characters above 65535, which don't fit in 16 bit integers, are still handled correctly.
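A sketch of a loop over the code points. It assumes StringCodePointsMBS is a global function that takes a String and returns a UInt32 array; please check the exact signature in the plugin documentation. The function name CountSpacesViaCodePoints is hypothetical.

```xojo
' Hypothetical example: count spaces by comparing code points.
Function CountSpacesViaCodePoints(text As String) As Integer
  Dim count As Integer = 0
  ' Assumed signature: StringCodePointsMBS(s As String) As UInt32()
  Dim codePoints() As UInt32 = StringCodePointsMBS(text)
  For Each cp As UInt32 In codePoints
    ' 32 is U+0020 SPACE. An emoji like U+1F600 would arrive as 128512.
    If cp = 32 Then count = count + 1
  Next
  Return count
End Function
```

Comparing integers instead of strings avoids both the string allocations and the string comparison per character.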
In our test this takes about 51 ms per run.
The fastest way is to ignore Unicode characters and just look at the bytes. Converting the String to a MemoryBlock copies the bytes, and you can then traverse the new memory block byte by byte.
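A sketch of the byte loop, again with a hypothetical function name and the space-counting workload. Searching for a space works at the byte level because in UTF-8 every byte of a multi-byte character is 0x80 or higher, so 0x20 can only ever be a real space.

```xojo
' Hypothetical example: count spaces by scanning the raw UTF-8 bytes.
Function CountSpacesViaBytes(text As String) As Integer
  Dim count As Integer = 0
  ' Assigning a String to a MemoryBlock copies its bytes.
  Dim mb As MemoryBlock = text
  If mb = Nil Then Return 0 ' defensive check for an empty string

  Dim lastIndex As Integer = mb.Size - 1
  For i As Integer = 0 To lastIndex
    If mb.UInt8Value(i) = &h20 Then count = count + 1
  Next
  Return count
End Function
```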
This takes about 50 ms, just a bit faster than our plugin function. But please try it with 😀: that single character becomes a 4 byte memory block, so a byte loop sees four values where there is only one character.
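To see the difference yourself, this small snippet compares the character count with the byte count for the emoji. It only uses String.Length, String.Bytes and MemoryBlock.Size and writes the results to the debug log.

```xojo
' Compare character count and byte count for one emoji (U+1F600).
Dim smiley As String = "😀" ' string literals are UTF-8
Dim mb As MemoryBlock = smiley
System.DebugLog("Length: " + smiley.Length.ToString)      ' 1 character
System.DebugLog("Bytes: " + smiley.Bytes.ToString)        ' 4 bytes
System.DebugLog("MemoryBlock size: " + mb.Size.ToString)  ' 4 bytes
```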