
How to Use Whisper in Xojo to Transcribe Audio Files

Transcribing speech into text has never been more accessible, thanks to models like OpenAI’s Whisper. In this post, we’ll walk through how to use the MBS Xojo Plugins in Xojo to transcribe audio files using Whisper.

We’ll go step by step, from loading the required libraries to getting the final transcription.

Step 1: Load the Whisper Dynamic Library

Before using Whisper with our WhisperMBS module, we need to load the appropriate dynamic library (.dylib) that matches your Whisper version.

    // 1. Load the dylib for the Whisper version you have.
    Var appFile As FolderItem = App.ExecutableFile
    Var macOSFolder As FolderItem = appFile.Parent
    Var appContents As FolderItem = macOSFolder.Parent
    Var Frameworks As FolderItem = appContents.Child("Frameworks")
    Var LibFile As FolderItem = Frameworks.Child("libwhisper.1.7.4.dylib")

Ensure the library exists before continuing:

    If Not LibFile.Exists Then
      Log LibFile.Name + " file missing?"
      Quit
    ElseIf WhisperMBS.LoadLibrary(LibFile) Then
      'Log "Okay"
    Else
      Log "Failed to load library: " + WhisperMBS.LoadErrorMessage
      Quit
    End If
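The folder walk from Step 1 can be wrapped in a small function so that later steps can reuse it. This is only a sketch: `FrameworksLibrary` is a hypothetical helper name, not something the MBS plugins provide, and it assumes the macOS bundle layout used above (executable in `Contents/MacOS`, libraries in `Contents/Frameworks`).

```xojo
// Hypothetical helper (not part of the MBS plugins): returns the named
// library from the app bundle's Frameworks folder, or Nil if it is missing.
Function FrameworksLibrary(name As String) As FolderItem
  Var executable As FolderItem = App.ExecutableFile
  If executable = Nil Then Return Nil

  // Contents/MacOS
  Var macOSFolder As FolderItem = executable.Parent
  If macOSFolder = Nil Then Return Nil

  // Contents
  Var contents As FolderItem = macOSFolder.Parent
  If contents = Nil Then Return Nil

  // Contents/Frameworks
  Var frameworksFolder As FolderItem = contents.Child("Frameworks")
  If frameworksFolder = Nil Then Return Nil
  If Not frameworksFolder.Exists Then Return Nil

  Var libFile As FolderItem = frameworksFolder.Child(name)
  If libFile = Nil Then Return Nil
  If Not libFile.Exists Then Return Nil

  Return libFile
End Function
```

With such a helper, a Nil result tells the caller that the library is missing before any `LoadLibrary` call is attempted.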

Step 2: Load the libsndfile Library

We also need a library to read audio files like WAV or FLAC. The libsndfile.dylib is ideal for this and can be downloaded from our Libs page.

    // 2. Load sndfile library
    // https://www.monkeybreadsoftware.de/xojo/download/plugin/Libs/
    LibFile = Frameworks.Child("libsndfile.dylib")
    If SoundFileMBS.LoadLibrary(LibFile) Then
      'Log "Okay"
    Else
      Log "Failed to load library: " + SoundFileMBS.LoadErrorMessage
      Quit
    End If

Step 3: Load the Audio File

You can now open your audio file using SoundFileMBS. Make sure to adjust the file path accordingly.

    // 3. Load audio file.
    // Please change path!
    Var f As FolderItem = GetFolderItem("ep306_16kHz_16bit.wav")

    // SoundFileMBS is in our Tools plugin
    Var s As SoundFileMBS = SoundFileMBS.Open(f)
    If s = Nil Then
      Log "Failed to open sound."
      Quit
    End If

Read the audio frames into memory:

    Var info As SoundFileInfoMBS = s.Info
    System.DebugLog Str(info.Frames) + " frames, " + Str(info.SampleRate) + " Hz."

    // 4 bytes per 32-bit float sample; this size fits one sample per frame (mono)
    Var samples As New MemoryBlock(info.Frames * 4)
    Var SamplesCount As Integer = s.ReadSingleFrames(samples, info.Frames)
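The buffer above holds `info.Frames * 4` bytes, which is one 32-bit float per frame and therefore matches a mono file like the example. Whisper works on a single channel, and libsndfile delivers multi-channel audio as interleaved samples, so a stereo file would need a larger read buffer (frames × channels × 4 bytes) and a downmix first. The function below is a minimal sketch of such a downmix using only standard Xojo `MemoryBlock` calls. `DownmixToMono` is a hypothetical helper, and where you take the channel count from is left to you.

```xojo
// Hypothetical helper: averages interleaved 32-bit float channels
// into a single mono channel. Returns Nil for invalid arguments.
Function DownmixToMono(interleaved As MemoryBlock, frameCount As Integer, channelCount As Integer) As MemoryBlock
  If interleaved = Nil Then Return Nil
  If frameCount < 1 Then Return Nil
  If channelCount < 1 Then Return Nil

  // One 4-byte Single per output frame
  Var mono As New MemoryBlock(frameCount * 4)

  For frame As Integer = 0 To frameCount - 1
    Var sum As Double = 0.0
    For channel As Integer = 0 To channelCount - 1
      // Interleaved layout: frame 0 ch 0, frame 0 ch 1, frame 1 ch 0, ...
      sum = sum + interleaved.SingleValue((frame * channelCount + channel) * 4)
    Next
    mono.SingleValue(frame * 4) = sum / channelCount
  Next

  Return mono
End Function
```

Averaging the channels is the simplest choice; it keeps the level within range without clipping.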

Step 4: Resample Audio (If Needed)

Whisper expects audio sampled at 16 kHz. If your file uses a different rate, you'll need to resample it. For this we picked the speexdsp library, which we installed via Homebrew, but it could also be bundled with the application like libsndfile above.

    // 4. If needed, convert the audio to 16000 Hz, as that is what Whisper expects
    If info.SampleRate <> 16000 Then
      // we need 16 kHz
      Const LibPath = "/opt/homebrew/Cellar/speexdsp/1.2.1/lib/libspeexdsp.1.dylib"

      // SpeexResamplerState *speex_resampler_init(spx_uint32_t nb_channels, spx_uint32_t in_rate, spx_uint32_t out_rate, int quality, int *err);
      Soft Declare Function speex_resampler_init Lib LibPath (Channels As UInt32, InRate As UInt32, OutRate As UInt32, Quality As Int32, ByRef error As Int32) As Ptr

      Var inputRate As Integer = info.SampleRate
      Var outputRate As Integer = 16000
      Var InputLength As UInt32 = SamplesCount
      Var OutputLength As UInt32 = InputLength * outputRate / inputRate
      Var output As New MemoryBlock(OutputLength * 4)

      Const SPEEX_RESAMPLER_QUALITY_BEST = 10
      Var error As Int32
      Var resampler As Ptr = speex_resampler_init(1, inputRate, outputRate, SPEEX_RESAMPLER_QUALITY_BEST, error)
      If error <> 0 Then
        Log "Speex resampler init failed"
        Break
        Return
      End If

      // int speex_resampler_process_float(SpeexResamplerState *st, spx_uint32_t channel_index, const float *in, spx_uint32_t *in_len, float *out, spx_uint32_t *out_len);
      Soft Declare Function speex_resampler_process_float Lib LibPath (resampler As Ptr, ChannelIndex As UInt32, Input As Ptr, ByRef InLen As UInt32, Output As Ptr, ByRef OutLen As UInt32) As Int32

      Var SamplesPtr As Ptr = samples
      Var outputPtr As Ptr = output
      // On return, InputLength and OutputLength hold the samples consumed and written
      Call speex_resampler_process_float(resampler, 0, SamplesPtr, InputLength, outputPtr, OutputLength)

      // void speex_resampler_destroy(SpeexResamplerState *st);
      Soft Declare Sub speex_resampler_destroy Lib LibPath (resampler As Ptr)
      speex_resampler_destroy(resampler)

      // now use resampled data
      samples = output
      SamplesCount = OutputLength
    End If
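To make the output buffer size concrete, here is the same length calculation worked through with example numbers. The values are assumptions for illustration only (ten seconds of 44.1 kHz mono audio), and the sketch uses `Int64` with integer division so the intermediate product cannot overflow.

```xojo
// Illustrative values only: 10 seconds of mono audio at 44.1 kHz
Var inputRate As Int64 = 44100
Var outputRate As Int64 = 16000
Var inputLength As Int64 = 441000

// Same formula as in Step 4: scale the sample count by the rate ratio.
// The backslash is Xojo's integer division, so the result is truncated.
Var outputLength As Int64 = inputLength * outputRate \ inputRate

// 441000 * 16000 \ 44100 = 160000 samples, i.e. 10 seconds at 16 kHz
System.DebugLog outputLength.ToString + " samples"

// Each sample is a 32-bit float, so the output buffer needs 4 bytes per sample
Var outputBytes As Int64 = outputLength * 4
System.DebugLog outputBytes.ToString + " bytes"
```

When the rates do not divide evenly, the truncated result can be one sample short of the exact ratio, which is harmless here because the resampler reports how many samples it actually wrote.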

Step 5: Run Whisper and Transcribe

With audio data loaded and resampled (if needed), we can now use Whisper to transcribe the content.

    // 5. Use Whisper to convert audio to text
    // now convert
    System.DebugLog WhisperMBS.LangMaxID.ToString + " languages"

    Var Resources As FolderItem = appContents.Child("Resources")
    Var ModelFile As FolderItem = Resources.Child("ggml-base.en.bin") // you may need to change this to point to your file

Set up context and parameters:

    Var cparams As New WhisperContextParamsMBS
    Var wparams As New WhisperFullParamsMBS(WhisperFullParamsMBS.SamplingStrategyGreedy)
    wparams.TdrzEnable = True

    Var context As New WhisperContext(ModelFile, cparams)
    Var samplesPtr As Ptr = samples
    Var e As Integer = context.Full(wparams, samplesPtr, SamplesCount)

Check for errors and extract the segments:

    If e <> 0 Then
      Log "Failed to process audio. Error: " + e.ToString
      Quit
    Else
      // e is 0 here, so this logs "Error: 0" on success
      Log "Error: " + e.ToString
    End If

    Var segments As Integer = context.FullSegments
    Var lines() As String
    lines.Add "Text: "

Loop over segments and collect texts

We loop over the segments. For each segment we retrieve:

  • the text of the segment
  • the token objects for the segment
  • the token data objects, which carry more detail
  • the token texts
  • whether the speaker changes after the segment

Inside the loop we also walk over the tokens and request each token and token data object individually.

At the end, we join the collected segment texts and show them in a message box.

    // Assumption: in the original project, objects is declared elsewhere;
    // it is declared locally here so the snippet compiles on its own.
    Var objects() As Variant

    For SegmentIndex As Integer = 0 To Segments-1
      Var tokenCount As Integer = context.FullTokens(SegmentIndex)
      Var tokens() As WhisperTokenMBS = context.FullGetTokens(SegmentIndex)
      Var tokenDatas() As WhisperTokenDataMBS = context.FullGetTokenDatas(SegmentIndex)
      Var tokenTexts() As String = context.FullGetTokenTexts(SegmentIndex)

      Var speakerHasTurned As Boolean = context.FullGetSegmentSpeakerTurnNext(SegmentIndex)
      System.DebugLog "speakerHasTurned: " + speakerHasTurned.ToString

      Var Text As String = context.FullSegmentText(SegmentIndex)

      For TokenIndex As Integer = 0 To tokenCount - 1
        Var token As WhisperTokenMBS = context.FullGetToken(SegmentIndex, TokenIndex)
        Var tokenData As WhisperTokenDataMBS = context.FullGetTokenData(SegmentIndex, TokenIndex)
        objects.Add token
        objects.Add tokenData
      Next

      System.DebugLog Text
      lines.Add Text
    Next

    MessageBox "Finished: " + Join(lines, EndOfLine)

Please give it a try and see whether you can use the Whisper library to transcribe audio within Xojo.

24 07 25 - 08:46