Transcribe Speech Node
The Transcribe Speech node converts audio input into text using the OpenAI Whisper model, optionally guided by reference text. It is suited to transcription services, voice input processing, accessibility features, and language learning applications.

Basic Usage
Build a basic flow from four nodes: Speech Input Node, Text, Transcribe Speech Node, and Display Text.
Inputs
The Transcribe Speech node accepts the following inputs:
- Input: The audio file or recording to be transcribed into text.
- Reference (Optional): Reference text that can help improve transcription accuracy, especially for specialized vocabulary, names, or technical terms.
Outputs
- Output: The transcribed text from the audio input.
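For readers who want to see roughly what these inputs and outputs correspond to in code, here is a minimal sketch using the OpenAI Python SDK. It assumes that the Reference input maps to Whisper's `prompt` parameter and that the model identifier is `whisper-1`; the node's actual internals are not documented here, and the helper names `build_transcription_request` and `transcribe` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def build_transcription_request(audio_path: str, reference: Optional[str] = None) -> dict:
    """Build the arguments for a Whisper transcription call.

    Assumption: the node's Reference input corresponds to Whisper's
    `prompt` parameter. This is an illustration, not the node's real code.
    """
    request = {"model": "whisper-1", "file_path": Path(audio_path)}
    if reference and reference.strip():
        # Only send a prompt when reference text was actually supplied.
        request["prompt"] = reference.strip()
    return request


def transcribe(audio_path: str, reference: Optional[str] = None) -> str:
    """Send the audio to Whisper and return the text (the node's Output)."""
    # Imported lazily: needs the `openai` package and OPENAI_API_KEY set.
    from openai import OpenAI

    request = build_transcription_request(audio_path, reference)
    client = OpenAI()
    with open(request.pop("file_path"), "rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **request)
    return result.text
```

Calling `transcribe("note.wav", "Kubernetes, gRPC")` would pass the reference terms as a prompt, which tends to help with names and technical vocabulary. Leaving the reference out sends the audio on its own.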
Configuration
Model Selection
Select the OpenAI Whisper model to use for transcription:
- OpenAI Whisper: Advanced speech recognition model supporting multiple languages and accents
  - High accuracy for various audio qualities
  - Supports 99+ languages
  - Handles background noise and multiple speakers
  - Automatic language detection
Available Models
- OpenAI Whisper: General-purpose speech-to-text model by OpenAI.
Usage
Setting Up Transcribe Speech
- Add the node to your flow canvas.
- Select Model: Choose OpenAI Whisper for speech transcription.
- Connect Input: Link from upstream nodes (e.g., Speech Input Node, File Upload) to provide the audio content.
- Connect Reference (Optional): Link reference text to improve accuracy for specialized content.
- Connect Output: Link to downstream nodes (e.g., Display Text, Text Join, AI General Prompt) to use the transcribed text.
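The setup steps above amount to a small wiring rule: Input must be connected, Reference may be left unconnected. The sketch below expresses that rule as a check over a list of connections. The connection format, the port name `Input` on Display Text, and the function `check_transcribe_wiring` are all hypothetical and chosen only for illustration; they are not the platform's own flow format.

```python
# Hypothetical flow description.
# Each connection is (source node, source port, target node, target port).
CONNECTIONS = [
    ("Speech Input", "Output", "Transcribe Speech", "Input"),
    ("Text", "Output", "Transcribe Speech", "Reference"),
    ("Transcribe Speech", "Output", "Display Text", "Input"),
]

REQUIRED_PORTS = {"Input"}      # must be connected
OPTIONAL_PORTS = {"Reference"}  # may be left unconnected


def check_transcribe_wiring(connections, node="Transcribe Speech"):
    """Return a list of wiring problems for the node (empty if none)."""
    # Collect every input port of the node that has something linked to it.
    connected = {port for _, _, target, port in connections if target == node}
    problems = [
        f"{node}: required port '{port}' is not connected"
        for port in sorted(REQUIRED_PORTS - connected)
    ]
    unknown = connected - REQUIRED_PORTS - OPTIONAL_PORTS
    problems += [f"{node}: unknown port '{port}'" for port in sorted(unknown)]
    return problems
```

With the connections listed above the check returns an empty list. Removing the Speech Input link would report the missing Input, while removing the Text link would not, because Reference is optional.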
Example Workflows
Convert Audio Input into a Note Transcription
Scenario: Create a voice note transcription tool that converts spoken recordings into written text for easy review and editing.

Steps to Create the Flow:
- Add a Start Node.
- Add and connect a Speech Input Node for audio recording.
  - Configure to allow voice recording
  - Set appropriate recording duration limits
- Add and connect a Transcribe Speech Node.
  - Select Model: OpenAI Whisper
  - Connect the Speech Input Node's Output to Input
- Add and connect a Display Text node to show the transcribed text.
Result: Users can record voice notes and immediately see them transcribed into text, making it easy to capture thoughts and ideas verbally.
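As a rough illustration of how data moves through this flow, the sketch below models each node as a plain function and chains them in order. Everything here is an assumption made for illustration: the function `run_voice_note_flow` and the stand-in `fake_record` and `fake_transcribe` helpers are hypothetical, and the stand-ins replace the real microphone and Whisper call so the example runs offline.

```python
from typing import Callable, List, Optional


def run_voice_note_flow(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes, Optional[str]], str],
    display: Callable[[str], None],
    reference: Optional[str] = None,
) -> str:
    """Run Speech Input -> Transcribe Speech -> Display Text once."""
    audio = record()                     # Speech Input Node: capture audio
    if not audio:
        raise ValueError("no audio was recorded")
    text = transcribe(audio, reference)  # Transcribe Speech Node
    display(text)                        # Display Text node
    return text


# Stand-ins for the real nodes, so the sketch needs no microphone or API key.
def fake_record() -> bytes:
    return b"\x00\x01fake-audio"


def fake_transcribe(audio: bytes, reference: Optional[str]) -> str:
    return "Remember to send the quarterly report."


shown: List[str] = []  # collects whatever the display step receives
result = run_voice_note_flow(fake_record, fake_transcribe, shown.append)
```

Keeping each step as a separate callable mirrors the node-per-step layout of the flow: swapping the stand-in transcriber for a real Whisper call would not change the surrounding steps.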