Transcribe Conversation Node
The Transcribe Conversation node converts audio recordings or real-time speech into text transcripts with advanced features including speaker identification, sentiment analysis, entity detection, and chapter summarization. This node is ideal for creating meeting transcripts, interview documentation, lecture notes, and conversational analysis.

Basic Usage
Combine the Audio, Text, Transcribe Conversation, and Widget nodes to build comprehensive transcription workflows.
Inputs
The Transcribe Conversation node accepts the following inputs:
Audio Input
- Type: Audio file or audio stream (blue dot)
- Mandatory: Required
- Works best with: Audio node, File Upload, Microphone input
Provide the audio file or stream that you want to transcribe. Common audio formats are supported, including MP3 and WAV.
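If a recording arrives in a container the node does not accept, one option is to convert it before uploading it to the Audio node. Below is a minimal sketch using the pydub library (which requires an ffmpeg binary on the PATH); the file names are placeholders:

```python
# Hypothetical pre-processing step, performed outside the flow:
# convert a recording to MP3 before uploading it to the Audio node.
# Requires pydub (pip install pydub) and ffmpeg available on the PATH.
from pydub import AudioSegment

audio = AudioSegment.from_file("meeting.m4a")  # format inferred from the file
audio.export("meeting.mp3", format="mp3")      # write an MP3 copy
```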
Expected Number of People Who Speak
- Type: Numeric value (green dot)
- Mandatory: Optional
- Works best with: Text node, Number input
Specify the expected number of speakers in the conversation. This helps the AI better identify and distinguish between different speakers in the transcript.
Outputs
JSON Output
- Type: Structured JSON data (cyan dot)
- Works best with: Widget, API Call, Data processing nodes
Contains the complete transcription data in JSON format (see the sketch below the list), including:
- Full transcript text
- Speaker identification
- Timestamps
- Sentiment analysis results
- Detected entities
- Chapter summaries (if enabled)
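For illustration, a downstream data-processing step might parse the JSON Output as shown below. This is a minimal sketch: the field names ("text", "utterances", "speaker", "start", "end", "sentiment") are assumptions for this example, not the node's documented schema, so inspect your own flow's output for the exact structure.

```python
import json

# Illustrative JSON Output shape; the real schema may differ.
raw_json = """
{
  "text": "Hello, thanks for joining. Happy to be here.",
  "utterances": [
    {"speaker": "A", "start": 120, "end": 2400,
     "text": "Hello, thanks for joining.", "sentiment": "POSITIVE"},
    {"speaker": "B", "start": 2600, "end": 4100,
     "text": "Happy to be here.", "sentiment": "POSITIVE"}
  ]
}
"""

payload = json.loads(raw_json)
for u in payload["utterances"]:
    print(f'{u["speaker"]} [{u["start"]}ms]: {u["text"]} ({u["sentiment"]})')
```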
Text Output
- Type: Plain text transcript (green dot)
- Works best with: Display Text, Document Download, Text processing nodes
Provides a simple text version of the transcription without additional metadata.
Configuration
Analysis Options
Configure what additional analysis should be performed on the transcription (a sketch of reading these results from the JSON Output follows the options):
Produce Highlights Words
- Type: Checkbox
- Purpose: Identify and highlight key words or phrases in the transcript
- Use case: Extract important points, keywords, or main topics from conversations
Sentiment Analysis
- Type: Checkbox
- Purpose: Analyze the emotional tone and sentiment of the conversation
- Use case: Understand speaker emotions, customer satisfaction, or overall conversation mood
Detect Entities
- Type: Checkbox
- Purpose: Identify and extract named entities (people, places, organizations, dates, etc.)
- Use case: Extract structured information like names, locations, dates, and organizations mentioned
Chapter Summary
- Type: Checkbox
- Purpose: Automatically divide the transcript into chapters with summaries
- Use case: Create organized summaries for long conversations, meetings, or lectures
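When any of these options are enabled, their results appear in the JSON Output. The sketch below shows one way a downstream step might read them; the key names ("highlights", "entities", "chapters", "headline", "summary") are assumptions for illustration only.

```python
# Sketch of pulling optional analysis results out of the parsed
# JSON Output. Keys are illustrative assumptions; .get() with a
# default lets the code degrade gracefully when an option is unchecked.
def print_analysis(payload: dict) -> None:
    for hl in payload.get("highlights", []):   # Produce Highlights Words
        print("highlight:", hl.get("text"))
    for ent in payload.get("entities", []):    # Detect Entities
        print("entity:", ent.get("entity_type"), "-", ent.get("text"))
    for ch in payload.get("chapters", []):     # Chapter Summary
        print("chapter:", ch.get("headline"), "-", ch.get("summary"))
```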
Example Workflows
Meeting Transcription with Analysis
Scenario: Transcribe a meeting recording with speaker identification and sentiment analysis, then display results in a widget.

Steps to Create the Flow:
1. Start with the Start Node.
2. Add an Audio node with your recording:
   - Upload an audio file (e.g., "sample-file.mp3")
   - Connect the Audio Output to the Audio Input of Transcribe Conversation
3. Add a Text node for the number of speakers:
   - Enter the expected number of speakers (e.g., "2")
   - Connect it to the Expected Number of People Who Speak input
4. Configure the Transcribe Conversation node, enabling Analysis Options as needed:
   - Check "Produce Highlights Words" to identify key points
   - Check "Sentiment Analysis" to analyze emotional tone
   - Check "Detect Entities" to extract names, dates, and locations
   - Check "Chapter Summary" for long conversations
5. Add a Widget node to display the results:
   i. Connect the JSON Output to the Config JSON input of the Widget. The widget will display the transcript with interactive features.
   ii. Configure the Widget display options:
      - Check "Show Full Width" for better readability
      - Optionally check "Disable Next Button" if needed
      - Check "Generate flow file with created config" to save the configuration
   iii. Review the transcript data. The widget shows the JSON string of the transcript data, including:
      - Words and utterances with timestamps
      - Speaker identification and labels
      - Sentiment scores (if enabled)
      - Detected entities (if enabled)
6. Alternative: add a Display Text node to show the plain text transcript:
   - Connect the Text Output to the Display Text Input
   - This provides a simple, readable transcript without metadata
Result:
Users receive:
- A complete transcript with speaker identification
- Highlighted keywords and important phrases
- Sentiment analysis showing emotional tone
- Detected entities (names, organizations, dates, locations)
- Chapter summaries for easy navigation (if enabled)
- Interactive widget display with all transcript features
Notes
- Processing time depends on audio length and enabled features
- Speaker identification accuracy improves with distinct voices
- Sentiment analysis provides conversation-level and speaker-level insights
- Entity detection extracts structured data from unstructured conversations
- Chapter summaries help navigate long recordings efficiently
- JSON output contains complete data including all enabled analysis features
- Text output provides a clean, simple transcript without metadata