Transcribe Conversation Node
The Transcribe Conversation node converts audio recordings or real-time speech into text transcripts with advanced features including speaker identification, sentiment analysis, entity detection, and chapter summarization. This node is ideal for creating meeting transcripts, interview documentation, lecture notes, and conversational analysis.

Basic Usage
Combine the Audio, Text, Transcribe Conversation, and Widget nodes to build comprehensive transcription workflows.
Inputs
The Transcribe Conversation node accepts the following inputs:
Audio Input
- Type: Audio file or audio stream (blue dot)
- Mandatory: Required
- Works best with: Audio node, File Upload, Microphone input
Provide the audio file or stream that you want to transcribe. Common audio formats are supported, including MP3 and WAV.
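If a recording arrives in a container the node does not accept, one option is to convert it before uploading it to the Audio node. Below is a minimal sketch using the pydub library (which requires an ffmpeg binary on the PATH); the file names are placeholders:

```python
# Hypothetical pre-processing step, performed outside the flow:
# convert a recording to MP3 before uploading it to the Audio node.
# Requires pydub (pip install pydub) and ffmpeg available on the PATH.
from pydub import AudioSegment

audio = AudioSegment.from_file("meeting.m4a")  # format inferred from the file
audio.export("meeting.mp3", format="mp3")      # write an MP3 copy
```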
Expected Number of People Who Speak
- Type: Numeric value (green dot)
- Mandatory: Optional
- Works best with: Text node, Number input
Specify the expected number of speakers in the conversation. This helps the AI better identify and distinguish between different speakers in the transcript.
Outputs
JSON Output
- Type: Structured JSON data (cyan dot)
- Works best with: Widget, API Call, Data processing nodes
Contains the complete transcription data in JSON format (see the sketch below the list), including:
- Full transcript text
- Speaker identification
- Timestamps
- Sentiment analysis results
- Detected entities
- Chapter summaries (if enabled)
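For illustration, a downstream data-processing step might parse the JSON Output as shown below. This is a minimal sketch: the field names ("text", "utterances", "speaker", "start", "end", "sentiment") are assumptions for this example, not the node's documented schema, so inspect your own flow's output for the exact structure.

```python
import json

# Illustrative JSON Output shape; the real schema may differ.
raw_json = """
{
  "text": "Hello, thanks for joining. Happy to be here.",
  "utterances": [
    {"speaker": "A", "start": 120, "end": 2400,
     "text": "Hello, thanks for joining.", "sentiment": "POSITIVE"},
    {"speaker": "B", "start": 2600, "end": 4100,
     "text": "Happy to be here.", "sentiment": "POSITIVE"}
  ]
}
"""

payload = json.loads(raw_json)
for u in payload["utterances"]:
    print(f'{u["speaker"]} [{u["start"]}ms]: {u["text"]} ({u["sentiment"]})')
```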
Text Output
- Type: Plain text transcript (green dot)
- Works best with: Display Text, Document Download, Text processing nodes
Provides a simple text version of the transcription without additional metadata.
Configuration
Analysis Options
Configure what additional analysis should be performed on the transcription (a sketch of reading these results from the JSON Output follows the options):
Produce Highlights Words
- Type: Checkbox
- Purpose: Identify and highlight key words or phrases in the transcript
- Use case: Extract important points, keywords, or main topics from conversations
Sentiment Analysis
- Type: Checkbox
- Purpose: Analyze the emotional tone and sentiment of the conversation
- Use case: Understand speaker emotions, customer satisfaction, or overall conversation mood
Detect Entities
- Type: Checkbox
- Purpose: Identify and extract named entities (people, places, organizations, dates, etc.)
- Use case: Extract structured information like names, locations, dates, and organizations mentioned
Chapter Summary
- Type: Checkbox
- Purpose: Automatically divide the transcript into chapters with summaries
- Use case: Create organized summaries for long conversations, meetings, or lectures
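When any of these options are enabled, their results appear in the JSON Output. The sketch below shows one way a downstream step might read them; the key names ("highlights", "entities", "chapters", "headline", "summary") are assumptions for illustration only.

```python
# Sketch of pulling optional analysis results out of the parsed
# JSON Output. Keys are illustrative assumptions; .get() with a
# default lets the code degrade gracefully when an option is unchecked.
def print_analysis(payload: dict) -> None:
    for hl in payload.get("highlights", []):   # Produce Highlights Words
        print("highlight:", hl.get("text"))
    for ent in payload.get("entities", []):    # Detect Entities
        print("entity:", ent.get("entity_type"), "-", ent.get("text"))
    for ch in payload.get("chapters", []):     # Chapter Summary
        print("chapter:", ch.get("headline"), "-", ch.get("summary"))
```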
Example Workflows
Meeting Transcription with Analysis
Scenario: Transcribe a meeting recording with speaker identification and sentiment analysis, then display results in a widget.

Steps to Create the Flow:
1. Start with the Start Node.
2. Add an Audio node with your recording:
   - Upload an audio file (e.g., "sample-file.mp3")
   - Connect the Audio Output to the Audio Input of Transcribe Conversation
3. Add a Text node for the number of speakers:
   - Enter the expected number of speakers (e.g., "2")
   - Connect it to the Expected Number of People Who Speak input
4. Configure the Transcribe Conversation node, enabling Analysis Options as needed:
   - Check "Produce Highlights Words" to identify key points
   - Check "Sentiment Analysis" to analyze emotional tone
   - Check "Detect Entities" to extract names, dates, and locations
   - Check "Chapter Summary" for long conversations
5. Add a Widget node to display the results:
   i. Connect the JSON Output to the Config JSON input of the Widget. The widget will display the transcript with interactive features.
   ii. Configure the Widget display options:
      - Check "Show Full Width" for better readability
      - Optionally check "Disable Next Button" if needed
      - Check "Generate flow file with created config" to save the configuration
   iii. Review the transcript data. The widget shows the JSON string of the transcript data, including:
      - Words and utterances with timestamps
      - Speaker identification and labels
      - Sentiment scores (if enabled)
      - Detected entities (if enabled)
6. Alternative: add a Display Text node to show the plain text transcript:
   - Connect the Text Output to the Display Text Input
   - This provides a simple, readable transcript without metadata
Result:
Users receive:
- A complete transcript with speaker identification
- Highlighted keywords and important phrases
- Sentiment analysis showing emotional tone
- Detected entities (names, organizations, dates, locations)
- Chapter summaries for easy navigation (if enabled)
- Interactive widget display with all transcript features
Notes
- Processing time depends on audio length and enabled features
- Speaker identification accuracy improves with distinct voices
- Sentiment analysis provides conversation-level and speaker-level insights
- Entity detection extracts structured data from unstructured conversations
- Chapter summaries help navigate long recordings efficiently
- JSON output contains complete data including all enabled analysis features
- Text output provides a clean, simple transcript without metadata