
Transcribe Conversation Node

The Transcribe Conversation node converts audio recordings or real-time speech into text transcripts, with advanced features including speaker identification, sentiment analysis, entity detection, and chapter summarization. It is ideal for meeting transcripts, interview documentation, lecture notes, and conversation analysis.

[Image: Transcribe Conversation node]


Basic Usage

Combine the Audio, Text, Transcribe Conversation, and Widget nodes to build comprehensive transcription workflows.


Inputs

The Transcribe Conversation node accepts the following inputs:

Audio Input

  • Type: Audio file or audio stream (blue dot)
  • Mandatory: Required
  • Works best with: Audio node, File Upload, Microphone input

Provide the audio file or stream that you want to transcribe. Common audio formats such as MP3 and WAV are supported.

Expected Number of People Who Speak

  • Type: Numeric value (green dot)
  • Mandatory: Optional
  • Works best with: Text node, Number input

Specify the expected number of speakers in the conversation. This helps the AI better identify and distinguish between different speakers in the transcript.


Outputs

JSON Output

  • Type: Structured JSON data (cyan dot)
  • Works best with: Widget, API Call, Data processing nodes

Contains the complete transcription data in JSON format, including:

  • Full transcript text
  • Speaker identification
  • Timestamps
  • Sentiment analysis results
  • Detected entities
  • Chapter summaries (if enabled)
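
The exact schema is not documented here, so inspect a real payload before depending on specific keys. As a rough sketch, a payload of the following shape (every field name is an illustrative assumption, not the node's guaranteed schema) would carry the items above and can be consumed downstream like so:

```python
import json

# Illustrative payload only: the key names are assumptions made for this
# example, not the node's guaranteed schema.
sample_json = """
{
  "text": "Hello and welcome to the meeting. Thanks, glad to be here.",
  "utterances": [
    {"speaker": "A", "text": "Hello and welcome to the meeting.",
     "start": 120, "end": 2840, "sentiment": "POSITIVE"},
    {"speaker": "B", "text": "Thanks, glad to be here.",
     "start": 2950, "end": 4480, "sentiment": "POSITIVE"}
  ],
  "entities": [{"entity_type": "occasion", "text": "meeting"}],
  "chapters": [{"summary": "Opening greetings.", "start": 120, "end": 4480}]
}
"""

data = json.loads(sample_json)
for utterance in data["utterances"]:
    print(f'Speaker {utterance["speaker"]}: {utterance["text"]}')
```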

Text Output

  • Type: Plain text transcript (green dot)
  • Works best with: Display Text, Document Download, Text processing nodes

Provides a simple text version of the transcription without additional metadata.
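
If all you need is the readable transcript, the Text Output can feed simple text processing directly. A minimal sketch (the transcript string below is a stand-in for the node's actual output):

```python
# Stand-in for the Text Output value; in a real flow this string would come
# from the connected Transcribe Conversation node.
transcript = "Hello and welcome to the meeting. Thanks, glad to be here."

word_count = len(transcript.split())
print(f"Transcript length: {word_count} words")

# Save the plain transcript for download or archiving.
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcript)
```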


Configuration

Analysis Options

Configure what additional analysis should be performed on the transcription:

Produce Highlights Words

  • Type: Checkbox
  • Purpose: Identify and highlight key words or phrases in the transcript
  • Use case: Extract important points, keywords, or main topics from conversations

Sentiment Analysis

  • Type: Checkbox
  • Purpose: Analyze the emotional tone and sentiment of the conversation
  • Use case: Understand speaker emotions, customer satisfaction, or overall conversation mood

Detect Entities

  • Type: Checkbox
  • Purpose: Identify and extract named entities (people, places, organizations, dates, etc.)
  • Use case: Extract structured information like names, locations, dates, and organizations mentioned

Chapter Summary

  • Type: Checkbox
  • Purpose: Automatically divide the transcript into chapters with summaries
  • Use case: Create organized summaries for long conversations, meetings, or lectures
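
Because each checkbox adds its fields to the JSON Output only when enabled, downstream code should read those fields defensively. A minimal sketch, reusing the assumed key names from the JSON example above:

```python
def summarize_analysis(data: dict) -> None:
    """Print whichever optional analysis results are present in the payload."""
    for word in data.get("highlights", []):        # Produce Highlights Words
        print(f'Highlight: {word["text"]}')
    for utterance in data.get("utterances", []):   # Sentiment Analysis
        if "sentiment" in utterance:
            print(f'Speaker {utterance["speaker"]} sentiment: {utterance["sentiment"]}')
    for entity in data.get("entities", []):        # Detect Entities
        print(f'{entity["entity_type"]}: {entity["text"]}')
    for chapter in data.get("chapters", []):       # Chapter Summary
        print(f'Chapter summary: {chapter["summary"]}')
```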

Example Workflows

Meeting Transcription with Analysis

Scenario: Transcribe a meeting recording with speaker identification and sentiment analysis, then display results in a widget.

[Image: Transcribe Conversation example flow]

Steps to Create the Flow:

  1. Start with the Start Node.

  2. Add an Audio node with your recording:

    • Upload an audio file (e.g., "sample-file.mp3")
    • Connect the Audio Output to Audio Input of Transcribe Conversation
  3. Add a Text node for the number of speakers:

    • Enter the expected number of speakers (e.g., "2")
    • Connect to Expected Number of People Who Speak input
  4. Configure the Transcribe Conversation Node:

    i. Enable Analysis Options as needed:

    • Check "Produce Highlights Words" to identify key points
    • Check "Sentiment Analysis" to analyze emotional tone
    • Check "Detect Entities" to extract names, dates, locations
    • Check "Chapter Summary" for long conversations
  5. Add a Widget node to display the results:

    i. Connect JSON Output to Config JSON input of the Widget

    • The widget will display the transcript with interactive features

    ii. Configure Widget display options:

    • Check "Show Full Width" for better readability
    • Optionally check "Disable Next Button" if needed
    • Check "Generate flow file with created config" to save configuration

    iii. Review Transcript Data:

    • The widget displays the transcript data as a JSON string
    • Words and utterances with timestamps
    • Speaker identification and labels
    • Sentiment scores (if enabled)
    • Detected entities (if enabled)
  6. Alternative: Add a Display Text node to show the plain text transcript:

    • Connect Text Output to Display Text Input
    • Provides a simple, readable transcript without metadata
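
If you post-process the JSON Output yourself rather than using the Text Output from step 6, a short sketch like the following (again assuming the illustrative schema shown earlier, with start times in milliseconds) reconstructs a readable, speaker-labeled transcript:

```python
def to_plain_transcript(data: dict) -> str:
    """Render assumed utterance records as 'mm:ss Speaker X: text' lines."""
    lines = []
    for utterance in data.get("utterances", []):
        seconds = utterance.get("start", 0) // 1000  # assumed milliseconds
        stamp = f"{seconds // 60:02d}:{seconds % 60:02d}"
        lines.append(f'{stamp} Speaker {utterance["speaker"]}: {utterance["text"]}')
    return "\n".join(lines)

sample = {
    "utterances": [
        {"speaker": "A", "text": "Hello and welcome to the meeting.", "start": 120},
        {"speaker": "B", "text": "Thanks, glad to be here.", "start": 2950},
    ]
}
print(to_plain_transcript(sample))
```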

Result:

Users receive:

  • A complete transcript with speaker identification
  • Highlighted keywords and important phrases
  • Sentiment analysis showing emotional tone
  • Detected entities (names, organizations, dates, locations)
  • Chapter summaries for easy navigation (if enabled)
  • Interactive widget display with all transcript features

Notes

  • Processing time depends on audio length and enabled features
  • Speaker identification accuracy improves with distinct voices
  • Sentiment analysis provides conversation-level and speaker-level insights
  • Entity detection extracts structured data from unstructured conversations
  • Chapter summaries help navigate long recordings efficiently
  • JSON output contains complete data including all enabled analysis features
  • Text output provides a clean, simple transcript without metadata