
Audio Input

The Audio Input node lets users record audio through their microphone or upload existing audio files during workflow execution. Unlike the Audio node, which stores static audio files, the Audio Input node captures user-generated audio while the flow is running. This makes it the node to use for voice responses, speech assessments, audio submissions, and interactive voice-based activities.



Basic Usage

Use the Audio Input node to enable users to record audio through their microphone or upload audio files as part of the workflow interaction.


Inputs

The Audio Input node does not accept inputs from other nodes. It collects audio directly from users during flow execution.


Outputs

Audio Out (Blue Port)

Audio Data: Outputs the recorded or uploaded audio file.

  • Connects to blue input ports of audio-processing nodes
  • Passes the audio file/data to connected nodes
  • Can be used for transcription, analysis, or storage
  • Available after user completes recording or upload

Audio File URL (Green Port)

File URL: Outputs the URL of the recorded/uploaded audio file.

  • Connects to green input ports of text-compatible nodes
  • Provides the URL as a text string
  • Can be stored, displayed, or used in API calls
  • Useful for referencing the audio file location
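
As an illustration of how the two outputs differ, the following Python sketch models them as a single record. The platform's internal representation is not documented here, so the class name `AudioInputResult`, its fields, the `build_api_payload` helper, and the example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioInputResult:
    """Hypothetical container for what the node emits once the user finishes."""

    audio_data: bytes  # payload carried by the blue "Audio Out" port
    file_url: str      # text carried by the green "Audio File URL" port


def build_api_payload(result: AudioInputResult) -> dict:
    """Use the green-port URL as plain text, for example in a JSON request body."""
    return {"audio_url": result.file_url, "size_bytes": len(result.audio_data)}


# Example values are made up for illustration only.
result = AudioInputResult(
    audio_data=b"\x00\x01\x02",
    file_url="https://example.com/recordings/abc.webm",
)
payload = build_api_payload(result)
print(payload)
```

The point of the sketch is that the blue port carries the audio itself, while the green port carries only a reference to it, which is why the URL can be stored, displayed, or passed to text-compatible nodes.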

Configuration

The Audio Input node requires no configuration. It automatically provides recording and upload capabilities to users when the flow executes.

User Interface (Runtime)

When users reach this node during flow execution, they see:

Recording Interface:

  • Microphone button to start/stop recording
  • Recording timer
  • Playback controls to review recording
  • Option to re-record

Upload Interface:

  • Button to upload existing audio files
  • File browser to select audio files
  • Supported formats: MP3, WAV, M4A, OGG, WebM, etc.
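
If you prepare files outside the platform before uploading them, a quick extension check can catch obviously unsupported files. The sketch below is an assumption-based helper, not part of the product: the set contains only the formats named above, and because that list ends with "etc.", the platform may accept more.

```python
from pathlib import Path

# Formats named in the list above. The source list ends with "etc.",
# so this set is an assumption and may be incomplete.
LISTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".webm"}


def is_listed_audio_format(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in LISTED_EXTENSIONS


print(is_listed_audio_format("answer.MP3"))
print(is_listed_audio_format("notes.txt"))
```

The check is case-insensitive and looks only at the file name, so it does not verify that the file actually contains valid audio.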

Example Workflows

Audio Transcription Workflow

Scenario: Record user audio and convert it to text using AI transcription.


Steps to Create the Flow:

  1. Add a Start Node.

  2. Add an Audio Input node:

    • No configuration needed
    • The user records or uploads audio when the flow runs
  3. Add a Transcribe Speech node:

    i. Configure transcription model:

    • Model Dropdown: Select "OpenAI Whisper" (or other available models)
    • Language Detection: Select "Auto Detect" (or specific language)

    ii. Connect audio:

    • Audio Input "Audio Out" (blue) → Transcribe Speech "Input" (blue)
  4. Add a Display Text node:

    • Connect: Transcribe Speech "Output" (green) → Display Text "Input" (green)
    • This shows the transcribed text to the user
  5. Connect flow control:

    • Start → Audio Input (red to red)
    • Audio Input → Transcribe Speech (red to red)
    • Transcribe Speech → Display Text (red to red)
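
The connections in the steps above all follow one rule: a port connects only to a port of the same color (blue for audio, green for text, red for flow control). The sketch below expresses that rule in Python as a way to sanity-check a planned wiring. The `Port` type and `can_connect` function are hypothetical, and the flow-control port names "Flow Out" and "Flow In" are placeholders, since the source does not name them.

```python
from typing import NamedTuple


class Port(NamedTuple):
    node: str
    name: str
    color: str  # "blue" = audio, "green" = text, "red" = flow control


def can_connect(src: Port, dst: Port) -> bool:
    """Ports are compatible only when their colors match."""
    return src.color == dst.color


# The wiring from the steps above. "Flow Out" / "Flow In" are placeholder names.
connections = [
    (Port("Audio Input", "Audio Out", "blue"), Port("Transcribe Speech", "Input", "blue")),
    (Port("Transcribe Speech", "Output", "green"), Port("Display Text", "Input", "green")),
    (Port("Start", "Flow Out", "red"), Port("Audio Input", "Flow In", "red")),
    (Port("Audio Input", "Flow Out", "red"), Port("Transcribe Speech", "Flow In", "red")),
    (Port("Transcribe Speech", "Flow Out", "red"), Port("Display Text", "Flow In", "red")),
]

invalid = [(src, dst) for src, dst in connections if not can_connect(src, dst)]
print("invalid connections:", invalid)
```

For example, connecting Audio Input "Audio Out" (blue) directly to Display Text "Input" (green) would fail this check, which is why the Transcribe Speech node sits between them.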

Preview:

[Start] → [Audio Input: User records audio]
→ [Transcribe Speech: Convert to text]
→ [Display Text: Show transcript]

Result: User records audio through their microphone, the AI transcribes it to text, and the transcript is displayed.
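
For readers who think in code, the data flow of this workflow can be sketched as a short Python pipeline. This is only an analogy under stated assumptions: `run_transcription_flow` and `fake_transcribe` are hypothetical names, and the stub transcriber stands in for whichever model the Transcribe Speech node is configured to use.

```python
from typing import Callable


def run_transcription_flow(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    display: Callable[[str], None],
) -> str:
    """Mirror the flow: Audio Input -> Transcribe Speech -> Display Text."""
    if not audio:
        # The audio output is only available after the user records or uploads.
        raise ValueError("no audio captured")
    transcript = transcribe(audio)  # blue (audio) in, green (text) out
    display(transcript)             # green (text) in
    return transcript


def fake_transcribe(audio: bytes) -> str:
    """Stub standing in for a real transcription model."""
    return f"<transcript of {len(audio)} bytes>"


shown: list[str] = []
transcript = run_transcription_flow(b"fake-audio", fake_transcribe, shown.append)
print(transcript)
```

Passing the transcriber and display step as parameters reflects how the nodes are independent: the Transcribe Speech node could be reconfigured or replaced without changing how Audio Input captures the recording.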


Related Nodes

  • Transcribe Speech: Converts audio to text (primary processing node)
  • Audio: Static audio files (different from user input)
  • Play Sound: Plays audio back to users
  • Text to Speech: Generates audio from text (opposite direction)
  • Speech Input: Alternative audio capture node
  • Display Text: Shows instructions and transcripts
  • AI General Feedback: Analyzes transcribed content
  • Data Dump: Stores audio URLs and metadata

Audio Input vs Other Audio Nodes

Feature          Audio Input             Audio Node               Speech Input
Purpose          User records/uploads    Static audio files       User voice recording
Audio Source     User microphone/file    Pre-uploaded by creator  User microphone
When Captured    During flow execution   Before flow creation     During flow execution
Use Case         Assignments, responses  Lessons, instructions    Voice responses
File Management  Runtime capture         Stored in workflow       Runtime capture
Outputs          Audio + URL             Audio data               Audio data

When to use Audio Input:

  • Student audio submissions
  • Voice-based assessments
  • Speech practice and recording
  • Audio feedback collection
  • Voice interviews
  • Dictation and notes

When to use Audio Node:

  • Pre-recorded instructions
  • Music and sound effects
  • Lecture recordings
  • Static audio content

When to use Speech Input:

  • Simple voice recording
  • Quick audio capture
  • Speech-to-text only
  • Voice commands
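
The guidance above can be condensed into a small decision helper. The Python sketch below is one reading of the comparison table, not an official rule: the function `choose_audio_node` and its two parameters are hypothetical, and real projects may have other reasons to prefer one node.

```python
def choose_audio_node(captured_at_runtime: bool, needs_upload_or_url: bool) -> str:
    """Pick a node name based on one reading of the comparison table above."""
    if not captured_at_runtime:
        # Audio prepared by the creator ahead of time.
        return "Audio"
    # Both remaining nodes capture audio while the flow runs. Per the table,
    # only Audio Input also accepts file uploads and outputs a URL.
    return "Audio Input" if needs_upload_or_url else "Speech Input"


print(choose_audio_node(captured_at_runtime=False, needs_upload_or_url=False))
print(choose_audio_node(captured_at_runtime=True, needs_upload_or_url=True))
print(choose_audio_node(captured_at_runtime=True, needs_upload_or_url=False))
```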

Summary

The Audio Input node supports voice-based interactions in several ways:

Interactive: Real-time user audio capture
Flexible: Recording or file upload options
Integrated: Works with transcription and AI nodes
Accessible: Multiple output formats (audio + URL)
Versatile: Supports diverse voice-based workflows

Use the Audio Input node to build voice-based assessments, language learning activities, and other interactive audio experiences.