Describe Image Node

The Describe Image node provides descriptions for given images using reference text and selected AI models. This powerful node enables image-to-text conversion, making it ideal for generating image captions, accessibility descriptions, content analysis, and automated image documentation.

Describe Image node


Basic Usage

A basic flow uses an Image Input node to supply the image, a Text node for optional reference text, a Describe Image node to generate the description, and a Display Text node to show the result.


Inputs

The Describe Image node accepts the following inputs:

Input

  • Type: Image to be described
  • Mandatory: Required
  • Works best with: Image Input, File Upload, Generate Image output

Reference

  • Type: Reference text to guide the description style or focus
  • Mandatory: Optional
  • Works best with: Text Input, Text node

Use reference text to specify what aspects of the image to focus on or what style of description is needed.

Overwrite System Prompt

  • Type: Custom system prompt to replace default behavior
  • Mandatory: Optional
  • Works best with: Text Input, Text node

Use this to completely customize how the AI analyzes and describes images.
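To make the relationship between the three inputs concrete, here is a minimal sketch of how they might be assembled into a vision-model chat request. The default system prompt, the function name, and the exact payload shape are assumptions for illustration; the node's internal prompt is not documented.

```python
def build_messages(image_url, reference=None, overwrite_system_prompt=None):
    """Assemble a chat request from the node's three inputs (illustrative)."""
    # Assumed default; the node's actual built-in prompt is not documented.
    DEFAULT_SYSTEM_PROMPT = "Describe the supplied image in detail."

    # Overwrite System Prompt, when connected, replaces the default entirely.
    system = overwrite_system_prompt or DEFAULT_SYSTEM_PROMPT

    # The required Input image is always part of the user message.
    user_content = [{"type": "image_url", "image_url": {"url": image_url}}]

    # Optional Reference text guides the focus or style of the description.
    if reference:
        user_content.insert(0, {"type": "text", "text": reference})

    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_content},
    ]

messages = build_messages(
    "https://example.com/chart.png",
    reference="Focus on the text content of the chart.",
)
```

Note how the three inputs play distinct roles: the image is mandatory, the reference steers the existing behavior, and the overwrite prompt replaces that behavior altogether.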


Outputs

Output

  • Type: Text description of the image
  • Works best with: Display Text, Document Download, AI General Prompt

The output provides a detailed description of the image based on the selected model and any reference text provided.


Configuration

Model Selection

Select the AI model to use for image description. The node supports a wide range of vision-capable models:

GPT-4o (Default)

GPT-4o is OpenAI's advanced multimodal model with excellent vision capabilities for detailed and accurate image descriptions.

Available Models

Vision Models:

  • Mistral OCR: Specialized for optical character recognition and text extraction from images
  • GPT-5 Nano: Compact model for quick image analysis
  • Gemini 2.5 Flash Lite: Google's lightweight vision model for fast descriptions
  • GPT-4o-mini: Efficient model for general image descriptions
  • GPT-5 Mini: Compact OpenAI model with vision capabilities
  • Claude 3 Haiku: Anthropic's fast and efficient vision model
  • Claude 3.5 Haiku: Enhanced version of Claude Haiku
  • Gemini 2.5 Flash: Google's fast vision model
  • GPT-5 Codex: Specialized for code and technical diagrams
  • Gemini 2.5 Pro: Google's professional-grade vision model

Advanced Models:

  • GPT-5: Advanced OpenAI model with superior vision understanding
  • GPT-5 Chat Latest: Latest conversational model with vision
  • Grok 4: Advanced vision and reasoning capabilities
  • Claude Sonnet 4: Anthropic's balanced vision model
  • Claude Sonnet 4.5: Enhanced Sonnet with improved vision
  • GPT-4o: OpenAI's flagship multimodal model
  • Claude Opus 4.1: Anthropic's most powerful vision model
  • Claude Opus 4: High-capability vision analysis
  • Claude 3 Opus: Anthropic's advanced vision model
  • GPT-5 Pro: Professional-grade vision analysis
  • GPT-4.5 Preview: Preview of next-generation GPT vision
  • GPT-4 Vision: OpenAI's vision-specialized model
  • GPT-4.1 Mini: Compact advanced vision model
  • GPT-4.1 Nano: Ultra-compact vision model
  • Mathpix OCR: Specialized for mathematical notation and equations
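One practical way to use the lists above is to select a model per task rather than hard-coding a single choice. The mapping below is an illustrative sketch drawn from the model descriptions above; the pairings are assumptions, not product recommendations.

```python
# Illustrative task-to-model mapping based on the descriptions above.
MODEL_FOR_TASK = {
    "ocr": "Mistral OCR",                 # text extraction from images
    "math": "Mathpix OCR",                # mathematical notation and equations
    "quick_caption": "GPT-4o-mini",       # efficient general descriptions
    "detailed_description": "GPT-4o",     # detailed, accurate descriptions
    "technical_diagram": "GPT-5 Codex",   # code and technical diagrams
}

def pick_model(task, default="GPT-4o"):
    # Fall back to the node's default model for unlisted tasks.
    return MODEL_FOR_TASK.get(task, default)
```

For example, `pick_model("ocr")` selects Mistral OCR, while an unrecognized task falls back to the GPT-4o default.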

Example Workflows

Accessibility Image Descriptions

Scenario: Generate detailed accessibility descriptions for images in educational content.

Describe Image Example

Steps to Create the Flow:

  1. Add a Start Node.

  2. Add and connect an Image Input or File Upload for the image to describe.

  3. Add and connect a Text node with reference instructions.

    • Example reference text:
    Provide a detailed accessibility description suitable for screen readers. Include:
    - Main subject and composition
    - Colors and visual elements
    - Text content if present
    - Spatial relationships
    - Important details for understanding
  4. Add and connect a Describe Image Node.

    • Select Model: GPT-4o for detailed descriptions
    • Connect Image Input to Input
    • Connect Text node to Reference
  5. Add and connect a Display Text to show the description.

Result: Images are automatically described with detailed, accessibility-friendly text suitable for screen readers and visually impaired users.
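The accessibility workflow above can be sketched in code as well. The payload shape follows the widely used chat-completions vision format (base64 data URL inside an `image_url` content part); the function names are hypothetical, and the actual node handles this assembly internally.

```python
import base64

def image_to_data_url(image_bytes, mime="image/png"):
    # Vision APIs commonly accept inline images as base64 data URLs.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# The reference text from step 3, condensed into one string.
REFERENCE = (
    "Provide a detailed accessibility description suitable for screen readers. "
    "Include: main subject and composition, colors and visual elements, "
    "text content if present, spatial relationships, and important details."
)

def build_request(image_bytes, model="gpt-4o"):
    """Assemble the describe-image request (payload shape is illustrative)."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": REFERENCE},
                {"type": "image_url",
                 "image_url": {"url": image_to_data_url(image_bytes)}},
            ]},
        ],
    }

# In a live flow this request would be sent to the selected model's API,
# and the text of the response would feed the Display Text node.
```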