Evaluating AI Results
Why Evaluate AI Output?
Generative AI can be astonishingly creative and helpful, but it can also make mistakes, misunderstand what you want, or produce content that’s off-topic, biased, or just plain wrong.
In LearningFlow, every AI-powered node reflects both your prompt and the underlying model's strengths and limitations, so careful review is essential.
Treat AI output as a draft from an enthusiastic assistant. You’re the editor and final judge!
Common Issues and What to Watch For
1. Hallucinations (AI “Making Things Up”)
Generative AI can sometimes invent details, facts, or references that sound plausible but are simply not true. This is called hallucination.
What does hallucination look like?
- Fictitious Facts:
  “Marie Curie discovered penicillin in 1912.” (False: penicillin was discovered by Alexander Fleming in 1928.)
- Imaginary Quotes:
  “As Galileo famously said, ‘The stars are the windows to the soul.’” (There is no evidence Galileo ever said this.)
- Citing Nonexistent Studies or Books:
  “According to the Journal of Modern Math, 2017…” (This source may not exist.)
- Incorrect or Broken URLs and Links:
  “For more information, visit https://en.wikipedia.org/spaceology” (This page likely doesn’t exist.) The AI often generates fake Wikipedia articles, YouTube links, or random web pages that look real but go nowhere. For example: “Watch this lesson: https://youtube.com/watch?v=abcd1234” (Completely invented.)
Special Note for Educators: Always Check Links!
If an AI-generated resource, video, or reference includes a URL:
- Click and verify it yourself.
- If linking for students, ensure it leads to appropriate, accurate content.
- If the destination does not exist, find and substitute a real resource or remove the link.
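The manual link check above can be partially automated. The sketch below (Python standard library only; the function names are illustrative, not part of LearningFlow) first validates that a string even looks like a URL, then issues a HEAD request to see whether the page responds. A live response only means the page exists, not that its content is accurate or appropriate, so you should still open it yourself.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

def looks_valid(url: str) -> bool:
    """Structural check: requires an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def link_is_live(url: str, timeout: float = 5.0) -> bool:
    """Issue a HEAD request; a 2xx/3xx response suggests the page exists.
    A live page can still be the wrong content, so review it yourself."""
    if not looks_valid(url):
        return False
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "link-check"})
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (URLError, HTTPError, ValueError):
        return False
```

Run this over every URL in an AI draft before sharing it with students; anything that fails either check should be replaced with a verified resource or removed.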
Other Subtle Hallucinations:
- Timeline errors: Combining dates, events, or people out of sequence
- Word definitions: Making up plausible-but-false word meanings, especially for rare or specialized terms
- Steps in procedures: Skipping, repeating, or inventing steps that don’t really apply
Why does this happen?
The AI tries to sound helpful and relevant, “filling in” details even when it doesn’t truly know the answer or lacks access to real-time information.
Bottom line: Treat every fact, statistic, citation, or link from the AI as a “suggestion” to check! A quick verification step keeps your teaching trustworthy and your students safe from misinformation.
2. Inappropriate or Biased Content
Because models are trained on internet data, they can occasionally produce content that is culturally insensitive, overly adult, or biased.
Always check that output aligns with classroom and organizational values.
3. Irrelevant or Off-Topic Answers
If the prompt is unclear, the AI may generate responses unrelated to your goals or your learners' needs.
4. Lack of Depth or Detail
AI can sometimes give answers that are too shallow or generic, especially for complex subjects.
5. Repetitiveness
The same phrasing, structure, or ideas may show up repeatedly, especially if your prompt is repetitive or too general.
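Exact or near-exact repetition is easy to spot mechanically. This is a minimal sketch (plain Python; the naive sentence splitting is an assumption, not a robust tokenizer) that flags sentences appearing more than once in a draft:

```python
import re
from collections import Counter

def repeated_sentences(text: str) -> list[str]:
    """Return sentences that appear more than once,
    after normalising case and surrounding whitespace."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(sentences)
    return [s for s, n in counts.items() if n > 1]
```

Repeated structure or ideas (rather than verbatim sentences) still need a human read, and the usual fix is the same: make the prompt more specific.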
6. Missed Format
If you asked for a particular format (bullet points, table, role play) and didn’t get it, review your prompt for clarity.
Key Principles for Evaluating Output
Ask yourself:
- Is the content accurate?
  Double-check facts, numbers, and even definitions if you’re not 100% sure.
- Is it clear and appropriate for the intended audience?
  Adjust reading level, vocabulary, cultural references, etc.
- Does it match your instructions and required format?
- Is any bias, cultural insensitivity, or inappropriate material present?
  Substitute or edit as needed for your context.
- Does it add value, or just fill space?
  Challenge generic or shallow outputs to go deeper or be more engaging.
Real-World Pitfalls (and How to Fix Them)
| Problem | Example | What to Do |
|---|---|---|
| Factually incorrect | “Australia’s capital is Sydney.” | Correct it and consider rephrasing the prompt |
| Out-of-scope answer | Poetry questions in a math worksheet | Clarify or tighten the prompt |
| Unintended humor or offensiveness | “Write slang for greetings” returns inappropriate terms | Use a stricter context and role |
| Too short/generic | “Math is important for life.” | Request specific examples or deeper analysis |
| Wrong format | List instead of table | Remind the AI of the required format |
Action Checklist for Every AI Output
- Fact-check key details, especially for assessments or public documents.
- Read as your learner: Is it clear, age-appropriate, motivating, and accurate?
- Check for bias/sensitivity: Any stereotypes, assumptions, or off-color humor?
- Review format and clarity: Table/list/dialogue as requested? Well-structured?
- Edit or resubmit the prompt if unsatisfied: try adjusting specificity, adding examples, or clarifying the audience.
Practice & Advanced Scenarios
- Evaluate AI Output Interactive Demo
- Recognizing Bias in AI Responses
- Self-Check Quiz: Hallucination or Fact?
- AI Output Troubleshooter
Next Steps
Improve quality by crafting better prompts, and boost creativity with advanced scenario design.
Remember: You are the expert, and the AI is always your assistant, never the final authority.