
Evaluations

Test your AI instructor with simulated viewers before publishing

Evaluations let you test how your AI instructor handles real conversations before your video goes live. The system simulates a viewer interacting with your instructor, judges each response as pass or fail, and helps you identify areas for improvement.
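
Conceptually, each run is a loop: the simulated viewer sends a message, your instructor replies, and an evaluator judges that reply against your criteria. The Python sketch below only illustrates that loop; it is not Nesoi's actual code, and simulate_viewer, ask_instructor, and judge_response are hypothetical stand-ins for the persona model, the AI instructor, and the evaluator.

  # Illustrative sketch only, not Nesoi's implementation.
  # simulate_viewer, ask_instructor and judge_response are hypothetical
  # callables standing in for the persona model, the AI instructor and
  # the evaluator.
  def run_evaluation(persona, criteria, num_interactions,
                     simulate_viewer, ask_instructor, judge_response):
      transcript = []   # alternating (role, message) pairs
      verdicts = []     # one True (pass) / False (fail) verdict per turn
      for _ in range(num_interactions):
          viewer_msg = simulate_viewer(persona, transcript)
          reply = ask_instructor(viewer_msg, transcript)
          transcript += [("viewer", viewer_msg), ("instructor", reply)]
          verdicts.append(judge_response(reply, criteria))
      return transcript, verdicts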

Opening the Evaluations Panel

  1. Open a video in the editor
  2. Click the Evaluations tab
  3. You'll see a list of any previous evaluations, or an empty state if this is your first

Creating an Evaluation

Click the New button in the panel header to configure a test run.

Evaluation Name

A label to identify this test. It auto-generates with a timestamp (e.g., "Evaluation Jan 24 3:45 PM"), but you can rename it to something descriptive like "Beginner persona test" or "Off-topic handling check."

Viewer Persona

Describe the type of viewer you want to simulate. This controls how the simulated viewer behaves during the conversation.

The default persona is a student who works through the full course — engaging with lectures, participating in roleplays and interactive exercises, answering quiz questions, and progressing step by step. You can customize this to test specific scenarios:

  • "A confused student who struggles with technical terms and needs simple explanations"
  • "An advanced learner who already knows the basics and asks challenging follow-up questions"
  • "A distracted viewer who frequently goes off-topic"
  • "A skeptical viewer who questions claims and asks for evidence"

If you've edited the persona and want to go back, click Reset to default next to the label.

Evaluation Criteria

Instructions that define how your instructor's responses should be judged. This is written as a single paragraph and sent as one instruction block to the evaluator.

The default criteria cover relevance to lesson content, handling of off-topic questions, and engagement quality. You can rewrite this to focus on what matters most for your video — for example, "Stay grounded in the lesson, answer clearly, and politely redirect if the viewer asks something unrelated."

If you've edited the criteria and want to go back, click Reset to default next to the label.
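
Because the criteria travel as one instruction block, the evaluator effectively receives your paragraph wrapped in a grading prompt. The snippet below is only a guess at that general shape, shown for illustration; it is not the prompt Nesoi actually sends.

  # Hypothetical shape of a judge instruction built from the criteria
  # paragraph; the real prompt Nesoi uses is not documented here.
  def build_judge_instruction(criteria, viewer_msg, reply):
      return (
          "Grade the instructor response against these criteria:\n"
          f"{criteria}\n\n"
          f"Viewer: {viewer_msg}\n"
          f"Instructor: {reply}\n"
          "Answer PASS or FAIL."
      )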

Number of Interactions

How many back-and-forth conversation turns to simulate, from 1 to 100. Each interaction is one viewer message and one instructor response.

Repeat Evaluation

Run the same configuration multiple times (1–10 runs). Since AI responses naturally vary, multiple runs help you gauge consistency. Each run creates a separate result you can compare.
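
One way to read repeated runs is to compare their per-run pass rates rather than trusting a single sample. The snippet below is a hypothetical illustration of that comparison, not part of the product.

  # Hypothetical example: comparing pass rates across three repeat runs.
  # Each run is a list of per-turn verdicts (True = pass, False = fail).
  runs = [
      [True, True, False, True],   # run 1: 3 of 4 turns passed
      [True, True, True, True],    # run 2: all turns passed
      [True, False, False, True],  # run 3: 2 of 4 turns passed
  ]
  pass_rates = [sum(run) / len(run) for run in runs]
  print(pass_rates)                          # [0.75, 1.0, 0.5]
  print(max(pass_rates) - min(pass_rates))   # 0.5 -> results vary a lot between runs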

Personalization Variables

If your video uses personalization, you can fill in test values:

  • Viewer Name — the name the instructor will use when addressing the viewer
  • Personalization Questions — if your video's AI Instructions include personalization questions, they appear here with their original labels so you can fill in test answers

These fields appear automatically based on your video's configuration.
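
Conceptually, the test values you enter are substituted into the instructor's instructions the same way a real viewer's answers would be. The sketch below shows that idea with made-up variable names and template syntax; it does not reflect Nesoi's actual format.

  # Made-up illustration of filling personalization variables into an
  # instruction template; the variable names and syntax are hypothetical.
  template = "Address the viewer as {viewer_name}. Their stated goal: {goal}."
  test_values = {"viewer_name": "Jordan", "goal": "learn the basics of budgeting"}
  print(template.format(**test_values))
  # -> Address the viewer as Jordan. Their stated goal: learn the basics of budgeting.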

Watching an Evaluation Run

After clicking Start Evaluation, the conversation appears in real time:

  • Viewer messages (right side) show what the simulated viewer said
  • Instructor messages (left side) show your AI's responses
  • A progress bar tracks how many turns have completed

You can continue working in other tabs while the evaluation runs. Come back anytime to check progress.

To stop an evaluation early, click the Cancel button. Partial results are kept.

Understanding Results

Config Banner

At the top of every evaluation conversation, a config banner shows the settings used for that run:

  • Persona — the viewer persona description
  • Criteria — the evaluation criteria
  • Variables — any personalization variables that were set
  • Turns — the number of interactions configured
  • Duration — how long the evaluation took to complete

This makes it easy to remember what each evaluation was testing, even weeks later.

Pass / Fail

Each conversation turn is judged against your evaluation criteria and marked as pass or fail. The results appear as badges:

  • All passed — every scored turn met the criteria
  • N failed — the number of turns that didn't meet the criteria

The overall result is shown in the sidebar next to each evaluation and in the chat header when viewing results.
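
The badge is just a roll-up of the per-turn verdicts. As a rough illustration (not the product's code):

  # Rough illustration of how per-turn verdicts roll up into the badge text.
  def badge_text(verdicts):
      failed = sum(1 for passed in verdicts if not passed)
      return "All passed" if failed == 0 else f"{failed} failed"

  print(badge_text([True, True, True]))    # All passed
  print(badge_text([True, False, False]))  # 2 failed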

Status

  Status       Meaning
  Pending      Queued and waiting to start
  Running      In progress — conversation updating live
  Completed    Finished — pass/fail results and summary available
  Failed       Something went wrong (check the error message)
  Cancelled    Stopped before finishing — partial results available

Sidebar

The evaluations sidebar shows key information at a glance:

  • Status badge — current state of the evaluation
  • Pass/fail badge — overall result for completed evaluations
  • Turn count — how many turns were completed
  • Timestamp — when the evaluation was created
  • Personalization variables — the first two variable values, with a "+N more" indicator if there are additional ones

Recommended Workflow

  1. Run an initial evaluation with default settings to establish a baseline
  2. Review the results — look at which turns failed and read the summary
  3. Update your AI Instructions or video content based on findings
  4. Run another evaluation with the same configuration to measure improvement
  5. Repeat until all turns pass consistently

Testing Different Viewer Types

Create separate evaluations with different personas to get a well-rounded picture:

  • A "confused beginner" persona to test clarity
  • An "advanced expert" persona to test depth
  • A "distracted, off-topic" persona to test redirection

Tips

  • Start with defaults — the default persona and criteria work well for a general quality check
  • Use descriptive names — name evaluations after what you're testing so you can find them later
  • Test after major changes — run a fresh evaluation whenever you update AI Instructions or video content
  • Compare multiple runs — use 2–3 repeat runs to separate consistent issues from one-off variations
  • Keep criteria focused — a clear, concise paragraph gives better results than a long list of vague rules