Ed-Tech | Content Intelligence | CASE_07

Auto-generating 10,000 practice questions from curriculum documents at exam-quality standard

A K-12 exam-prep company (engagement in late 2023; roughly $800k/year in prior spend on manual question authoring) had tried wrapping GPT-4 directly. Subject matter experts were rejecting ~60% of the AI-generated questions as "too easy" or "ambiguous": the questions tested recall, not understanding. Our pipeline forces Bloom's-taxonomy-aware generation in three stages: curriculum parsing → cognitive-level-tagged prompt construction → an automated quality filter trained on the client's own accepted-vs-rejected examples. Expert acceptance reached 91% by the end of Concept Validation, and cost per accepted question fell 74% vs. manual authoring. The pipeline is now used as a first-draft tool: experts refine the roughly 9% of questions that are rejected rather than writing from scratch.

10K questions/month generated

91% accepted by subject experts

−74% cost vs. manual authoring

The Challenge

The client had tried GPT-4 directly. The questions were grammatically correct but educationally shallow: they tested recall, not understanding. Subject matter experts were rejecting roughly 60% of AI-generated questions as "too easy" or "ambiguous." The core problem: an LLM prompted without pedagogical structure produces surface-level questions. Generation needed to be grounded in Bloom's taxonomy so that each question targets the right cognitive level for its curriculum objective.

Our Approach

Discovery & Blueprint produced a generation pipeline with three stages: (1) curriculum parsing to extract each learning objective and its Bloom's level; (2) structured prompt construction that forces the LLM to generate at a specified cognitive level; (3) an automated quality filter trained on the client's own question bank (accepted vs. rejected examples). Concept Validation ran on 3 subject areas. The expert acceptance rate rose from 40% to 91% by the end of validation.
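To make stage (2) concrete, here is a minimal sketch of cognitive-level-tagged prompt construction. It is an illustration under stated assumptions, not the delivered pipeline: the `LearningObjective` schema, the `build_prompt` function, the prompt wording, and the verb lists are all hypothetical. Only the six level names come from the revised Bloom's taxonomy.

```python
from dataclasses import dataclass

# Levels of the revised Bloom's taxonomy, lowest to highest cognitive demand.
# The verb lists are illustrative assumptions, not the client's vocabulary.
BLOOM_VERBS = {
    "remember": ["define", "list", "recall"],
    "understand": ["explain", "summarize", "classify"],
    "apply": ["solve", "use", "demonstrate"],
    "analyze": ["compare", "differentiate", "infer"],
    "evaluate": ["justify", "critique", "assess"],
    "create": ["design", "construct", "formulate"],
}


@dataclass(frozen=True)
class LearningObjective:
    """One objective as stage (1) might emit it (hypothetical schema)."""
    subject: str
    text: str
    bloom_level: str


def build_prompt(objective: LearningObjective) -> str:
    """Return a generation prompt pinned to the objective's cognitive level."""
    level = objective.bloom_level.lower()
    if level not in BLOOM_VERBS:
        raise ValueError(f"Unknown Bloom's level: {objective.bloom_level!r}")
    verbs = ", ".join(BLOOM_VERBS[level])
    return (
        f"Subject: {objective.subject}\n"
        f"Learning objective: {objective.text}\n"
        f"Target cognitive level (Bloom's taxonomy): {level}\n"
        f"Write one exam question that requires the student to do one of: {verbs}.\n"
        "Do not write a pure recall question unless the target level is 'remember'."
    )
```

Calling `build_prompt(LearningObjective("Physics", "Apply Newton's second law to one-dimensional motion", "apply"))` yields a prompt that names the level explicitly, so the model is never left to pick the cognitive level itself.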

Outcome

10,000 questions generated per month at a 91% expert acceptance rate. Cost per accepted question fell 74% vs. manual authoring. The pipeline is now used as a first-draft tool: subject matter experts refine the roughly 9% of questions that are rejected rather than writing from scratch.
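The headline figures translate into monthly volumes with simple arithmetic, sketched below. The function names are hypothetical. The manual cost passed in the last line is a placeholder chosen only to show the calculation, not a client figure.

```python
def monthly_breakdown(generated: int, acceptance_pct: int) -> tuple[int, int]:
    """Split a month's output into accepted questions and the remainder."""
    accepted = generated * acceptance_pct // 100
    return accepted, generated - accepted


def reduced_cost_cents(manual_cost_cents: int, reduction_pct: int) -> int:
    """Cost per accepted question after a percentage reduction vs. manual."""
    return manual_cost_cents * (100 - reduction_pct) // 100


accepted, remainder = monthly_breakdown(10_000, 91)
print(accepted, remainder)  # 9100 900

# Placeholder input: 2000 cents is an assumed manual cost, not a client figure.
print(reduced_cost_cents(2000, 74))  # 520
```

At 10,000 generated questions and 91% acceptance, about 9,100 questions pass expert review each month and about 900 do not. A 74% reduction means each accepted question costs 26% of its manual-authoring equivalent.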

What We Learned

LLMs need pedagogical scaffolding, not just a prompt. Bloom's taxonomy provides that structure.

Training the quality filter on your own acceptance data is more effective than manual rubrics.
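As a rough illustration of what learning from accepted-vs-rejected examples can look like, the sketch below trains a small bag-of-words naive Bayes classifier using only the standard library. This is an assumed stand-in: the case study does not say which model the production filter uses, and the `AcceptanceFilter` class and its method names are hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("accepted", "rejected")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class AcceptanceFilter:
    """Multinomial naive Bayes over question wording (illustrative only)."""

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = {label: 0 for label in LABELS}

    def fit(self, examples: list[tuple[str, str]]) -> None:
        """Learn from (question_text, label) pairs reviewed by experts."""
        for text, label in examples:
            if label not in LABELS:
                raise ValueError(f"Unknown label: {label!r}")
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # Laplace smoothing; the +1 on vocab_size reserves mass for unseen words.
        return prior + sum(
            math.log((counts[t] + 1) / (total + vocab_size + 1)) for t in tokens
        )

    def predict(self, text: str) -> str:
        """Return the more likely label for a newly generated question."""
        if min(self.doc_counts.values()) == 0:
            raise ValueError("Need at least one example of each label")
        tokens = tokenize(text)
        vocab = set().union(*self.word_counts.values())
        scores = {lb: self._log_score(tokens, lb, len(vocab)) for lb in LABELS}
        return max(scores, key=scores.get)
```

After `fit` on the expert-reviewed question bank, `predict` can screen each generated question before it reaches a reviewer. A real filter would likely need richer features than word counts, but the training signal is the same: the experts' own decisions rather than a hand-written rubric.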

The right goal is "expert-in-the-loop," not "expert replaced."

Stages Engaged

Feasibility Call

Discovery & Blueprint

Concept Validation

Total Duration

3 months total

Artifacts Delivered

PRD

Generation Pipeline Spec

Bloom's Taxonomy Integration Guide

WBS

Quality Filter Training Dataset
