The problem with writing quiz questions from documents manually
A typical compliance training document - a data handling policy, safety procedures, a product knowledge manual - runs 20-60 pages. Writing 15 quality questions from that document manually takes a skilled instructional designer 45-60 minutes. That includes reading time, drafting the questions, writing plausible answer choices, and marking the correct answers.
At that rate, a 10-module training program represents 7-10 hours of question-writing work before you have touched curriculum design, facilitation planning, or anything else. Most trainers do this work while doing everything else, which means questions get rushed and often test surface-level recall rather than actual understanding.
AI extraction does not replace a trainer's judgment - it gives you a draft to react to rather than a blank page to fill. Editing a draft is faster than creating one from scratch.
How AI document extraction works
When you upload a PDF to Sheelon, the system does three things:
- Extracts the text from the document (including text inside tables and multi-column layouts, though image-based PDFs need OCR first - more on that below).
- Passes the text to a large language model with a prompt that asks it to identify key facts, definitions, procedures, and rules - the content most likely to appear in a knowledge check.
- Generates multiple-choice questions with four answer choices each, marking one as correct and providing a brief explanation for why.
The whole process takes 60-120 seconds depending on document length. The output is a fully formatted quiz ready to launch, but you should always review it before using it with real participants.
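Conceptually, the pipeline is simple: extract text, prompt a model, parse the output. The sketch below is a generic illustration of that loop, not Sheelon's implementation; it assumes the pypdf and openai Python packages, and the model name is illustrative.

```python
# Generic PDF-to-quiz pipeline sketch - NOT Sheelon's actual implementation.
# Assumes: pip install pypdf openai, and OPENAI_API_KEY set in the environment.
from pypdf import PdfReader
from openai import OpenAI

def pdf_to_quiz(path: str, num_questions: int = 10) -> str:
    # 1. Extract the text layer from every page.
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 2. Ask a large language model to turn key facts into questions.
    prompt = (
        f"From the training document below, write {num_questions} "
        "multiple-choice questions. Give each question four answer "
        "choices, mark one as correct, and add a one-sentence "
        "explanation. Return JSON.\n\n" + text
    )
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any capable model works
        messages=[{"role": "user", "content": prompt}],
    )

    # 3. The raw draft - this is what you review and edit in Step 3.
    return response.choices[0].message.content
```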
Step-by-step walkthrough
Step 1: Prepare your PDF
Not all PDFs extract equally well. Text-based PDFs - created from Word documents, Google Docs, or InDesign with real text layers - extract accurately. Scanned documents (a physical manual that was photocopied and PDF'd) are images of text, which require OCR to read.
Before uploading, check that your PDF is text-based: open it and try to select and copy a sentence. If the text highlights and copies normally, you are good. If nothing selects, it is a scanned image and you will need to run it through an OCR tool first. Google Drive does adequate OCR for free (upload the PDF, right-click it, and choose Open with > Google Docs); Adobe Acrobat can do the same if you have a paid license.
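If you are checking a whole folder of documents, the same select-and-copy test can be scripted. A minimal sketch, assuming the pypdf package:

```python
# Quick check: does this PDF have a real text layer, or is it a scan?
# Assumes: pip install pypdf
from pypdf import PdfReader

def has_text_layer(path: str, min_chars: int = 100) -> bool:
    reader = PdfReader(path)
    extracted = "".join(page.extract_text() or "" for page in reader.pages)
    # Scanned PDFs yield empty or near-empty text without OCR.
    return len(extracted.strip()) >= min_chars

print(has_text_layer("policy.pdf"))  # False means run OCR first
```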
For best extraction quality: trim the document to the relevant sections before uploading. A 60-page document where only 20 pages contain testable content will produce more noise in the questions than a clean 20-page upload. This is one area where your judgment as a trainer still matters.
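Trimming does not require a PDF editor. A sketch of extracting just the relevant page range with pypdf (the filenames and page numbers are made up):

```python
# Keep only the testable section (here, pages 12-31) before uploading.
# Assumes: pip install pypdf
from pypdf import PdfReader, PdfWriter

reader = PdfReader("full_manual.pdf")
writer = PdfWriter()
for page in reader.pages[11:31]:  # reader.pages is zero-indexed
    writer.add_page(page)
with open("manual_trimmed.pdf", "wb") as f:
    writer.write(f)
```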
Step 2: Upload and extract
In Sheelon, create a new quiz and select "Generate from document." You will see a file upload area that accepts PDF, DOCX, and PPTX. Drag in your file.
Before clicking Generate, you have a few options:
- Number of questions: Default is 10. For a 20-30 page document, 12-15 is usually more appropriate. For a short policy document (5-10 pages), keep it at 8-10.
- Question type: Default is multiple choice. On the Pro plan, you can also mix fill-in-the-blank or true/false questions into the set.
- Difficulty: If you leave this blank, the AI will pick a mix. Setting it to "intermediate" tends to produce more useful questions than "beginner" (which often generates questions with obvious answers) or "advanced" (which can produce questions with ambiguous correct answers).
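These options are, in effect, parameters to the generation prompt. A hypothetical illustration of how count, type, and difficulty typically get folded into the instruction the model sees (not Sheelon's actual prompt):

```python
# Hypothetical prompt template - illustrates how the generation options
# shape the request; Sheelon's real prompt is not public.
def build_prompt(doc_text: str, count: int = 12,
                 q_type: str = "multiple choice",
                 difficulty: str = "intermediate") -> str:
    return (
        f"Write {count} {difficulty}-level {q_type} questions from the "
        "document below. Prefer rules, definitions, and procedures a "
        "participant should remember; avoid trivially specific details "
        "such as exact dates or dollar figures.\n\n" + doc_text
    )
```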
Step 3: Review and edit outputs
This is the step most people want to skip. Do not skip it.
AI-generated questions from documents have predictable failure modes. The most common: the model picks a trivially specific detail from the document (a date, a specific dollar threshold, a minor exception to a rule) that is technically in the source but that participants could not reasonably be expected to know. These questions feel like gotchas, not knowledge checks.
The second common failure: questions where two of the four answer choices are plausible enough that experienced participants will argue about them. The model generates distractor answers quickly but does not always test whether they are actually wrong in context.
A good editing pass for 15 AI-generated questions takes 10-15 minutes, not 45. Read each question and ask: could a competent person who completed this training reasonably be expected to answer this correctly? If yes, keep it. If the correct answer is arbitrary or the distractors are too similar, rewrite or delete.
Step 4: Launch the session
Once you are satisfied with the questions, click Launch Game. Sheelon generates a 6-digit PIN. Share the URL (sheelon.me/join) and the PIN with participants verbally, in chat, or via a QR code that appears on your screen.
Participants join on any device - phone, tablet, laptop - without creating an account. You control the pace from the host dashboard, advancing questions manually or setting a timer. After the last question, you see a leaderboard and a results breakdown by question.
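Sheelon displays the QR code for you, but if you want the join link baked into your own slide deck ahead of time, generating one takes a few lines. A sketch using the qrcode Python package (an assumption; participants still enter the PIN after scanning):

```python
# Generate a QR code for the join URL to embed in your own slides.
# Assumes: pip install "qrcode[pil]"
import qrcode

img = qrcode.make("https://sheelon.me/join")
img.save("join_quiz_qr.png")
```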
What AI extraction handles well
Based on common use cases, AI extraction performs well on:
- Factual policy information - rules, thresholds, deadlines, definitions. "The data retention period for customer records is X years" becomes a strong multiple-choice question.
- Procedures with distinct steps - the AI identifies sequential steps well and generates questions like "which step comes after X?" or "what is required before proceeding to Y?"
- Lists and classifications - if the document contains categories, types, or classifications, the AI tends to produce clean questions from them.
- Product knowledge with specific facts - product specs, feature names, compatibility requirements.
Where you still need to edit
AI extraction is weakest on:
- Judgment-based application - "what should you do if a customer asks for X?" questions require scenario design that AI rarely gets right without significant prompting.
- Context-dependent exceptions - documents with "in most cases, but in situation Y, rule Z applies" structure often produce questions that ignore the exception or test the exception when the general rule matters more.
- Heavily visual documents - if your training material relies on diagrams, charts, or visual workflows, the extracted text will be missing most of the information. You will need to add questions manually for that content.
Document types and what to expect
Here is a realistic quality expectation by document type:
- HR policy documents - high quality extraction. Clear, factual prose, defined terms, specific rules. Usually needs minimal editing.
- Technical product manuals - good quality extraction, though questions can skew toward trivially specific details. Plan to delete 2-3 questions that test obscure specs.
- Compliance and regulatory documents - good on definitions and requirements, weak on exceptions and conditions. Review carefully.
- Sales training decks (PPTX) - moderate quality. Bullet points produce shorter, shallower questions. Narrative slide notes extract better than headline bullets.
- Case studies and scenario documents - weaker extraction. The AI pulls factual details from the narrative rather than the learning objectives. Expect more editing time.
Bottom line
PDF-to-quiz AI extraction is genuinely useful for trainers who work with document-heavy content. It is not a one-click solution - it requires an editing pass from someone who understands what the participants need to learn. But it changes the math on quiz creation from "start with nothing" to "start with a solid draft."
For a typical 20-30 page compliance or product knowledge document, a realistic workflow is: upload (2 minutes), extraction (90 seconds), review and edit (10-15 minutes), launch (2 minutes). That is a 15-20 minute total workflow compared to 45-60 minutes starting from scratch.
Sheelon's document extraction is available on the Pro plan ($30/year or $3.99/month). The free plan includes AI generation from a topic description, which is a good starting point if you do not yet have a document to extract from. See the guide on AI quiz generation for instructional designers for a deeper look at prompt strategies for generating from a topic rather than a document.