How It Works

One recording in. Five study formats out. Nothing leaves the building.

The assistant is a small web application. An educator logs in, uploads a lecture — an audio recording, a PDF, or plain text — and the system runs it through a fixed pipeline, posting progress in real time. The heavy lifting is done by two local engines: whisper.cpp for speech-to-text and a local Ollama model for the language tasks.

The pipeline

Recording → Transcribe → Generate → Review

1. Recording: the educator uploads audio, a PDF, or plain text.
2. Transcribe: whisper.cpp turns speech into a transcript; pdftotext extracts the text of a PDF, and plain text needs no conversion.
3. Generate: a local Ollama model drafts each study format from the transcript.
4. Review: the educator checks the drafts for clinical accuracy before students see them.
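The extraction step amounts to a dispatch on file type. The following is a minimal Python sketch, not the application's Perl code. The binary name `whisper-cli` (older whisper.cpp builds call it `main`), the model path, and the function name `extraction_command` are hypothetical, and audio is assumed to arrive already converted to 16 kHz WAV.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical defaults; adjust to the local install.
WHISPER_BIN = "whisper-cli"  # older whisper.cpp builds name this binary "main"
WHISPER_MODEL = "models/ggml-base.en.bin"


def extraction_command(upload: str, out_base: str) -> Optional[List[str]]:
    """Return the argv that turns an upload into plain text, or None if it is already text.

    out_base is the output path without extension; the text lands in out_base + ".txt".
    """
    suffix = Path(upload).suffix.lower()
    if suffix == ".wav":
        # Assumes 16 kHz WAV, which the whisper.cpp CLI reads directly.
        # -otxt writes a plain-text transcript; -of names the output file (".txt" is appended).
        return [WHISPER_BIN, "-m", WHISPER_MODEL, "-f", upload, "-otxt", "-of", out_base]
    if suffix == ".pdf":
        # pdftotext <input> <output>; -layout keeps the physical page layout.
        return ["pdftotext", "-layout", upload, out_base + ".txt"]
    if suffix == ".txt":
        return None  # plain text skips extraction entirely
    raise ValueError(f"unsupported upload type: {suffix!r}")
```

A worker would run the returned argv with `subprocess.run(cmd, check=True)` and then read `out_base + ".txt"`, so every input type converges on the same plain-text file before generation starts.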

Five formats from one lecture

Transcript
A full, searchable record of what was actually said.
Summary
The key concepts of the lecture, distilled.
Study notes
Structured notes for review and recall.
Learning objectives
What a resident should be able to do afterward.
Practice quiz
Multiple-choice questions to self-test understanding.
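One way to drive the four generated formats is a prompt per format sent to Ollama's local HTTP endpoint. This minimal sketch assumes Ollama's default address (`127.0.0.1:11434`) and its non-streaming `/api/generate` call; the model name, the prompt wording, and the names `PROMPTS`, `build_request`, and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.1"  # hypothetical; any locally pulled model

# Hypothetical prompt wording, one entry per generated format.
# The transcript itself is the first format and needs no prompt.
PROMPTS = {
    "summary": "Summarize the key concepts of this lecture in one paragraph.",
    "notes": "Write structured study notes for review and recall.",
    "objectives": "List what a resident should be able to do after this lecture.",
    "quiz": "Write five multiple-choice questions, each with four options and the correct answer marked.",
}


def build_request(fmt: str, transcript: str) -> dict:
    """Build the JSON body for one non-streaming /api/generate call."""
    return {
        "model": MODEL,
        "prompt": f"{PROMPTS[fmt]}\n\nLecture transcript:\n{transcript}",
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def generate(fmt: str, transcript: str) -> str:
    """POST to the local Ollama server and return the generated text."""
    body = json.dumps(build_request(fmt, transcript)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the prompts in one table means the same transcript feeds every format, and adding a sixth format would be a single new entry.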

The architecture

The whole system is a single Perl/Mojolicious application. A browser talks to it over a local port; behind that sit a job queue, the transcription engine, and the language model.

Browser  →  Mojolicious web app
                ├─ SQLite        jobs + users + results
                ├─ whisper.cpp   audio  → transcript
                ├─ Ollama        transcript → summary / notes / quiz
                └─ pdftotext     PDF    → text
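The SQLite box in the diagram holds jobs, users, and results. A plausible schema for those three tables is sketched below with Python's built-in `sqlite3`; every table and column name is a hypothetical choice, not the application's actual schema.

```python
import sqlite3

# Hypothetical schema for the three kinds of data named in the diagram.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS jobs (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    upload     TEXT NOT NULL,
    state      TEXT NOT NULL DEFAULT 'queued',
    error      TEXT,  -- failure reason, NULL unless state = 'failed'
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS results (
    job_id  INTEGER NOT NULL REFERENCES jobs(id),
    format  TEXT NOT NULL,  -- e.g. 'transcript', 'summary', 'quiz'
    content TEXT NOT NULL,
    PRIMARY KEY (job_id, format)  -- one artifact per format per job
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
    conn.executescript(SCHEMA)
    return conn
```

A single database file keeps the whole state of the system in one place that is easy to back up and inspect on University servers.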

The job queue

Processing a lecture takes minutes, not milliseconds, so every upload becomes a queued job that moves through clear states: queued → transcribing → summarizing → quizzing → completed (or failed, with the reason recorded). The work runs in a background subprocess so the web interface never blocks, and the page streams live status updates as each stage finishes.
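The state sequence above can be enforced with a small transition table. This minimal sketch covers only the bookkeeping; it leaves out the background subprocess and the live status stream, and the `Job` class and its method names are hypothetical.

```python
# Forward transitions, mirroring queued → transcribing → summarizing → quizzing → completed.
NEXT = {
    "queued": "transcribing",
    "transcribing": "summarizing",
    "summarizing": "quizzing",
    "quizzing": "completed",
}
TERMINAL = {"completed", "failed"}


class Job:
    def __init__(self, job_id: int):
        self.id = job_id
        self.state = "queued"
        self.error = None  # set only when the job fails

    def advance(self) -> str:
        """Move to the next stage; illegal once the job has finished or failed."""
        if self.state in TERMINAL:
            raise RuntimeError(f"job {self.id} is already {self.state}")
        self.state = NEXT[self.state]
        return self.state

    def fail(self, reason: str) -> None:
        """Record why the job stopped; any non-terminal stage may fail."""
        if self.state in TERMINAL:
            raise RuntimeError(f"job {self.id} is already {self.state}")
        self.state = "failed"
        self.error = reason
```

Because each stage can only move forward or to `failed`, the status shown on the page always matches a real point in the pipeline, and a failed job carries its reason with it.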

The stack, deliberately boring

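The implementation favours components that are easy to run and audit on University infrastructure: Perl and Mojolicious for the web application, SQLite for jobs, users, and results, whisper.cpp for transcription, Ollama for the language model, and pdftotext for PDF text extraction.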

Privacy by construction: there is no external API call in the pipeline. The recording, the transcript, and every generated artifact stay on University of Toronto infrastructure from upload to review.
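One way to make the no-external-call property checkable is to refuse any engine endpoint that is not on the local machine. The following is a minimal sketch, assuming Ollama runs on the same host as the web app (its default). The function `is_local_endpoint` is hypothetical, and it deliberately avoids DNS: only `localhost` and loopback IP literals pass.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_endpoint(url: str) -> bool:
    """True only for localhost or a loopback IP literal (127.0.0.0/8 or ::1)."""
    host = urlsplit(url).hostname  # already lowercased, IPv6 brackets removed
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname would need DNS, so reject it outright
```

Calling this on the configured Ollama URL at startup, and refusing to run if it returns False, turns the privacy claim into a check rather than a convention.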
