How It Works
One recording in. Five study formats out. Nothing leaves the building.
The assistant is a small web application. An educator logs in, uploads a lecture — an audio recording, a PDF, or plain text — and the system runs it through a fixed pipeline, posting progress in real time. The heavy lifting is done by two local engines: whisper.cpp for speech-to-text and a local Ollama model for the language tasks.
The pipeline
Five formats from one lecture
The architecture
The whole system is a single Perl/Mojolicious application. A browser talks to it over a local port; behind that sit a job queue, the transcription engine, and the language model.
Browser → Mojolicious web app
├─ SQLite       jobs + users + results
├─ whisper.cpp  audio → transcript
├─ Ollama       transcript → summary / notes / quiz
└─ pdftotext    PDF → text
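The routing in the diagram above can be sketched as a small dispatch on the upload's file type. This is an illustration in Python, not the application's Perl code, and it rests on assumptions: the function name, the list of accepted extensions, the `whisper-cli` binary name, and the model path are all hypothetical, and some whisper.cpp builds expect audio to be converted to 16 kHz WAV first.

```python
from pathlib import Path

# Assumed set of accepted audio extensions; the source does not list them.
AUDIO_SUFFIXES = {".wav"}
TEXT_SUFFIXES = {".txt"}


def extraction_command(upload, model="models/ggml-base.en.bin"):
    """Return the argv list that turns an upload into text.

    Returns None for plain text, which needs no extraction step.
    The model path is a placeholder, not the project's actual model.
    """
    path = Path(upload)
    suffix = path.suffix.lower()
    if suffix in AUDIO_SUFFIXES:
        # -otxt writes a text transcript; -of sets the output name
        # without extension (whisper.cpp appends ".txt").
        return ["whisper-cli", "-m", model, "-f", upload,
                "-otxt", "-of", str(path.with_suffix(""))]
    if suffix == ".pdf":
        # A trailing "-" makes pdftotext write the text to stdout.
        return ["pdftotext", upload, "-"]
    if suffix in TEXT_SUFFIXES:
        return None
    raise ValueError(f"unsupported upload type: {suffix or 'none'}")
```

The returned list could be handed to `subprocess.run` by a worker; building the command separately from running it keeps the routing easy to test without the tools installed.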
The job queue
Processing a lecture takes minutes, not milliseconds, so every upload becomes a queued job that moves through clear states: queued → transcribing → summarizing → quizzing → completed (or failed, with the reason recorded). The work runs in a background subprocess so the web interface never blocks, and the page streams live status updates as each stage finishes.
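The state progression described above can be sketched as a small state machine backed by SQLite. This is a minimal illustration in Python rather than the application's own Perl code; the table layout, column names, and function names are assumptions made for the example, and the real schema may differ.

```python
import sqlite3

# Stage order taken from the text; "failed" is a terminal side exit.
STAGES = ["queued", "transcribing", "summarizing", "quizzing", "completed"]


def create_schema(db):
    # Hypothetical schema: one row per uploaded lecture.
    db.execute("""CREATE TABLE jobs (
        id       INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        state    TEXT NOT NULL DEFAULT 'queued',
        error    TEXT)""")


def enqueue(db, filename):
    cur = db.execute("INSERT INTO jobs (filename) VALUES (?)", (filename,))
    return cur.lastrowid


def advance(db, job_id):
    """Move a job to the next stage and return the new state."""
    (state,) = db.execute(
        "SELECT state FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if state in ("completed", "failed"):
        raise ValueError(f"job {job_id} is already {state}")
    new_state = STAGES[STAGES.index(state) + 1]
    db.execute("UPDATE jobs SET state = ? WHERE id = ?", (new_state, job_id))
    return new_state


def fail(db, job_id, reason):
    """Mark a job as failed and record why."""
    db.execute("UPDATE jobs SET state = 'failed', error = ? WHERE id = ?",
               (reason, job_id))


def run_job(db, job_id, steps):
    """Run each stage's callable in order; steps maps state name -> callable."""
    state = advance(db, job_id)  # queued -> transcribing
    while state != "completed":
        try:
            steps[state]()
        except Exception as exc:
            fail(db, job_id, f"{state}: {exc}")
            return "failed"
        state = advance(db, job_id)
    return state
```

Because every transition is written to the database before the next stage starts, a status page only has to read the `state` column to report progress, and a failed job keeps the name of the stage that broke alongside the reason.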
The stack, deliberately boring
The implementation favours components that are easy to run and audit on University infrastructure:
- Perl + Mojolicious — the web app, job queue, and pipeline
- whisper.cpp — on-device speech-to-text, no cloud transcription service
- Ollama — a quantized instruction-tuned model serving summaries, notes, objectives, and quizzes locally
- SQLite — one file for jobs, users, and results
- pdftotext — text extraction for slide and document uploads
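To make the "locally served" part concrete, here is a hedged sketch of how a language task might be sent to Ollama over its local HTTP API. It is written in Python for illustration only (the application itself is Perl), it assumes Ollama is listening on its default port 11434, and the function names, prompt layout, and model name are hypothetical; the source does not say which model or prompts are used.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumes an unmodified install.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, transcript, task):
    """Build (but do not send) a non-streaming generate request."""
    # Hypothetical prompt layout: instruction first, then the transcript.
    prompt = f"{task}\n\nTranscript:\n{transcript}"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, transcript, task):
    """Send the request and return the generated text."""
    req = build_generate_request(model, transcript, task)
    with urllib.request.urlopen(req) as resp:
        # With "stream": false, the reply is one JSON object whose
        # "response" field holds the full generated text.
        return json.loads(resp.read())["response"]
```

A call such as `generate("example-model", transcript, "Summarize this lecture in five bullet points.")` would then stay entirely on the local machine, since the request never goes beyond localhost; `"example-model"` is a placeholder for whatever model has been pulled into Ollama.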