Voice Diary | Ivan Dankov

The Problem

I wanted a voice diary — talk into my phone throughout the day, and wake up to a written diary entry the next morning. No typing, no manual transcription, no third-party service holding my recordings.

Nothing on the market does this without shipping your audio to someone else’s cloud. So I built it.

How It Works

Voice Diary is a Progressive Web App running on my home server. Add it to your home screen and it works like a native app — tap, record, done.

Record — tap the mic, talk, tap again
Store — audio saved to PostgreSQL + filesystem
Transcribe — each recording converted to text
Compile — Claude writes a diary entry from the day’s transcripts
Read — open the app next morning, entry is waiting

Transcription supports two backends, switchable with one environment variable. faster-whisper runs entirely on CPU with no network calls — about a second per recording. AWS Transcribe streams audio over HTTP/2 for higher accuracy. Either way, the text goes to Claude via AWS Bedrock, which writes a cohesive diary entry that preserves your voice and weaves multiple recordings into a narrative.

Features

One-tap recording — PWA with MediaRecorder API, works offline
Self-hosted — all data stays on your hardware or in your bedrock account
Dual transcription — local Whisper or AWS Transcribe
AI compilation — Claude writes diary entries from transcripts
Scheduled — auto-compiles at 3 AM daily
Manual compile + recompile — on-demand generation, add recordings and recompile
HTTPS — mkcert certificates for secure mic access
Dark mobile UI — designed for phone-first usage

Architecture

Two Docker containers: the FastAPI app and PostgreSQL.

FastAPI — API key auth, recording management, compilation pipeline
PostgreSQL — recordings metadata + diary entries
Filesystem — audio files with magic-byte format detection
APScheduler — daily compilation trigger
Transcription — faster-whisper (local) or AWS Transcribe (streaming)
AWS Bedrock — Claude for diary compilation from text transcripts

Tech Stack

Backend: Python 3.12, FastAPI, SQLAlchemy, APScheduler
Frontend: Vanilla JS PWA, Tailwind CSS, MediaRecorder API
Database: PostgreSQL 17
Transcription: faster-whisper (int8 quantised, CPU) or AWS Transcribe Streaming
AI: Claude via AWS Bedrock
Infrastructure: Docker Compose, mkcert HTTPS, ZeroTier

Challenges

Chrome lies about audio format. The MediaRecorder API claims to record OGG/Opus, but Chrome actually produces WebM containers. The app was saving files as .ogg and serving them with the wrong content type — browsers saw the mismatch and refused to play. Fixed with magic byte detection: read the first 4 bytes on upload and streaming to determine the actual format.

HTML audio elements can’t send headers. The <audio> tag makes GET requests with no way to attach auth headers. Every playback attempt returned 401. Fixed by accepting the API key as a query parameter on audio routes.

Mic access requires HTTPS. Browsers block getUserMedia() on non-HTTPS origins. Since the app runs on a private network IP, not localhost, real HTTPS was required — solved with mkcert generating locally-trusted certificates.

TLS everywhere. Originally the app handled HTTPS itself via uvicorn’s --ssl-keyfile flag — cert paths baked into the app container, healthchecks needing ssl._create_unverified_context() hacks. Moved to an Nginx TLS sidecar: app runs plain HTTP internally, Nginx terminates TLS on the exposed port. Cleaner separation, no SSL hacks in healthchecks, and the same pattern across every service.

Status

Live and recording daily. Built with The Forge framework conventions.

What’s Next

Bedrock data automation - This accepts audio - can go straight there and bypass weaker whisper model and AWS Transcribe