~/wrbell.eng/stark-translate.case.md
$ cat ./projects/stark-translate/INTENT.md

stark-translate

Deployed weekly

Live bilingual EN/ES speech-to-text, built so a Spanish-speaking congregation could follow along with an English sermon in real time — on a laptop, no cloud.

Role: Builder
Stack: Python
Latency: ~1.2 s
Status: Deployed
§01 · problem
path: ./problem.md · why does this exist? who benefits? what would happen if it didn't?

congregation_split_by_language

A local church holds English services with a growing number of Spanish-speaking attendees. Hiring a human interpreter every week wasn't practical. Commercial live-translation tools required constant cloud connectivity, a monthly SaaS bill, or both, and they handed sensitive pastoral audio to a third party.

What they needed: real-time EN → ES captions displayed at the back of the sanctuary, running on whatever laptop was already at the sound desk, with no outbound traffic once the model weights had been downloaded.
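
One way to hold the "no outbound traffic" line after the first download is to put the model-loading libraries in offline mode. This is a minimal sketch, assuming the weights come from the Hugging Face Hub, which the source does not confirm. `force_offline` is a hypothetical helper name.

```python
import os


def force_offline(env=None):
    """Set the Hugging Face offline switches so only cached weights are used.

    Assumption: models are loaded via huggingface_hub / transformers.
    Call this before importing those libraries.
    """
    env = os.environ if env is None else env
    env["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: skip all network calls
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers: load from local cache only
    return env
```

With these set, a missing model should fail at load time instead of triggering a silent download during a service.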

§02 · architecture
path: ./pipeline.py · single process, 4 stages, in-memory queues, ws fan-out.

four_stages_one_pipe

The system is a single Python process: four stages connected by in-memory queues, with the last stage fanning captions out over WebSocket so the projector laptop can sit anywhere on the LAN.

┌─────────────┐    ┌─────────────────┐    ┌─────────────────────┐    ┌────────────┐
│  mic input  │───▶│   whisper asr   │───▶│      marian-mt      │───▶│ websocket  │
│ audio · vad │    │  en text · MLX  │    │  en→es · gemma fb   │    │  display   │
└─────────────┘    └─────────────────┘    └─────────────────────┘    └────────────┘

Python 3.11      host runtime · single process            v.dep
Whisper          ASR · whisper-small · on-device          v.dep
MarianMT         NMT · en-es · ~80 MB                     v.dep
TranslateGemma   NMT fallback · prose-quality, heavier    v.dep
WebSocket        fan-out · LAN-only display path          v.dep
MLX              Apple silicon backend                    v.dep
sounddevice      capture · CoreAudio bindings             v.dep
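
The "stages wired together by in-memory queues" shape can be sketched with the standard library. This is an illustration under assumptions, not the project's code: it assumes one thread per stage (the source only says "single process"), and `stage`, `run_pipeline`, and the stub callables are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def stage(fn, inbox, outbox):
    """Pull items from inbox, apply fn, push results to outbox until STOP."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        outbox.put(fn(item))


def run_pipeline(chunks, transcribe, translate):
    """Wire capture -> ASR -> MT -> display with in-memory queues."""
    q_audio, q_text, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(transcribe, q_audio, q_text)),
        threading.Thread(target=stage, args=(translate, q_text, q_out)),
    ]
    for w in workers:
        w.start()
    for chunk in chunks:          # stand-in for the mic / VAD stage
        q_audio.put(chunk)
    q_audio.put(STOP)
    captions = []                 # stand-in for the WebSocket display stage
    while (item := q_out.get()) is not STOP:
        captions.append(item)
    for w in workers:
        w.join()
    return captions
```

Because each stage is a single consumer on a FIFO queue, captions come out in the order the audio went in. For example, `run_pipeline(["a", "b"], str.upper, lambda s: s + "!")` pushes two fake chunks through stub ASR and MT functions.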
§03 · decisions
path: ./CHANGELOG.md · picked → rejected. each line is a deliberate trade.

tradeoffs_actually_made

  1. a3f1c92 · +34 −12

    On-device, not cloud.

    Whisper-small transcribes faster than real time on an M-series Mac. Cloud ASR would have been faster and more accurate, but running without internet was a hard requirement. Accuracy is tuned with domain-specific prompts instead of a larger model.

  2. b1ee047 · +78 −03

    Chunked streaming, not true continuous.

    Audio is split into chunks of roughly 2 s at voice-activity boundaries, and each chunk is transcribed on its own. This adds end-to-end latency (~1.2 s median) but keeps the implementation simple and resilient to dropped chunks.

  3. 4d8c3a1 · +22 −41

    MarianMT for EN→ES, with TranslateGemma as a quality fallback.

    MarianMT's EN-ES pair is tiny and fast; TranslateGemma produces better prose but is heavier. A runtime flag swaps between them depending on host hardware.

  4. 9f02e44 · +12 −08

    WebSocket display, not a rendered overlay.

    The projector laptop just opens a URL. No screen-sharing, no OBS scene — any browser on the LAN becomes a caption screen. Trivial to restart, trivial to fail over.
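
Decision 1 mentions domain-specific prompts. One common way to do this with Whisper is to pass a short glossary as the initial prompt. This is a sketch under assumptions: `build_initial_prompt` and the 200-character cap are illustrative, and the source does not say how its prompts are built.

```python
def build_initial_prompt(terms, max_chars=200):
    """Join glossary terms into one prompt string.

    Stops adding terms once the next one would push the prompt past
    max_chars (an illustrative cap; Whisper prompts must stay short).
    """
    picked, length = [], 0
    for term in terms:
        extra = len(term) + (2 if picked else 0)  # account for the ", " separator
        if length + extra > max_chars:
            break
        picked.append(term)
        length += extra
    return ", ".join(picked)
```

The resulting string would typically be handed to the transcriber through its `initial_prompt` argument, which both the reference Whisper package and `mlx_whisper.transcribe` accept. Which of these the project actually calls is not stated in the source.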
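
Decision 2, chunking on voice-activity boundaries, can be sketched with a simple energy threshold standing in for the real VAD. Everything here is an assumption for illustration: the 20 ms frame size, the threshold, and the names `rms` and `chunk_on_silence` do not come from the source.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def chunk_on_silence(frames, threshold=0.02, max_frames=100, hang=10):
    """Yield lists of frames, cut at a pause or at max_frames.

    With 20 ms frames, max_frames=100 caps a chunk at about 2 s and
    hang=10 means 200 ms of silence closes the current chunk.
    """
    chunk, quiet, voiced = [], 0, False
    for frame in frames:
        is_speech = rms(frame) >= threshold
        if not voiced and not is_speech:
            continue                          # skip silence before speech starts
        chunk.append(frame)
        voiced = True
        quiet = 0 if is_speech else quiet + 1
        if quiet >= hang or len(chunk) >= max_frames:
            yield chunk[:len(chunk) - quiet]  # drop trailing silence
            chunk, quiet, voiced = [], 0, False
    if chunk:
        yield chunk[:len(chunk) - quiet]      # flush what is left at end of stream
```

Each yielded chunk is self-contained, which is what makes a dropped chunk cost one caption and nothing more. The `max_frames` cap bounds how long a speaker can go without a pause before a caption is forced out.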
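
Decision 3 describes a runtime flag that swaps translation backends. A minimal sketch with `argparse` follows. The flag name `--translator`, the backend identifiers, and the default are hypothetical, and the translator bodies are stubs, not real MarianMT or TranslateGemma calls.

```python
import argparse

# Hypothetical backend names; the real flag and identifiers are not in the source.
BACKENDS = ("marian", "gemma")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="EN->ES caption pipeline")
    parser.add_argument(
        "--translator",
        choices=BACKENDS,
        default="marian",  # assumed default: the small, fast model
        help="'marian' (fast) or 'gemma' (better prose, heavier)",
    )
    return parser.parse_args(argv)


def make_translator(name):
    """Return a callable str -> str for the chosen backend (stubs only)."""
    if name == "marian":
        return lambda text: f"[marian] {text}"  # placeholder for MarianMT inference
    if name == "gemma":
        return lambda text: f"[gemma] {text}"   # placeholder for TranslateGemma inference
    raise ValueError(f"unknown translator: {name}")
```

Keeping both backends behind the same `str -> str` callable means the rest of the pipeline does not need to know which model is loaded.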
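
Decision 4 relies on fan-out: every connected browser gets every caption. The sketch below shows only the fan-out logic, using `asyncio` queues. The WebSocket transport is left out because the source does not name a server library, and `CaptionHub` with its methods is a hypothetical design.

```python
import asyncio


class CaptionHub:
    """Broadcast each caption to every connected display."""

    def __init__(self):
        self._clients = set()

    def connect(self):
        """Register a display and return the queue it should read from."""
        q = asyncio.Queue(maxsize=32)
        self._clients.add(q)
        return q

    def disconnect(self, q):
        """Forget a display, e.g. when its socket closes."""
        self._clients.discard(q)

    def publish(self, caption):
        """Send a caption to all displays; slow ones lose their oldest line."""
        for q in self._clients:
            if q.full():
                q.get_nowait()  # a stale caption is worth less than a fresh one
            q.put_nowait(caption)
```

In a real server, each connection handler would call `connect()`, forward items from its queue to the socket, and call `disconnect()` when the browser goes away. Restarting a display is then just a reconnect.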

§04 · outcome
path: ./STATE.md · running. weekly. quiet sundays.

what_it_does_now

Runs weekly at services on a single MacBook. Spanish-speaking attendees can follow the sermon in real time without a human interpreter or a cloud connection. Median end-to-end latency is about 1.2 s; worst-case spikes are bounded by the ~2 s chunk size.

Next up: speaker adaptation (prime Whisper with the pastor's voice), glossary pinning for religious terminology, and a second display path for individual phone screens over the same WebSocket.