AI/Voice
January 8, 2026 · 7 min read

The Dictaphone: When Voice Becomes the Primary Interface

TAMSIV's promise: create a task by speaking, faster than typing. The entire Dictaphone UX stems from this promise.

Push-to-Talk

I chose push-to-talk over continuous listening, for three reasons: battery, privacy, and ambient noise. Push-to-talk is explicit; there is no ambiguity about when the app is recording. Deepgram's voice-activity detection then handles end-of-utterance detection automatically.
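In practice, a push-to-talk session maps cleanly onto Deepgram's live transcription endpoint. Here is a minimal sketch, not TAMSIV's actual code: `startDictation`, `pushAudio`, and the parameter values are illustrative, while the URL, query parameters, and message types come from Deepgram's streaming API.

```typescript
// Hypothetical push-to-talk helper. Audio chunks are sent only while the
// button is held; Deepgram's endpointing/VAD tells us when speech ended.
const DEEPGRAM_URL =
  "wss://api.deepgram.com/v1/listen" +
  "?model=nova-2&interim_results=true&utterance_end_ms=1000&endpointing=300";

export function startDictation(apiKey: string, onFinal: (text: string) => void) {
  // Browser/React Native auth via the WebSocket subprotocol header.
  const socket = new WebSocket(DEEPGRAM_URL, ["token", apiKey]);
  let transcript = "";

  socket.onmessage = (event) => {
    const msg = JSON.parse(event.data as string);
    if (msg.type === "Results" && msg.is_final) {
      transcript += " " + (msg.channel?.alternatives?.[0]?.transcript ?? "");
    }
    // VAD detected the end of the utterance: the user stopped talking.
    if (msg.type === "UtteranceEnd" || msg.speech_final) {
      onFinal(transcript.trim());
      socket.close();
    }
  };

  return {
    // Called repeatedly while the push-to-talk button is held down.
    pushAudio: (chunk: ArrayBuffer) => socket.send(chunk),
    // Called when the button is released, to flush the stream.
    stop: () => socket.send(JSON.stringify({ type: "CloseStream" })),
  };
}
```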

The PendingCreation Pattern

This is TAMSIV's most important pattern. The AI analyzes the dictation and builds a preview; nothing is saved to the database yet. The user sees the suggestion, edits it if needed, then confirms or cancels.

Why? Speech recognition isn't perfect. The AI might misinterpret. The user must remain in control. Voice speeds up input, but the human decides.
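In code, the pattern boils down to a value that lives only in memory until the user says yes. A rough sketch of the idea, where `PendingCreation`, `interpretDictation`, and `saveTask` are hypothetical names, not TAMSIV's actual types:

```typescript
// The AI's interpretation is held as a preview; only confirm() writes to the DB.
interface PendingCreation {
  transcript: string;            // raw text from the dictation
  suggestedTask: {               // the AI's interpretation of it
    title: string;
    dueDate?: string;
    tags: string[];
  };
}

// Stubs standing in for the AI call and the persistence layer.
declare function interpretDictation(
  text: string
): Promise<PendingCreation["suggestedTask"]>;
declare function saveTask(task: PendingCreation["suggestedTask"]): Promise<void>;

async function dictateTask(transcript: string): Promise<PendingCreation> {
  const suggestedTask = await interpretDictation(transcript);
  return { transcript, suggestedTask };   // nothing written to the database yet
}

async function confirm(pending: PendingCreation): Promise<void> {
  await saveTask(pending.suggestedTask);  // the only place a write happens
}

function cancel(_pending: PendingCreation): void {
  // Nothing to roll back: the preview never touched the database.
}
```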

Native vs. Cloud STT

Two configurable modes: native (free, local, variable quality) and Deepgram cloud (consistent, accurate, paid). Native STT for the Free plan, Deepgram for Pro/Team.
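Picking the engine can then be a one-line policy decision. A sketch under my own assumptions; the plan names and the `SttEngine` union are illustrative, not TAMSIV's schema:

```typescript
// Map a subscription plan (plus an optional user preference) to an STT engine.
type Plan = "free" | "pro" | "team";
type SttEngine = "native" | "deepgram";

function sttEngineFor(plan: Plan, userOverride?: SttEngine): SttEngine {
  // Free users stay on the on-device recognizer; paid plans default to
  // Deepgram but can still opt back into native STT.
  if (plan === "free") return "native";
  return userOverride ?? "deepgram";
}
```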

The Button's UX

The Dictaphone button is the app's first tab. Not tasks, not memos. The microphone. Because in TAMSIV, voice isn't a feature: it IS the product. Haptic feedback accompanies every state change, so the user physically feels when the app is listening.
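The state-change haptics could look like this. A minimal sketch assuming a React Native / Expo build; expo-haptics is a real library, but the `DictaphoneState` union and the mapping of states to feedback styles are my assumptions:

```typescript
import * as Haptics from "expo-haptics";

// Illustrative states of the Dictaphone button.
type DictaphoneState = "idle" | "listening" | "processing" | "preview";

async function onStateChange(next: DictaphoneState): Promise<void> {
  switch (next) {
    case "listening":  // button pressed: a firm tap says "I'm recording"
      await Haptics.impactAsync(Haptics.ImpactFeedbackStyle.Medium);
      break;
    case "processing": // button released: a light tap acknowledges it
      await Haptics.impactAsync(Haptics.ImpactFeedbackStyle.Light);
      break;
    case "preview":    // the PendingCreation is ready to review
      await Haptics.notificationAsync(Haptics.NotificationFeedbackType.Success);
      break;
    case "idle":
      break;           // no feedback when returning to rest
  }
}
```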