Blog
AI/Voice
December 10, 2025 · 8 min

Building a Real-time Voice Pipeline: From Raw Audio to AI Response

The heart of TAMSIV is voice. Not a gimmick or an add-on: voice IS the primary interface. Building a real-time voice pipeline solo means entering a world where every millisecond counts.

The Pipeline Architecture

Audio PCM 16kHz mono → WebSocket (JWT) → Deepgram Live STT (VAD) → OpenRouter LLM → Function calling → OpenAI TTS → Audio response

Audio must be PCM 16-bit, 16kHz, mono. The phone sends raw binary chunks via WebSocket.
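Capture APIs typically hand back Float32 samples, so the client has to convert before sending. A minimal sketch of that conversion (the helper name is illustrative, not from the actual codebase):

```typescript
// Convert Float32 samples (as produced by most audio capture APIs)
// into PCM 16-bit little-endian, ready to send as a binary
// WebSocket frame. Assumes the capture is already 16kHz mono.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // little-endian
  }
  return buffer;
}
```

Each chunk produced this way can be sent directly with `socket.send(buffer)` — no framing or base64 needed for binary WebSocket messages.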

Authenticated WebSocket

The Supabase JWT travels in the query string: ws://backend:3001?token=eyJhbG.... It is validated when the connection is established. If the token expires mid-conversation, the client reconnects automatically with a fresh token.
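The server-side handshake can be sketched like this (function names are illustrative; real validation would verify the signature against Supabase's secret or JWKS, not just decode the payload):

```typescript
import { URL } from "node:url";

// Pull the JWT out of the upgrade request's query string,
// e.g. "/?token=eyJhbG...".
function extractToken(requestUrl: string): string | null {
  const url = new URL(requestUrl, "ws://localhost");
  return url.searchParams.get("token");
}

// Decode the JWT payload and check its `exp` claim. This is only the
// expiry check — signature verification is assumed to happen elsewhere.
function isExpired(jwt: string, nowSec = Math.floor(Date.now() / 1000)): boolean {
  const payload = JSON.parse(
    Buffer.from(jwt.split(".")[1], "base64url").toString("utf8"),
  );
  return typeof payload.exp === "number" && payload.exp <= nowSec;
}
```

Rejecting at upgrade time keeps unauthenticated clients from ever holding a socket open.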

Deepgram Live STT and VAD

Deepgram's VAD (Voice Activity Detection) detects when the user has finished speaking. Without VAD, a client-side silence timeout is needed — too short and it cuts off, too long and it lags. Deepgram handles this with precision.
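For context, the client-side fallback that Deepgram's VAD replaces would look roughly like this — an energy-based endpoint detector where the thresholds are exactly the knobs that are hard to tune (all values here are illustrative):

```typescript
// Naive client-side endpointing: fire a callback after `silenceMs`
// of frames whose RMS energy stays below `threshold`. Too short a
// window cuts speakers off; too long adds lag — the tradeoff
// server-side VAD avoids.
class SilenceDetector {
  private silentMs = 0;

  constructor(
    private readonly threshold: number,
    private readonly silenceMs: number,
    private readonly onEndOfSpeech: () => void,
  ) {}

  // Feed one frame of samples; frameMs is the frame's duration.
  push(frame: Float32Array, frameMs: number): void {
    let sum = 0;
    for (const s of frame) sum += s * s;
    const rms = Math.sqrt(sum / frame.length);
    if (rms < this.threshold) {
      this.silentMs += frameMs;
      if (this.silentMs >= this.silenceMs) {
        this.silentMs = 0;
        this.onEndOfSpeech();
      }
    } else {
      this.silentMs = 0; // speech resets the silence window
    }
  }
}
```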

The challenge: managing interim results (is_final: false) vs. final results (is_final: true). Interim results are only a live preview of the current phrase; final segments must be accumulated to build the complete transcription.

LLM Orchestration

The transcription goes to OpenRouter with function calling enabled. The model is configurable, with automatic fallback if it fails. Total round-trip latency: STT finalization (~200ms) + LLM (~800ms-2s) + TTS (~500ms), so between 1.5 and 3 seconds.
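The fallback idea can be sketched as a simple chain — try each configured model in order, falling through on failure (the LLMCall signature and model chain are assumptions for illustration, not the actual OpenRouter client):

```typescript
type LLMCall = (model: string, prompt: string) => Promise<string>;

// Try each model in order; rethrow the last error only if every
// model in the chain has failed.
async function completeWithFallback(
  models: string[],
  prompt: string,
  call: LLMCall,
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model, prompt);
    } catch (err) {
      lastError = err; // fall through to the next model
    }
  }
  throw lastError;
}
```

Because OpenRouter exposes many models behind one API, the fallback chain is just a list of model identifiers in config.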

OpenAI TTS

OpenAI TTS with the nova voice. Audio is streamed back via the same WebSocket, and the frontend starts playing as soon as the first chunks arrive, without waiting for the full response.
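On the frontend, that play-as-you-receive behavior is essentially a drain queue; a sketch, where the `play` callback stands in for whatever audio sink is actually used (Web Audio, a native player, etc.):

```typescript
// Playback starts on the first TTS chunk instead of waiting for the
// full response; later chunks queue up behind it in order.
class StreamingPlayer {
  private queue: ArrayBuffer[] = [];
  private playing = false;

  constructor(private readonly play: (chunk: ArrayBuffer) => Promise<void>) {}

  // Called for each binary WebSocket message carrying TTS audio.
  enqueue(chunk: ArrayBuffer): void {
    this.queue.push(chunk);
    if (!this.playing) void this.drain();
  }

  private async drain(): Promise<void> {
    this.playing = true;
    while (this.queue.length > 0) {
      await this.play(this.queue.shift()!);
    }
    this.playing = false;
  }
}
```

The queue absorbs network jitter: chunks can arrive faster than they play without being dropped or reordered.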

Lessons from Real-time

Anything can fail at any time. Smart retries, circuit breakers, and fallbacks at every step. The pipeline is robust today, but every line of error handling represents a bug that was actually encountered.
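To make the circuit-breaker idea concrete, a minimal sketch (thresholds and the injectable clock are illustrative): after a run of consecutive failures the circuit opens and calls fail fast until a cooldown has passed, protecting an already-struggling upstream service.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures: number,
    private readonly cooldownMs: number,
    private readonly now: () => number = Date.now, // injectable for testing
  ) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    // Open circuit: fail fast without touching the upstream service.
    if (
      this.failures >= this.maxFailures &&
      this.now() - this.openedAt < this.cooldownMs
    ) {
      throw new Error("circuit open");
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures === this.maxFailures) this.openedAt = this.now();
      throw err;
    }
  }
}
```

Wrapping each external call (Deepgram, OpenRouter, OpenAI TTS) in its own breaker keeps one flapping dependency from dragging down the rest of the pipeline.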