Article URL: GitHub - mahimairaja/voiceai: Set of π with π to help those building Voice AI agents · GitHub Comments URL: https://news.ycombinator.com/item?id=47991018 Points: 41 # Comments: 3.
What is the best local AI model for voice?
For local-first speech-to-text, I still default to Whisper or faster-whisper, especially on an 8GB GPU.
Whisper/faster-whisper feels like the "boring but reliable" baseline right now, especially if you're trying to keep everything local and predictable. The only thing I'd flag for beginners is expectations-setting: once you move from clean mic audio to real rooms and multiple speakers, the "works on my machine" demo advantage disappears fast.
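For anyone who wants to try that baseline, here is a minimal sketch of local transcription with faster-whisper. The model size ("small"), the float16/CUDA settings, the VAD filter, and the helper names are my assumptions rather than anything recommended in this thread, so adjust them to your GPU and audio.

```python
from typing import List


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as a timestamped line."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_local(audio_path: str, model_size: str = "small") -> List[str]:
    """Transcribe a file with faster-whisper and return formatted lines.

    Assumptions (not from the thread): the "small" model, float16 on CUDA,
    and the built-in VAD filter to skip long stretches of non-speech.
    """
    # Imported lazily so format_segment works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    # transcribe() returns a lazy generator of segments plus run metadata;
    # the actual decoding happens while the generator is consumed below.
    segments, info = model.transcribe(audio_path, beam_size=5, vad_filter=True)
    print(f"Detected language: {info.language}")
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `transcribe_local("meeting.wav")` (a hypothetical file name) should return one timestamped line per segment. This does nothing about speaker separation, so the multi-speaker caveat above still applies.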
Can you give me a list of alternatives to whisper and a link to go learn more about each?
Before you pick three Whisper alternatives, check licensing and GPU needs, or you'll inherit two extra ops stacks.
If everyone defaults to Whisper, pricing and rate limits creep in fast; check out the Riva, Deepgram, Vosk, and AssemblyAI docs.