Voice Studio

200+ voices. 60+ languages. From cloning to real-time isolation.

Generate studio-quality voices through a multi-engine router (Azure Neural · OpenAI TTS · ElevenLabs · Speechify), clone a speaker in under two seconds, isolate dialogue from any audio source, and answer conversational voice sessions in voice mode. One surface, four production-grade workloads.

  • 200+ neural voices across 60+ locales (Azure · OpenAI · ElevenLabs · Speechify)
  • ElevenLabs voice cloning in under two seconds
  • Strip background noise from any audio source
  • Voice mode answers conversational voice sessions

200+ voices · <2s clone · Real-time isolation

  • 200+ neural voices: multi-engine catalog (Azure · OpenAI · ElevenLabs · Speechify)
  • 60+ languages: locale-aware prosody and accent
  • <2s voice clone: ElevenLabs instant-clone API from a short sample
  • Real-time audio isolation: strips noise while keeping dialogue

Backend surface

Four surfaces. One studio.

Text-to-Speech, voice cloning, audio isolation, and conversational voice mode each map to a first-party backend extension. The endpoints below are the live routes, not mocks.

Text-to-Speech · TTSController (multi-engine)
POST /dashboard/user/openai/generate-speech
Engines: Azure Neural · OpenAI TTS · ElevenLabs · Speechify

Voice Cloning · ElevenLabsVoiceChat
POST /dashboard/admin/voice-chatbot/train/file
Engine: ElevenLabs instant-clone API

Audio Isolation · AIVoiceIsolator
GET /dashboard/user/openai/generator/ai_voice_isolator
Runs on the OpenAIGenerator slug pipeline (no dedicated controller)

Voice Mode · ChatbotVoice
POST /api/v2/chatbot-voice/{uuid}/store-conversation
Engine: ElevenLabs Conversational AI

Provider coverage

Five engines. One router.

Pick the engine that fits the voice. The synthesis surface routes between four TTS providers; voice mode runs on ElevenLabs Conversational AI.

  • Azure Neural TTS · synthesis
  • OpenAI TTS · synthesis
  • ElevenLabs · cloning
  • Speechify · synthesis
  • ElevenLabs Conversational AI · conversational

Browse the full LLM and TTS catalog at /models.
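The per-voice routing described under Provider coverage can be pictured as a lookup from a voice's catalog entry to the engine that serves it. This is a minimal sketch that assumes each catalog voice is pinned to exactly one engine. `Engine`, `Voice`, `CATALOG`, `route`, `voices_for`, and the sample voice IDs and engine assignments are all hypothetical. They do not describe Voice Studio's actual catalog schema or router code.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    # The four synthesis providers named on this page.
    AZURE_NEURAL = "azure"
    OPENAI_TTS = "openai"
    ELEVENLABS = "elevenlabs"
    SPEECHIFY = "speechify"


@dataclass(frozen=True)
class Voice:
    # Hypothetical catalog entry: one category is locale × tone × gender,
    # plus the single engine assumed to own this voice.
    voice_id: str
    locale: str
    tone: str
    gender: str
    engine: Engine


# Hypothetical sample catalog; IDs and engine assignments are invented.
CATALOG = {
    v.voice_id: v
    for v in [
        Voice("us-conv-f", "en-US", "Conversational", "Female", Engine.AZURE_NEURAL),
        Voice("gb-auth-m", "en-GB", "Authoritative", "Male", Engine.OPENAI_TTS),
        Voice("jp-doc-m", "ja-JP", "Documentary", "Male", Engine.ELEVENLABS),
        Voice("fr-edit-f", "fr-FR", "Editorial", "Female", Engine.SPEECHIFY),
    ]
}


def route(voice_id: str) -> Engine:
    """Return the engine that owns the requested voice (picked per voice)."""
    try:
        return CATALOG[voice_id].engine
    except KeyError:
        raise ValueError(f"unknown voice: {voice_id}") from None


def voices_for(locale: str) -> list[Voice]:
    """List every catalog voice for one locale, across all engines."""
    return [v for v in CATALOG.values() if v.locale == locale]
```

Keying the route on the voice rather than on the request keeps callers engine-agnostic. A request for `jp-doc-m` reaches ElevenLabs without the caller ever naming a provider.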

Capability matrix

Six capabilities. Honest gaps.

Audio isolation carries no multi-language mark because it is language-agnostic and works on any locale. Voice cloning applies only to the cloning and conversational surfaces. Every other capability applies to all four surfaces.

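  • Realtime: Text-to-Speech · Voice Cloning · Audio Isolation · Voice Mode
  • Batch processing: Text-to-Speech · Voice Cloning · Audio Isolation · Voice Mode
  • 60+ languages: Text-to-Speech · Voice Cloning · Voice Mode
  • Voice cloning: Voice Cloning · Voice Mode
  • Emotion + prosody: Text-to-Speech · Voice Cloning · Audio Isolation · Voice Mode
  • REST API: Text-to-Speech · Voice Cloning · Audio Isolation · Voice Mode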

Voice library

A sample of 12 voice categories.

The full library spans 200+ neural voices across 60+ locales. Each entry below is a category (locale × tone × gender) that the catalog supports across multiple engines. There is no accent-transfer fabrication: each voice is exactly what the upstream provider ships.

  • en-US · Conversational · Female
  • en-GB · Authoritative · Male
  • en-AU · Friendly · Female
  • zh-CN · Warm · Female
  • ja-JP · Documentary · Male
  • en-IN · Narrative · Female
  • es-MX · Energetic · Male
  • es-ES · Lyrical · Female
  • fr-FR · Editorial · Female
  • de-DE · News · Male
  • tr-TR · Storyteller · Male
  • ru-RU · Cinematic · Female

Browse the full voice library in the studio →

Pricing

One subscription. Every voice surface.

  • Lite: try voice synthesis
  • Pro: production voiceover + cloning
  • Business: voice mode + isolation
  • Custom: volume + dedicated provider keys

Business and higher tiers add voice cloning quota and conversational voice minutes. See /pricing for the full breakdown.

Pair voice with

Build the full conversation layer.

AI Chat Studio

67-model chat playground: drive scripts and dialog into voice synthesis

  • Test 67 models on the same prompt
  • Promote the winning response straight into a voice script
Explore AI Chat Studio

Chatbots

Pair voice mode with the same chatbot definition across 8 channels

  • Single chatbot definition routes to phone + chat
  • Voice mode answers calls with the same brain
Explore Chatbots

Use cases

Five workloads voiceover is built for.

Generate consistent multi-hour narration with a single neural voice. Adjust prosody per chapter, then batch-render to MP3 with chapter metadata.
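Chapter metadata for a batch render reduces to cumulative offsets: each chapter starts where the previous one ended. This is a minimal sketch. It assumes every chapter has been rendered to its own MP3 and that each segment's duration in seconds is already known. `Chapter`, `format_ts`, and `chapter_markers` are hypothetical helpers, not part of Voice Studio. The sketch does not cover embedding the markers into the MP3 file itself.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    # Hypothetical input: one rendered MP3 segment per chapter.
    title: str
    duration_s: float  # length of the rendered segment, in seconds


def format_ts(total_ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm."""
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def chapter_markers(chapters: list[Chapter]) -> list[tuple[str, str, str]]:
    """Return (title, start, end) for each chapter in the concatenated book."""
    markers = []
    start_ms = 0
    for ch in chapters:
        # Accumulate in integer milliseconds to avoid float drift.
        end_ms = start_ms + round(ch.duration_s * 1000)
        markers.append((ch.title, format_ts(start_ms), format_ts(end_ms)))
        start_ms = end_ms
    return markers
```

Offsets accumulate in integer milliseconds, so rounding error does not drift across a multi-hour book.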

FAQ

Voiceover: the basics.

200+ neural voices spanning 60+ locales. The synthesis surface routes between four engines (Azure Neural, OpenAI TTS, ElevenLabs, and Speechify), picked per voice. The full locale list is browsable inside Voice Studio.

Build your voice surface.

200+ voices, 60+ languages, ElevenLabs cloning, and voice mode for inbound calls. Start free.