What voice cloning provider do you use?

ElevenLabs instant-clone API. Upload a short voice sample (file, URL, or text-reference variants at /dashboard/admin/voice-chatbot/train/file) and synthesis is ready in under two seconds.

Can I separate vocals from background music?

Yes. AIVoiceIsolator handles source separation into vocals, a music stem, and ambient noise. The output is studio-quality dialogue, ready for re-dubbing or podcast post-production.

ChatbotVoice answers conversational voice sessions on the web widget, embed iframe, or any SIP / phone-number provider you connect. Caller speaks → STT → LLM → TTS → caller hears the reply. Conversational, no DTMF menus. Powered by ElevenLabs Conversational AI; recordings + transcripts archived in your dashboard.
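The caller → STT → LLM → TTS loop described above can be pictured as three pluggable stages. This is only a minimal sketch: the `VoiceTurnPipeline` class and the stand-in lambdas are hypothetical illustrations, not the ChatbotVoice or ElevenLabs Conversational AI implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurnPipeline:
    """One conversational turn: caller audio in, reply audio out (hypothetical sketch)."""

    stt: Callable[[bytes], str]   # speech-to-text stage
    llm: Callable[[str], str]     # reply-generation stage
    tts: Callable[[str], bytes]   # text-to-speech stage

    def handle_turn(self, caller_audio: bytes) -> tuple[str, str, bytes]:
        transcript = self.stt(caller_audio)   # what the caller said
        reply_text = self.llm(transcript)     # what the bot answers
        reply_audio = self.tts(reply_text)    # what the caller hears
        return transcript, reply_text, reply_audio


# Stand-in stages so the sketch runs without any provider account.
pipeline = VoiceTurnPipeline(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode("utf-8"),
)

transcript, reply_text, reply_audio = pipeline.handle_turn(b"hello")
```

Keeping each stage behind a plain callable means a real provider client could be swapped in per stage without touching the turn logic.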

Can I use cloned voices commercially?

You retain commercial rights to output for voices you have consent to clone (your own, your client's, or a voice you have written permission for). Vendor Terms of Service for the upstream provider (e.g. ElevenLabs) govern the cloned voice asset itself. Review them before deploying a cloned voice in a regulated context.

Do you support emotion or prosody control?

Yes: pitch, speed, and emotion markers via SSML on Azure Neural. ElevenLabs supports emotion control through prompt cues. Conversational voice mode tunes emotion per turn.
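As an illustration of the SSML route on Azure Neural, the sketch below assembles a document with a `prosody` element for pitch and speed and an optional `mstts:express-as` style for emotion. The `build_ssml` helper and its defaults are assumptions made for this example; how the studio actually passes SSML to the engine is not documented here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="+0%", pitch="+0%", style=None):
    """Build an Azure-style SSML string (hypothetical helper, not a studio API)."""
    # Escape the text so characters such as & and < stay valid XML.
    body = f'<prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody>'
    if style:
        # Emotion marker; the available styles depend on the chosen voice.
        body = f'<mstts:express-as style="{style}">{body}</mstts:express-as>'
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">{body}</voice></speak>'
    )


ssml = build_ssml("Fast & friendly", rate="+10%", pitch="+2st", style="cheerful")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
```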

Both surfaces expose REST endpoints: POST /dashboard/user/openai/generate-speech for TTS (multi-engine router) and POST /api/v2/chatbot-voice/{uuid}/store-conversation for voice mode. Calls are server-to-server and signed with an API key.
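A server-side call to these endpoints might be prepared as follows. Only the two paths come from this page. The host, the `Authorization: Bearer` header, and the JSON field names are assumptions for illustration, so check the actual API reference before relying on them. The sketch builds the requests without sending them.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.invalid"  # hypothetical host, replace with your deployment
TTS_PATH = "/dashboard/user/openai/generate-speech"
VOICE_MODE_PATH = "/api/v2/chatbot-voice/{uuid}/store-conversation"


def build_request(path: str, payload: dict, api_key: str) -> Request:
    """Prepare a JSON POST request (header and body shape are assumptions)."""
    return Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed signing scheme
        },
    )


# Hypothetical payload fields: the real request schema is not given on this page.
tts_request = build_request(
    TTS_PATH, {"text": "Hello", "voice": "en-US"}, api_key="YOUR_API_KEY"
)
voice_request = build_request(
    VOICE_MODE_PATH.format(uuid="CHATBOT_UUID"), {"transcript": []}, api_key="YOUR_API_KEY"
)
```

Passing either object to `urllib.request.urlopen` would send it. That step is left out here because the host above is a placeholder.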

What audio formats can I export?

MP3, WAV, and Opus. Exports embed metadata (locale, timestamp, source). Download from the studio or stream via the API.

Voice Studio

200+ voices. 60+ languages. From cloning to real-time isolation.

Generate studio-quality voices through a multi-engine router (Azure Neural · OpenAI TTS · ElevenLabs · Speechify), clone a speaker in under two seconds, isolate dialogue from any audio source, and answer conversational voice sessions in voice mode. One surface, four production-grade workloads.

200+ neural voices across 60+ locales (Azure · OpenAI · ElevenLabs · Speechify)
ElevenLabs voice cloning in under two seconds
Strip background noise from any audio source
Voice mode answers conversational voice sessions


200+ voices · <2s clone · Real-time isolation

200+ voices, 60+ languages · Voice cloning in under two seconds · Real-time audio isolation · Voice mode for conversational calls

200+

Neural voices

Multi-engine catalog (Azure · OpenAI · ElevenLabs · Speechify)

60+

Languages

Locale-aware prosody and accent

<2s

Voice clone

ElevenLabs instant-clone API from a short sample

Real-time

Audio isolation

Strip noise while keeping dialogue

Backend surface

Four surfaces. One studio.

Text-to-Speech, voice-cloning, audio-isolation, and conversational voice mode each map to a first-party backend extension. Endpoints below are the live anchors, not mocks.

TTSController (multi-engine)

Text-to-Speech

POST /dashboard/user/openai/generate-speech

Azure Neural · OpenAI TTS · ElevenLabs · Speechify

ElevenLabsVoiceChat

Voice Cloning

POST /dashboard/admin/voice-chatbot/train/file

ElevenLabs instant-clone API

AIVoiceIsolator

Audio Isolation

GET /dashboard/user/openai/generator/ai_voice_isolator

OpenAIGenerator slug pipeline (no dedicated controller)

ChatbotVoice

Voice Mode

POST /api/v2/chatbot-voice/{uuid}/store-conversation

ElevenLabs Conversational AI

Provider coverage

Five engines. One router.

Pick the engine that fits the voice. The synthesis surface routes between four TTS providers; voice mode runs on ElevenLabs Conversational AI.

Azure Neural TTS (synthesis) · OpenAI TTS (synthesis) · ElevenLabs (cloning) · Speechify (synthesis) · ElevenLabs Conversational AI (conversational)
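Per-voice routing of this kind can be sketched as a lookup from voice to engine. The voice identifiers and the `route` function below are hypothetical, invented for illustration. Only the four engine names come from this page, and the real router's selection logic is not described here.

```python
from enum import Enum


class Engine(Enum):
    AZURE = "Azure Neural"
    OPENAI = "OpenAI TTS"
    ELEVENLABS = "ElevenLabs"
    SPEECHIFY = "Speechify"


# Hypothetical catalog entries: each voice is pinned to the engine that serves it.
VOICE_CATALOG = {
    "en-US-conversational-female": Engine.AZURE,
    "en-GB-authoritative-male": Engine.OPENAI,
    "my-cloned-voice": Engine.ELEVENLABS,
    "es-MX-energetic-male": Engine.SPEECHIFY,
}


def route(voice_id: str) -> Engine:
    """Return the engine for a voice, failing loudly on unknown voices."""
    try:
        return VOICE_CATALOG[voice_id]
    except KeyError:
        raise ValueError(f"Unknown voice: {voice_id!r}") from None


engine = route("my-cloned-voice")
```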

Browse the full LLM and TTS catalog at /models.

Capability matrix

Six capabilities. Honest gaps.

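Audio isolation skips multi-language because it works on any locale. Voice cloning lights up only on the cloning and conversational surfaces. Emotion and prosody control skips audio isolation, and batch processing skips voice mode. Realtime and REST API light up across all four.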

Capability	Text-to-Speech	Voice Cloning	Audio Isolation	Voice Mode
Realtime	✓	✓	✓	✓
Batch processing	✓	✓	✓	—
60+ languages	✓	✓	—	✓
Voice cloning	—	✓	—	✓
Emotion + prosody	✓	✓	—	✓
REST API	✓	✓	✓	✓
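For anyone checking coverage programmatically, the matrix above can be transcribed into a small lookup. The structure and helper names are illustrative assumptions. The values are copied from the table.

```python
SURFACES = ("Text-to-Speech", "Voice Cloning", "Audio Isolation", "Voice Mode")

# One row per capability, in the same column order as SURFACES.
MATRIX = {
    "Realtime":          (True,  True, True,  True),
    "Batch processing":  (True,  True, True,  False),
    "60+ languages":     (True,  True, False, True),
    "Voice cloning":     (False, True, False, True),
    "Emotion + prosody": (True,  True, False, True),
    "REST API":          (True,  True, True,  True),
}


def surfaces_for(capability: str) -> list[str]:
    """Surfaces that support the given capability."""
    return [s for s, ok in zip(SURFACES, MATRIX[capability]) if ok]


def universal_capabilities() -> list[str]:
    """Capabilities supported on every surface."""
    return [cap for cap, row in MATRIX.items() if all(row)]


print(surfaces_for("Voice cloning"))   # ['Voice Cloning', 'Voice Mode']
print(universal_capabilities())        # ['Realtime', 'REST API']
```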

Voice library

A sample of 12 voice categories.

The full library spans 200+ neural voices across 60+ locales. Each chip below represents a category (locale × tone × gender) that the catalog supports across multiple engines. There is no accent-transfer fabrication: the upstream voice is the upstream voice.

en-US

Conversational · Female

en-GB

Authoritative · Male

en-AU

Friendly · Female

zh-CN

Warm · Female

ja-JP

Documentary · Male

en-IN

Narrative · Female

es-MX

Energetic · Male

es-ES

Lyrical · Female

fr-FR

Editorial · Female

de-DE

News · Male

tr-TR

Storyteller · Male

ru-RU

Cinematic · Female

Browse the full voice library in the studio →

Pricing

One subscription. Every voice surface.

Lite

Try voice synthesis

Pro

Production voiceover + cloning

Business

Voice mode + isolation

Custom

Volume + dedicated provider keys

Business+ adds voice cloning quota and conversational voice minutes. See /pricing for the breakdown.

Pair voice with

Build the full conversation layer.

AI Chat Studio

67-model chat playground: drive scripts and dialog into voice synthesis

Test 67 models on the same prompt
Promote winner straight into a voice script

Explore AI Chat Studio

Chatbots

Pair voice mode with the same chatbot definition across 8 channels

Single chatbot definition routes to phone + chat
Voice mode answers calls with the same brain

Explore Chatbots

Use cases

Five workloads voiceover ships.

Generate consistent multi-hour narration with a single neural voice. Adjust prosody per chapter, batch-render to MP3 + chapter metadata.

FAQ

Voiceover: the basics.

200+ neural voices spanning 60+ locales. The synthesis surface routes between four engines (Azure Neural, OpenAI TTS, ElevenLabs, and Speechify), picked per voice. Locale list is browsable inside the Voice Studio.

Build your voice surface.

200+ voices, 60+ languages, ElevenLabs cloning, and voice mode for inbound calls. Start free.
