Text-to-Speech
POST /dashboard/user/openai/generate-speech: Azure Neural · OpenAI TTS · ElevenLabs · Speechify
Generate studio-quality voices through a multi-engine router (Azure Neural · OpenAI TTS · ElevenLabs · Speechify), clone a speaker in under two seconds, isolate dialogue from any audio source, and hold conversational sessions in voice mode. One surface, four production-grade workloads.
200+ voices · <2s clone · Real-time isolation
Backend surface
Text-to-Speech, voice cloning, audio isolation, and conversational voice mode each map to a first-party backend extension. The endpoints below are the live anchors, not mocks.
POST /dashboard/user/openai/generate-speech: Azure Neural · OpenAI TTS · ElevenLabs · Speechify
POST /dashboard/admin/voice-chatbot/train/file: ElevenLabs instant-clone API
GET /dashboard/user/openai/generator/ai_voice_isolator: OpenAIGenerator slug pipeline (no dedicated controller)
POST /api/v2/chatbot-voice/{uuid}/store-conversation: ElevenLabs Conversational AI
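As a rough illustration of how a client might address these four endpoints, the sketch below builds (but does not send) HTTP requests with Python's standard library. Only the methods and paths come from the list above. The host `https://example.com`, the JSON body encoding, and the field names `text`, `voice`, and `messages` are assumptions for illustration, and authentication is omitted because the page does not describe it.

```python
import json
import urllib.request

# Methods and paths copied from the endpoint list above.
ENDPOINTS = {
    "text_to_speech": ("POST", "/dashboard/user/openai/generate-speech"),
    "voice_cloning": ("POST", "/dashboard/admin/voice-chatbot/train/file"),
    "audio_isolation": ("GET", "/dashboard/user/openai/generator/ai_voice_isolator"),
    "voice_mode": ("POST", "/api/v2/chatbot-voice/{uuid}/store-conversation"),
}

# ASSUMPTION: placeholder host; substitute your own deployment.
BASE_URL = "https://example.com"


def build_request(surface, payload=None, **path_params):
    """Build (but do not send) a Request for one of the four surfaces."""
    method, path = ENDPOINTS[surface]
    # Fill in path placeholders such as {uuid}; raises KeyError if one is missing.
    url = BASE_URL + path.format(**path_params)
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # ASSUMPTION: JSON bodies; the real endpoints may expect form or multipart data.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# ASSUMPTION: "text", "voice", and "messages" are hypothetical field names.
speech_req = build_request(
    "text_to_speech", {"text": "Hello there.", "voice": "example-voice"}
)
voice_req = build_request("voice_mode", {"messages": []}, uuid="1234")
```

Passing a request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the method and path mapping easy to inspect before any network call.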
Provider coverage
Pick the engine that fits the voice. The synthesis surface routes between four TTS providers; voice mode runs on ElevenLabs Conversational AI.
Browse the full LLM and TTS catalog at /models.
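One way to picture per-voice routing is a catalog lookup that resolves each voice to exactly one of the four TTS engines. The sketch below is illustrative only: the engine names come from this page, while the `Voice` record, the `route` function, and every voice ID are hypothetical and do not reflect the actual router.

```python
from dataclasses import dataclass

# Engine names as listed on this page.
ENGINES = ("Azure Neural", "OpenAI TTS", "ElevenLabs", "Speechify")


@dataclass(frozen=True)
class Voice:
    voice_id: str
    locale: str
    engine: str  # expected to be one of ENGINES


# ASSUMPTION: made-up catalog entries, not real voice IDs.
CATALOG = {
    v.voice_id: v
    for v in (
        Voice("demo-azure-en", "en-US", "Azure Neural"),
        Voice("demo-openai-en", "en-US", "OpenAI TTS"),
        Voice("demo-eleven-de", "de-DE", "ElevenLabs"),
        Voice("demo-speechify-fr", "fr-FR", "Speechify"),
    )
}


def route(voice_id):
    """Return the engine that should synthesize the given voice."""
    voice = CATALOG.get(voice_id)
    if voice is None:
        raise KeyError(f"unknown voice: {voice_id}")
    if voice.engine not in ENGINES:
        raise ValueError(f"voice {voice_id} names an unsupported engine")
    return voice.engine
```

Because the engine is stored on the voice record, callers only choose a voice and never name a provider directly.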
Capability matrix
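Audio isolation skips the multi-language row (it works on any locale) and has no emotion + prosody control. Voice cloning is available only on the cloning and conversational surfaces. Batch processing covers every surface except voice mode. Realtime and REST API access apply across all four.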
| Capability | Text-to-Speech | Voice Cloning | Audio Isolation | Voice Mode |
|---|---|---|---|---|
| Realtime | ✓ | ✓ | ✓ | ✓ |
| Batch processing | ✓ | ✓ | ✓ | — |
| 60+ languages | ✓ | ✓ | — | ✓ |
| Voice cloning | — | ✓ | — | ✓ |
| Emotion + prosody | ✓ | ✓ | — | ✓ |
| REST API | ✓ | ✓ | ✓ | ✓ |
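The matrix can also be expressed as a small lookup structure, which is handy when gating features in client code. The sketch below transcribes the table above row by row. The names `SURFACES`, `MATRIX`, `surfaces_with`, and `capabilities_of` are illustrative and not part of any published API.

```python
# Column order matches the table header.
SURFACES = ("Text-to-Speech", "Voice Cloning", "Audio Isolation", "Voice Mode")

# One row per capability; True corresponds to a check mark in the table.
MATRIX = {
    "Realtime": (True, True, True, True),
    "Batch processing": (True, True, True, False),
    "60+ languages": (True, True, False, True),
    "Voice cloning": (False, True, False, True),
    "Emotion + prosody": (True, True, False, True),
    "REST API": (True, True, True, True),
}


def surfaces_with(capability):
    """List the surfaces that support a capability, in column order."""
    return [s for s, ok in zip(SURFACES, MATRIX[capability]) if ok]


def capabilities_of(surface):
    """List the capabilities a surface supports, in row order."""
    idx = SURFACES.index(surface)
    return [cap for cap, row in MATRIX.items() if row[idx]]
```

For example, `surfaces_with("Voice cloning")` returns the cloning and voice mode surfaces, and `capabilities_of("Audio Isolation")` returns realtime, batch processing, and REST API.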
Voice library
The full library spans 200+ neural voices across 60+ locales. Each category in the catalog (locale × tone × gender) is supported across multiple engines. There is no accent-transfer fabrication: the upstream voice is the upstream voice.
Pricing
Business+ adds voice cloning quota and conversational voice minutes. See /pricing for the breakdown.
Pair voice with
67-model chat playground: drive scripts and dialog into voice synthesis
Pair voice mode with the same chatbot definition across 8 channels
Use cases
Generate consistent multi-hour narration with a single neural voice. Adjust prosody per chapter, then batch-render to MP3 with chapter metadata.
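The narration workflow above could be planned along these lines: one render job per chapter, all sharing a single voice, with chapter markers derived from the rendered durations. This is a sketch under stated assumptions. The `Chapter` record, the `rate` and `pitch` prosody fields, the job dictionary shape, and the marker format are all hypothetical, and the durations would come from whatever the renderer reports.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    title: str
    text: str
    rate: float = 1.0   # ASSUMPTION: hypothetical speaking-rate multiplier
    pitch: float = 0.0  # ASSUMPTION: hypothetical pitch offset


def render_jobs(chapters, voice_id):
    """Build one job per chapter, reusing the same voice for consistency."""
    return [
        {
            "voice": voice_id,
            "text": c.text,
            "prosody": {"rate": c.rate, "pitch": c.pitch},
        }
        for c in chapters
    ]


def chapter_markers(chapters, durations_s):
    """Compute each chapter's start offset from the rendered durations."""
    if len(chapters) != len(durations_s):
        raise ValueError("need exactly one duration per chapter")
    markers, start = [], 0.0
    for chapter, duration in zip(chapters, durations_s):
        markers.append({"title": chapter.title, "start_s": start})
        start += duration
    return markers


book = [
    Chapter("Chapter 1", "It began on a quiet morning."),
    Chapter("Chapter 2", "The storm arrived by noon.", rate=0.9),
]
jobs = render_jobs(book, "example-voice")
markers = chapter_markers(book, [12.5, 30.0])
```

Keeping the prosody settings on the chapter record means a single voice ID runs through every job, while pacing can still differ from chapter to chapter.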
FAQ
200+ neural voices spanning 60+ locales. The synthesis surface routes between four engines (Azure Neural, OpenAI TTS, ElevenLabs, and Speechify), with the engine picked per voice. The locale list is browsable inside the Voice Studio.
200+ voices, 60+ languages, ElevenLabs cloning, and voice mode for inbound calls. Start free.