Can I use my own voice clone?

Voice clones are managed in AresGen Voiceover and selectable from the ElevenLabs voice library when you configure the agent. Clone provisioning lives in /tools/voiceover.

How do I embed it on my site?

Drop the public iframe (GET /chatbot-voice/{uuid}/frame) into any page with a single tag; no auth is required. For deeper integrations, use the REST API.
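As a rough sketch of what the one-tag embed could look like, the helper below builds the iframe markup from an agent UUID. Only the path /chatbot-voice/{uuid}/frame comes from this page. The function name, the host, the dimensions, and the `allow="microphone"` attribute are assumptions made for illustration.

```typescript
// Hypothetical helper: builds the one-tag embed for a public voice agent.
// Assumptions: host, width/height defaults, and the `allow` attribute.
function buildEmbedTag(
  baseUrl: string,
  uuid: string,
  width = 400,
  height = 600
): string {
  const origin = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const src = `${origin}/chatbot-voice/${encodeURIComponent(uuid)}/frame`;
  // `allow="microphone"` lets the framed page ask the browser for mic access;
  // whether this widget needs it is an assumption.
  return `<iframe src="${src}" width="${width}" height="${height}" allow="microphone" style="border:0"></iframe>`;
}

// Example with placeholder values.
console.log(buildEmbedTag("https://example.com/", "your-agent-uuid"));
```

Paste the resulting tag into any page template; the UUID is the one assigned to your agent.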

What's the difference vs /tools/voiceover (TTS-only)?

Voiceover is one-shot text-to-speech for narration, dubbing, and audio assets. Realtime is a two-way conversation loop with listening, reasoning, and speaking, anchored to an agent persona and knowledge base.

Does it support knowledge bases / RAG?

Yes. The Knowledge Training surface ingests files, raw text, or URLs, embeds them, and retrieves matching context at conversation time. Supported across the agent and embed surfaces.
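To make "embeds them, and retrieves matching context" concrete, here is a minimal, generic sketch of embedding-based retrieval using cosine similarity. It is not AresGen's implementation: the embedding model, chunking, and scoring used by the product are not described on this page, and the toy three-dimensional vectors stand in for real embeddings.

```typescript
// Generic retrieval sketch. All names and vectors here are illustrative assumptions.
type Chunk = { text: string; vector: number[] };

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query vector, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .slice() // copy so the caller's array is not reordered
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

const chunks: Chunk[] = [
  { text: "Refunds take 3-5 business days.", vector: [0.9, 0.1, 0.0] },
  { text: "The widget embeds via iframe.", vector: [0.0, 0.2, 0.9] },
  { text: "Plans are billed monthly.", vector: [0.4, 0.8, 0.1] },
];

const hits = topK([1, 0, 0], chunks, 1);
console.log(hits[0].text); // prints the refunds chunk
```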

Is there a REST API?

Yes. GET /api/v2/chatbot-voice/{uuid} exposes the public agent for headless integrations; POST .../store-conversation logs each session for downstream analytics.
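A headless lookup against the public agent endpoint might look like the sketch below. The path GET /api/v2/chatbot-voice/{uuid} is taken from this page. The host, the function names, the JSON `Accept` header, and the assumption that the response body is JSON are all hypothetical. The response shape is not documented here, so it is typed as `unknown`.

```typescript
// Hypothetical helper: builds the public agent URL from a base host and UUID.
function agentUrl(baseUrl: string, uuid: string): string {
  const origin = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${origin}/api/v2/chatbot-voice/${encodeURIComponent(uuid)}`;
}

// Fetch the public agent. Assumes a JSON response; the shape is unknown.
async function fetchAgent(baseUrl: string, uuid: string): Promise<unknown> {
  const res = await fetch(agentUrl(baseUrl, uuid), {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) {
    throw new Error(`Agent lookup failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Call `fetchAgent("https://your-host", "your-agent-uuid")` from any runtime with a global `fetch` (modern browsers, Node 18+).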

What languages are supported?

Languages and voices vary by locale availability in the ElevenLabs catalog. Pick from the provider library at agent-create time and switch voices per persona as needed.

Why one provider instead of orchestrating multiple?

Latency is a system property, not a feature. Orchestrating multiple vendors adds round-trip overhead AresGen avoids by anchoring on one platform that handles STT, LLM, and TTS in a single conversation loop. Fewer hops, fewer SLAs to reconcile, predictable turn-taking.

Realtime Voice Studio

Real-time voice agents. One provider. No orchestration tax.

AresGen's voice agents run on ElevenLabs Conversational AI, a single platform that handles speech-to-text, reasoning, and speech back. Embed via iframe, drive via REST API, train with your knowledge base.

Spin up a voice agent · Read embed docs

1 provider · 8 agent personas · 3 RAG input types · iframe + REST embed

Single provider: no orchestration tax, no vendor-mix latency variance
Knowledge base via file, text, or URL, embedded at agent-create time
Iframe widget that drops into any site without auth
REST API at /api/v2/chatbot-voice/{uuid} for headless integrations

Hi, this is your AresGen voice agent. How can I help you today?

Can you walk me through embedding the widget on my marketing site?

Of course. Drop the public iframe tag into any page, no auth required. Want me to share the snippet?

Support Tier-1 · Sales Discovery · Onboarding Guide · Product Tour

Single platform · Knowledge-base RAG · Embed anywhere · REST API

By the numbers

One provider. Eight personas. Three RAG inputs. Four backend surfaces.

Provider: 1; ElevenLabs Conversational AI: STT+LLM+TTS unified
Personas: 8; Curated demo system-prompt presets
RAG inputs: 3; file · text · URL → embedded
Backend surfaces: 4; Agent CRUD · Training · Embed · REST

Backend surfaces

Four endpoints. One conversational agent surface.

Voice Agent CRUD on ElevenLabs Conversational AI, Knowledge Training with rag-training, an iframe-embed widget, and a public REST API. Every surface is wired to the AresGen backend with first-party endpoints.

Voice Agent

ChatbotVoice (core)

POST /dashboard/chatbot-voice/store

ElevenLabs Conversational AI agent — single-platform STT+LLM+TTS

Knowledge Training

ChatbotVoiceTrainController

POST /dashboard/chatbot-voice/train/{file,text,url}

Embedding-based RAG — file ⨯ text ⨯ URL → vector retrieval
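Reading the brace notation in the route above as three separate POST routes (an assumption), a small typed helper can keep the input type and path in sync. The request payload, field names, and authentication for these dashboard routes are not specified on this page, so the sketch stops at path construction.

```typescript
// The three RAG input types named on this page.
type TrainingSource = "file" | "text" | "url";

// Hypothetical helper: maps an input type to its training route.
// Assumes /train/{file,text,url} expands to one route per input type.
function trainPath(source: TrainingSource): string {
  return `/dashboard/chatbot-voice/train/${source}`;
}

const sources: TrainingSource[] = ["file", "text", "url"];
const trainRoutes = sources.map(trainPath);
console.log(trainRoutes.join("\n"));
```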

Embed Widget

ChatbotVoiceController::frame

GET /chatbot-voice/{uuid}/frame

Public iframe, no auth — drop into any site with one tag

REST API

ChatbotVoiceEmbbedController + History

GET /api/v2/chatbot-voice/{uuid}

Public conversation logging — `api/v2` for headless integrations

One platform

Why one platform?

AresGen real-time voice runs on a single-vendor stack: ElevenLabs Conversational AI, which handles STT, LLM, and TTS on one platform. That platform owns the full seam between conversation turns, which is where end users actually judge voice quality.

1. Latency is a system property, not a feature. Multi-vendor pipelines add round-trip overhead at every hop; we avoid every one of those hops.
2. One platform handles speech-to-text, reasoning, and text-to-speech in a single conversation loop, so turn-taking stays predictable.
3. Fewer moving parts means fewer SLAs to reconcile, fewer keys to rotate, and one provider roadmap to track instead of three.

Provider: ElevenLabs Conversational AI · STT + LLM + TTS (single platform)

Capabilities

Honest gaps. No marketing fog.

Knowledge retrieval via rag-training crosses three surfaces; public-embed via iframe-embed lives in two; conversation-log includes the REST API. The matrix below is generated from the same facts module the audit gate enforces.

Capability	Voice Agent	Knowledge Training	Embed Widget	REST API
Streaming voice	✓	—	✓	—
Barge-in / interrupt	✓	—	✓	—
Knowledge retrieval	✓	✓	✓	—
Multilingual	✓	—	✓	—
Conversation log	✓	—	✓	✓
Public embed	—	—	✓	✓
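The matrix can also be read as data. The sketch below transcribes the rows above into a lookup table and counts how many surfaces each capability covers. The names `matrix`, `supports`, and `coverage` are hypothetical and are not the facts module the page mentions.

```typescript
const surfaces = ["Voice Agent", "Knowledge Training", "Embed Widget", "REST API"] as const;
type Surface = (typeof surfaces)[number];

// Transcribed from the capability table: each capability lists the surfaces marked ✓.
const matrix: Record<string, Surface[]> = {
  "Streaming voice": ["Voice Agent", "Embed Widget"],
  "Barge-in / interrupt": ["Voice Agent", "Embed Widget"],
  "Knowledge retrieval": ["Voice Agent", "Knowledge Training", "Embed Widget"],
  "Multilingual": ["Voice Agent", "Embed Widget"],
  "Conversation log": ["Voice Agent", "Embed Widget", "REST API"],
  "Public embed": ["Embed Widget", "REST API"],
};

// True when the given surface is marked ✓ for the capability.
function supports(capability: string, surface: Surface): boolean {
  return (matrix[capability] || []).indexOf(surface) !== -1;
}

// Number of surfaces a capability covers (0 for unknown capabilities).
function coverage(capability: string): number {
  return (matrix[capability] || []).length;
}

console.log(coverage("Knowledge retrieval")); // 3
console.log(coverage("Public embed")); // 2
```

The counts match the summary above the table: knowledge retrieval spans three surfaces, public embed spans two, and the conversation log includes the REST API.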

Agent demo

Try a persona

Each persona is a curated system-prompt preset, not a separate model. Tap any chip to swap the transcript below. Every opener uses the same “this is your AresGen voice agent” frame so the persona is the visible variable, not the brand.

Persona: Support Tier-1 · professional

Hi, this is your AresGen voice agent. How can I help today?

I see. Let me check that order for you.

I've queued a refund. You'll see it in 3–5 business days.

Use cases

Five workflows teams ship on Realtime.

Tier-1 deflection with sub-second turn-taking. The agent handles common questions and escalates with the full transcript when a human is needed.

Pricing

One subscription. Every voice surface.

Lite

Try a voice agent

Solo

Solo creators + indie use

Pro

Voice agents

Knowledge training + iframe embed

Business

Voice agents

REST API + conversation logging

Voice agents unlock on Pro and Business. Pro covers knowledge training and the embeddable iframe widget; Business adds the REST API and conversation logging. See /pricing for the full breakdown.
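For teams wiring plan checks into their own tooling, the tier summary can be sketched as a feature map. The feature keys and function name are hypothetical, and the map only encodes the sentence above: Lite and Solo are modeled with no voice-agent features, and any trial access on Lite is not modeled. See /pricing for the authoritative breakdown.

```typescript
type Plan = "Lite" | "Solo" | "Pro" | "Business";
type Feature =
  | "knowledge-training"
  | "iframe-embed"
  | "rest-api"
  | "conversation-logging";

// Pro covers knowledge training and the iframe widget.
const proFeatures: Feature[] = ["knowledge-training", "iframe-embed"];

// Business adds the REST API and conversation logging on top of Pro.
const planFeatures: Record<Plan, Feature[]> = {
  Lite: [],
  Solo: [],
  Pro: proFeatures,
  Business: proFeatures.concat(["rest-api", "conversation-logging"]),
};

// True when the plan includes the feature, per the map above.
function canUse(plan: Plan, feature: Feature): boolean {
  return planFeatures[plan].indexOf(feature) !== -1;
}

console.log(canUse("Pro", "rest-api")); // false
console.log(canUse("Business", "rest-api")); // true
```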

Pair this with

Compose your voice + chat stack

Voiceover

Need TTS without a live conversation loop? Generate one-shot voice clips across 32+ voices and 29 languages.

Explore Voiceover

AI Chat

Need text-only chat across multiple models? Route between 12 catalog-validated models in one subscription.

Explore AI Chat

FAQ

Eight answers before you ask.

Which provider powers the voice agents?

ElevenLabs Conversational AI: a single platform that handles speech-to-text, reasoning, and speech back. No second vendor sits between the user and the agent.

Spin up your voice agent.

One provider, 8 personas, knowledge-base RAG, iframe + REST embed. Start free.

Spin up a voice agent · See pricing