
Run one voice agent in 9 languages: a multilingual setup guide

How to deploy a single voice agent that handles Turkish, English (US and UK), German, French, Spanish, Italian, Brazilian Portuguese, and Arabic — without spinning up nine separate agents. The architecture, the trade-offs, and the live setup.

Call2Me Team · April 30, 2026 · 6 min read
Globe with speech bubbles in nine languages converging on a single voice agent

If you sell anywhere outside one country, you eventually hit the same wall: your support and sales calls come in nine different languages, but you have one team and one phone tree. The classic answer was to staff multiple call centers or to limit business hours per language.

Voice AI flips this. One agent, one phone number, nine languages. Same brand voice, same knowledge base, no language-specific call routing.

Here's how it actually works on Call2Me, what works well, and where the trade-offs are.

The languages, in case you're skimming

Turkish, American English, British English, German, French, Spanish, Italian, Brazilian Portuguese, and Arabic. All running on the same stack, all production ready. Turkish and English get the most polish from us — the rest are high-quality by default but worth a 10-minute sanity-check pass.

Try it free →

The two ways multilingual usually fails

Failure mode 1: separate agents per language. You build nine agents, give each a different prompt translation, route based on phone number area code or caller language detection. This works but the maintenance cost is brutal — every change to the system prompt has to happen nine times, and the personalities drift.

Failure mode 2: one agent in one language. You pick English, write the prompt in English, and pretend everyone speaks it. Half your callers hang up within 10 seconds. This is what most "global" voice agents on the market actually do.

The way that works is one agent, language-aware at runtime.

How it works under the hood

Three layers, all of them already multilingual:

Caller → streaming STT (Deepgram Nova-3, ~120 ms) → LLM (GPT-4o, streaming, ~240 ms) → TTS (ElevenLabs Flash, ~110 ms) → reply. Total end-to-end: ~470 ms.

Voice AI pipeline — every component streams in parallel
  1. Streaming STT — Deepgram Nova-3 (default) does language auto-detection in the first 1-2 seconds and transcribes accordingly. You don't pre-declare the call language. The caller speaks Turkish, the transcript comes out in Turkish.
  2. LLM — modern frontier LLMs (the agent uses openrouter/auto by default, which routes to GPT-4o, Claude, Gemini Pro depending on availability) are genuinely multilingual. They reason and generate in whatever language the transcript came in.
  3. TTS — ElevenLabs and Cartesia both support multilingual voices. You pick one voice; that voice speaks all nine languages with consistent character.

The result: caller hears the same warm voice in their native language, no matter which one they speak.
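Because every stage streams, the end-to-end number is roughly the sum of each stage's time-to-*first*-output, not time-to-complete. A back-of-envelope sketch, using the per-stage latencies from the pipeline above:

```python
# Back-of-envelope: time to first audio in a streaming voice pipeline.
# Each stage streams, so end-to-end latency is approximately the sum of
# each stage's time-to-FIRST-output, not its time-to-complete.
STAGE_FIRST_OUTPUT_MS = {
    "stt": 120,   # Deepgram Nova-3, streaming transcription
    "llm": 240,   # GPT-4o, first token
    "tts": 110,   # ElevenLabs Flash, first audio chunk
}

def time_to_first_audio_ms(stages: dict[str, int]) -> int:
    """Sum of per-stage first-output latencies for a fully streaming pipeline."""
    return sum(stages.values())

print(time_to_first_audio_ms(STAGE_FIRST_OUTPUT_MS))  # → 470
```

Note that language detection adds no extra stage here: it happens inside the STT layer during the first utterance.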

Setting it up

Step 1 — Create the agent (multilingual mode)

In the wizard, on the language step, you have three options:

  • Single language — fastest, cheapest STT, but rejects other languages.
  • Multilingual (auto-detect) — what we want here. STT auto-detects from the 9 supported languages.
  • Custom set — pick a specific subset (e.g. only Turkish + English) for faster STT model loading.

For most international businesses, multilingual auto-detect is right.

Step 2 — Pick a multilingual voice

In the Voice section, filter by Multilingual. Available options as of writing:

  • ElevenLabs Multilingual v2 — the most natural across all 9 languages. Slightly higher cost but worth it.
  • ElevenLabs Turbo Multilingual — faster, lower latency, slightly less expressive.
  • Cartesia Sonic Multilingual — the cheapest, surprisingly good in Turkish and Arabic.
  • OpenAI TTS Multilingual — solid, included in the base voice price.

Click Preview on a few to hear how the same voice sounds in 3-4 languages. Pick the one that matches your brand.

Step 3 — Write the system prompt (in English)

Counterintuitive but correct: write the system prompt in English, even if most of your callers won't speak English.

Why: the LLM understands English best, follows English instructions most reliably, and translates its output to the caller's language at speech time. Writing the prompt in Turkish doesn't make Turkish output any better — it just makes the prompt harder for the LLM to follow.

The one rule:

You are [Name], the receptionist at [Company].

CRITICAL: respond in the same language the caller is speaking. If they
speak Turkish, respond in Turkish. If they speak German, respond in
German. Never switch languages unless the caller does first. Match
their formality level (formal/informal "you" in languages that have it).

[rest of the prompt — restaurant rules, escalation, etc. — also in English]

That's the entire multilingual ritual. The model handles the rest.

The English prompt with a language-mirror instruction is the most counterintuitive trick in voice AI. Try writing the prompt in Turkish first if you don't believe it; you'll come back.
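If you assemble prompts programmatically, it's worth putting the mirror rule at the very top, where the model is least likely to lose it. A minimal sketch — the helper and its structure are ours, not a Call2Me API:

```python
# The language-mirror rule, kept as the first block of the system prompt.
LANGUAGE_MIRROR_RULE = (
    "CRITICAL: respond in the same language the caller is speaking. "
    "Never switch languages unless the caller does first. Match their "
    'formality level (formal/informal "you" in languages that have it).'
)

def build_system_prompt(name: str, company: str, body: str) -> str:
    """Assemble the prompt with the language-mirror rule as the first line."""
    return "\n\n".join([
        LANGUAGE_MIRROR_RULE,  # first, so it never gets buried
        f"You are {name}, the receptionist at {company}.",
        body,                  # restaurant rules, escalation, etc. — in English
    ])

prompt = build_system_prompt("Ada", "Example Bistro", "Take reservations politely.")
print(prompt.splitlines()[0][:8])  # → CRITICAL
```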

Step 4 — Test in three languages

Call the agent. Try:

  1. Open in Turkish: "Merhaba, akşam için 4 kişi rezervasyon yapmak istiyorum." ("Hello, I'd like to book a table for 4 this evening.") → expect a fluent Turkish reply.
  2. Switch mid-call: "Wait, sorry, can we do this in English?" → the agent should switch immediately and stay in English.
  3. Try a third language: "Können wir eigentlich auf Deutsch weitermachen?" ("Actually, can we continue in German?") → German reply, same voice.

If any of these fail, the most common fix is the system prompt — the "respond in caller's language" instruction got lost. Make it the first line.
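The behavior these three tests check can be stated as one tiny rule: the agent always replies in the language of the caller's most recent utterance, and never switches first. A toy model of that rule (language codes are illustrative):

```python
def reply_language(turns: list[tuple[str, str]]) -> str:
    """Return the language the agent should reply in.

    `turns` is the conversation so far as (speaker, language) pairs.
    Rule: mirror the caller's most recent utterance; never switch first.
    """
    caller_langs = [lang for speaker, lang in turns if speaker == "caller"]
    if not caller_langs:
        return "en"  # default greeting language before the caller speaks
    return caller_langs[-1]

# Caller opens in Turkish, switches to English mid-call, then tries German:
history = [("caller", "tr"), ("agent", "tr"), ("caller", "en"), ("agent", "en")]
print(reply_language(history))                       # → en
print(reply_language(history + [("caller", "de")]))  # → de
```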

Knowledge base in multiple languages

Your KB documents can be in any language, or mixed. The retriever embeds them language-agnostically and the LLM grounds its answers in whatever chunks come back. So you can:

  • Upload a Turkish menu PDF + an English wine list. A French caller asks about wine and gets the English wine list translated into French on the fly.
  • Upload one mixed-language FAQ doc. The retriever still finds the right chunk.

What doesn't work: uploading the same content in 9 languages. The retriever will give you back 9 near-identical chunks, the LLM will try to merge them, and the answer quality drops. Pick one source-of-truth language per topic and let the LLM translate.

The trade-offs (be honest)

What's great:

  • Zero per-language operational overhead.
  • Voice character consistent across languages.
  • Caller never has to "press 1 for English" — UX is dramatically better.
  • Code-switching mid-call works (callers do this naturally in some markets).

What's still rough:

  • Arabic is the weakest of the 9 in current TTS — the prosody is good but not as natural as the others. Improving fast.
  • Mixed-language input on noisy lines sometimes confuses STT. In our experience a clean line is no problem, but on a bad line STT may stick with the first detected language for an extra utterance.
  • Languages outside the 9 (Polish, Vietnamese, Thai) work in the LLM but not in our supported TTS — the reply text will be correct, but the voice will sound off. Don't promise customers languages we haven't validated.

Cost: same as single-language

Multilingual mode doesn't cost extra. The pricing is the same $0.10/min voice base, regardless of how many languages the call switched between. The only thing that changes the cost is the TTS engine you pick (multilingual ElevenLabs costs slightly more than OpenAI TTS, and that's reflected in the per-minute voice base).
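As a sanity check on per-call cost: a 6-minute call that switches between three languages bills exactly like a 6-minute monolingual call. A quick sketch — the $0.02/min ElevenLabs surcharge below is an assumed figure for illustration, not a quoted rate:

```python
BASE_RATE_PER_MIN = 0.10  # voice base rate, from the pricing above

def call_cost(minutes: float, tts_surcharge_per_min: float = 0.0) -> float:
    """Per-call cost: flat per-minute rate; number of languages used is irrelevant."""
    return round(minutes * (BASE_RATE_PER_MIN + tts_surcharge_per_min), 4)

# A 6-minute trilingual call costs the same as a 6-minute monolingual one:
print(call_cost(6.0))                              # → 0.6
# A premium multilingual TTS adds only its per-minute surcharge (assumed +$0.02):
print(call_cost(6.0, tts_surcharge_per_min=0.02))  # → 0.72
```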

The use cases that actually move the needle

Multilingual is overkill for a single-country small business. It's a force multiplier for:

  • Cross-border SaaS with customers in 5+ countries.
  • Tourism + hospitality — Istanbul hotel, Antalya tour operator, Lisbon restaurant. Callers expect their language; you can't staff for it.
  • Logistics + e-commerce with EU + Turkey + GCC customers.
  • Healthcare clinics in major cities with international patient bases.

If you're in any of these, the cost-of-not-doing-it (lost calls because the front desk only speaks Turkish or English) is almost always higher than the cost of doing it.

Common pitfalls

  • Writing the prompt in the customer's language. Don't. English prompt, language-mirror instruction.
  • Forgetting to enable auto-detect STT. Single-language mode is faster but silently rejects other languages — looks like the agent ignored the caller.
  • Picking a non-multilingual voice. OpenAI's nova (English-only) sounds great in English and broken in Turkish. Always filter the voice gallery by multilingual when you're going multi-language.
  • Knowledge base duplication. One source-of-truth document per topic. The retriever + LLM handle the translation.

Ship a multilingual agent today

Sign up, run the wizard with multilingual selected, and have an agent that handles 9 languages live in the next 15 minutes.

Start free →
