Glossary: voice AI terms in one place
Plain-language definitions for every term that shows up in voice AI — STT, TTS, VAD, RAG, SIP, BYOC, function calling, and the rest.
Updated May 6, 2026
A glossary of the terms that show up across voice AI documentation, sales calls, and integration discussions. If you've ever nodded along without quite knowing what someone meant, this is for you.
A
Agent — A configured voice or chat AI persona: a prompt, a voice, a language, and optionally a knowledge base and functions. The unit you create, edit, and deploy.
API key — A long-lived secret your backend uses to authenticate with Call2Me's REST API. Workspace-scoped; rotatable. See Authentication.
B
Backchannel — Brief acknowledgments ("uh-huh", "right") an agent makes while the caller is speaking. Makes conversations feel more natural. Configurable per agent.
BYOC — Bring Your Own Carrier. Connect your existing telephony provider (NetGSM, Twilio, Telnyx, an enterprise PBX) to Call2Me via SIP. See SIP Trunks.
C
Campaign — A queued set of outbound calls placed by an agent against a contact list, with concurrency, retry, and time-window controls. See Campaigns.
Concurrency — How many calls run simultaneously in a campaign or workspace. Higher concurrency finishes a campaign faster, but it needs more phone-number capacity and must stay within carrier rate limits.
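In code, a concurrency limit is bounded parallelism. Below is a minimal sketch. It assumes a hypothetical `place_call` coroutine that stands in for whatever actually dials, and that coroutine is not a Call2Me SDK function. `asyncio.Semaphore` keeps at most `concurrency` calls in flight, and the function reports the peak it observed.

```python
import asyncio


async def place_call(number: str) -> str:
    """Hypothetical stand-in for dialing one number; sleeps to simulate a call."""
    await asyncio.sleep(0.01)
    return f"done:{number}"


async def run_campaign(numbers: list[str], concurrency: int) -> tuple[list[str], int]:
    """Dial every number with at most `concurrency` calls active at once.

    Returns results in input order plus the peak number of simultaneous calls.
    """
    gate = asyncio.Semaphore(concurrency)
    active = 0
    peak = 0

    async def dial(number: str) -> str:
        nonlocal active, peak
        async with gate:  # waits while `concurrency` calls are already in flight
            active += 1
            peak = max(peak, active)
            try:
                return await place_call(number)
            finally:
                active -= 1

    results = await asyncio.gather(*(dial(n) for n in numbers))
    return list(results), peak


if __name__ == "__main__":
    nums = [f"+9055512345{i:02d}" for i in range(10)]
    results, peak = asyncio.run(run_campaign(nums, concurrency=3))
    print(len(results), peak)
```

Raising `concurrency` shortens the total run time. A real dialer would also size this limit against available numbers and carrier limits.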
D
DTLS-SRTP — The scheme WebRTC uses to secure media streams: a DTLS handshake exchanges the keys, and SRTP encrypts the audio. Enabled by default on all browser voice sessions.
DTMF — The classic phone touch-tone input. Voice agents can recognize DTMF input (e.g. "press 1 to confirm") in addition to speech.
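DTMF stands for Dual-Tone Multi-Frequency. Every key sends two tones at once: one from a low-frequency row group and one from a high-frequency column group. The sketch below encodes the standard keypad grid as a lookup table and maps a tone pair back to its key. It assumes the two frequencies have already been measured; real detectors often use the Goertzel algorithm for that step. The 1.5% tolerance is an illustrative choice.

```python
from typing import Optional

# Standard DTMF keypad: each key is one low "row" tone plus one high "column" tone (Hz).
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = (
    ("1", "2", "3", "A"),
    ("4", "5", "6", "B"),
    ("7", "8", "9", "C"),
    ("*", "0", "#", "D"),
)


def tones_for(key: str) -> tuple[int, int]:
    """Return the (row, column) frequency pair a phone sends for `key`."""
    for r, row in enumerate(KEYPAD):
        if key in row:
            return ROW_FREQS[r], COL_FREQS[row.index(key)]
    raise ValueError(f"not a DTMF key: {key!r}")


def _nearest(freq: float, table: tuple[int, ...], tolerance: float) -> Optional[int]:
    """Index of the table frequency within `tolerance` (relative) of `freq`, if any."""
    for i, f in enumerate(table):
        if abs(freq - f) <= f * tolerance:
            return i
    return None


def key_for(low_hz: float, high_hz: float, tolerance: float = 0.015) -> Optional[str]:
    """Map an already-detected tone pair back to its key, or None if nothing matches."""
    r = _nearest(low_hz, ROW_FREQS, tolerance)
    c = _nearest(high_hz, COL_FREQS, tolerance)
    if r is None or c is None:
        return None
    return KEYPAD[r][c]


print(tones_for("1"))      # (697, 1209)
print(key_for(852, 1477))  # 9
```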
Dynamic variables — Values you inject into an agent's prompt at call time ({{customer_name}}, {{appointment_date}}). Lets one agent serve many personalized conversations.
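Conceptually, dynamic variables are template substitution. The sketch below fills `{{name}}` placeholders from a dict and fails loudly if any placeholder is left without a value. It only illustrates the idea. Call2Me's own substitution rules, such as how it treats a missing variable, may differ.

```python
import re

# Matches {{name}}, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} in `template` with variables[name].

    Raises KeyError listing any placeholders without a value, so a prompt
    never reaches a caller with raw {{...}} markers in it.
    """
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in variables})
    if missing:
        raise KeyError(f"missing dynamic variables: {missing}")
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


template = "Hi {{customer_name}}, I'm calling to confirm your appointment on {{appointment_date}}."
print(render_prompt(template, {"customer_name": "Mehmet", "appointment_date": "May 12"}))
# Hi Mehmet, I'm calling to confirm your appointment on May 12.
```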
E
E.164 — The international phone number format: a + followed by the country code and the number, e.g. +905551234567. Required for to_number and from_number in the API.
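An E.164 number has at most 15 digits after the +, and the country code never starts with 0. The sketch below checks syntax only, so it cannot tell you whether a number exists or is dialable. The `to_e164` helper is a hypothetical convenience. It assumes national numbers carry a single leading trunk 0, as Turkish ones do, but not every country follows that convention.

```python
import re

# '+' then 1 to 15 digits; the first digit (start of the country code) is non-zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` is syntactically valid E.164."""
    return E164.fullmatch(number) is not None


def to_e164(raw: str, default_country_code: str) -> str:
    """Normalize a locally formatted number, e.g. '0555 123 45 67' with code '90'.

    Strips spaces, dashes, dots, and parentheses, drops one leading trunk '0',
    then prefixes the country code. A heuristic sketch, not a full parser.
    """
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if cleaned.startswith("+"):
        candidate = cleaned
    else:
        if cleaned.startswith("0"):
            cleaned = cleaned[1:]
        candidate = f"+{default_country_code}{cleaned}"
    if not is_e164(candidate):
        raise ValueError(f"cannot normalize {raw!r} to E.164")
    return candidate


print(to_e164("0555 123 45 67", "90"))  # +905551234567
print(is_e164("905551234567"))          # False (missing '+')
```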
Extraction — Post-call structured data extraction. You define the fields; the platform pulls them out of the transcript. See Post-Call Actions.
F
Function — A tool the LLM can call mid-conversation. You define the endpoint and its JSON Schema; the agent decides when to invoke it. See Functions.
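The following is a hypothetical function definition: a name, a description the LLM reads to decide when to call it, an endpoint, and a JSON Schema for the arguments. The field names (`name`, `description`, `url`, `parameters`) and the URL are assumptions modeled on common LLM tool-calling formats. They are not Call2Me's documented schema, so check Functions for the real shape. The helper checks the model's arguments only against `required` and primitive `type` keywords. A full validator, such as the third-party `jsonschema` package, is outside the standard library.

```python
# Hypothetical function definition: field names and URL are illustrative only.
CHECK_AVAILABILITY = {
    "name": "check_availability",
    "description": "Look up open appointment slots for a given date.",
    "url": "https://example.com/hooks/check-availability",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "ISO date, e.g. 2026-05-12"},
            "party_size": {"type": "integer"},
        },
        "required": ["date"],
    },
}

_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_arguments(schema: dict, args: dict) -> list[str]:
    """Return problems with the LLM-supplied `args`; an empty list means OK.

    Checks only `required` and primitive `type` keywords, not full JSON Schema.
    """
    problems = [f"missing: {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        prop = schema.get("properties", {}).get(key)
        if prop is None:
            continue  # undeclared keys are allowed, as in JSON Schema by default
        expected_name = prop.get("type")
        expected = _JSON_TYPES.get(expected_name)
        if expected is None:
            continue  # object/array/untyped properties are not checked in this sketch
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean types.
        is_stray_bool = isinstance(value, bool) and expected_name != "boolean"
        if not isinstance(value, expected) or is_stray_bool:
            problems.append(f"wrong type for {key}: expected {expected_name}")
    return problems


print(check_arguments(CHECK_AVAILABILITY["parameters"], {"date": "2026-05-12", "party_size": 2}))  # []
```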
I
Idempotency key — A UUID you send with POST requests so that retries don't create duplicate resources. Every non-idempotent endpoint accepts an Idempotency-Key header.
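The important detail is to generate the key once per logical operation and reuse it on every retry. This is a minimal standard-library sketch. The URL, the Bearer auth scheme, and the request body are placeholders, not documented Call2Me values; see Authentication and the API Reference for the real ones. The `send` parameter is injectable so the retry logic can be exercised without a network.

```python
import json
import urllib.error
import urllib.request
import uuid
from typing import Callable, Optional


def build_request(url: str, api_key: str, body: dict, idempotency_key: str) -> urllib.request.Request:
    """Build a JSON POST that carries the Idempotency-Key header."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
    )


def urlopen_status(req: urllib.request.Request) -> int:
    """Default sender: perform the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def post_with_retries(url: str, api_key: str, body: dict,
                      send: Callable[[urllib.request.Request], int] = urlopen_status,
                      attempts: int = 3) -> int:
    """POST `body`, retrying connection failures with the SAME idempotency key.

    The key is generated once. If a response is lost after the server already
    created the resource, the retry carries the same key and is not a new request.
    """
    key = str(uuid.uuid4())
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return send(build_request(url, api_key, body, key))
        except urllib.error.HTTPError:
            raise  # the server answered with an error status; not a transport failure
        except urllib.error.URLError as exc:
            last_error = exc  # connection-level failure: retry with the same key
    raise RuntimeError(f"POST failed after {attempts} attempts") from last_error
```

The key must not be regenerated inside the loop. A fresh UUID per attempt would defeat the purpose.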
Interruption — When the caller starts speaking while the agent is mid-sentence. The agent's TTS stops and the caller is heard. Tunable per agent (interruption_sensitivity).
IVR — Interactive Voice Response. The old-school touch-tone phone menu. Voice AI is the conversational replacement.
K
Knowledge Base (KB) — A collection of text, URL, and file content that is chunked, embedded, and made queryable by an agent. Grounds your agent in your real content (menu, FAQ, policies). See Knowledge Base.
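Chunking is the first step. Long content is split into overlapping windows so that each piece is small enough to embed and a relevant passage is less likely to be cut in half at a boundary. This is a minimal word-based sketch. The default sizes are illustrative assumptions, and it does not describe Call2Me's actual chunking strategy.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split `text` into chunks of at most `size` words.

    Each chunk repeats the last `overlap` words of the previous one.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("w0 w1 w2 w3 w4 w5 w6 w7 w8 w9", size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```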
L
Latency — End-to-end response time, from the moment the caller stops speaking to the moment the agent starts speaking. Measured in milliseconds. See Voice for the budget breakdown.
LiveKit — The WebRTC infrastructure for browser-based voice sessions. Hosted by us. See LiveKit & WebRTC.
LLM — Large Language Model. The "brain" that generates the agent's responses. Configurable per agent (GPT-4o, Claude 3.5, Gemini, etc.).
M
Member — A user with access to a workspace. Has a role: owner, admin, member, or viewer. See Members.
Multilingual — A voice or agent that handles multiple languages on the same call. ElevenLabs Multilingual v2 covers 29 languages with one voice character.
P
Post-Call Action — A workflow triggered after extraction lands. Schedule a follow-up call, fire a webhook, transfer mid-call. See Post-Call Actions.
R
RAG — Retrieval-Augmented Generation. The agent retrieves relevant KB chunks before generating a response. Keeps the agent grounded in your content.
Rate limit — Per-API-key request limits. Exceeding them returns HTTP 429, and response headers show where you stand against the limit. See Errors & Rate Limits.
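A common client-side pattern for handling a 429 is to wait and retry. If the server sends a `Retry-After` header, honor it. Otherwise, back off exponentially with jitter. This sketch assumes a `Retry-After` header may be present, but whether Call2Me sends it is an assumption here; see Errors & Rate Limits for the headers it actually returns.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Uses Retry-After when it is given in seconds. Otherwise uses "full jitter"
    exponential backoff: a random value in [0, min(cap, base * 2**attempt)].
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries from many clients over time, so they don't all hit the limit again at the same instant.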
S
Schedule — A future call booked at a specific date and time. Can be one-shot or recurring. See Schedules.
SIP — Session Initiation Protocol. The signaling protocol for phone calls in IP networks. The carrier-side glue.
SIP Trunk — A SIP connection from a carrier to Call2Me, used for BYOC. See SIP Trunks.
SRTP — Secure RTP. Encrypts the audio stream. Negotiated automatically when both sides support it.
STT — Speech-To-Text. The first stage of every voice turn — what the caller said, transcribed. Default provider: Deepgram.
T
Tenant — White-label brand layer above one or more workspaces. Users signing up via your custom domain land in your tenant. See White-Label.
Transfer — Handing the active call off to a human or another agent. Implemented as a built-in function. See Functions.
TTS — Text-To-Speech. The last stage of every voice turn: what the agent will say, synthesized. Configurable per agent (voice_id).
Turn — One round-trip in a conversation: caller speaks, agent listens, agent speaks. Latency is measured per turn.
V
VAD — Voice Activity Detection. The component that decides when the caller has stopped speaking, so the agent can start. Tunable indirectly via responsiveness and interruption_sensitivity.
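One classic approach, much simpler than production detectors, is energy-based VAD with a hangover. Frames above an energy threshold count as speech, and end of speech is declared only after a run of quiet frames. The sketch assumes 16-bit PCM samples already split into fixed-size frames, and the threshold and hangover values are illustrative. It also hints at the trade-off a responsiveness-style setting makes. A shorter hangover replies sooner but risks cutting off a caller who only paused. That link is an analogy, not how Call2Me implements its settings.

```python
import math
from typing import Optional


def rms(frame: list[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def end_of_speech_frame(frames: list[list[int]], threshold: float = 500.0,
                        hangover: int = 3) -> Optional[int]:
    """Index of the frame where end of speech is declared, or None.

    Speech must first be heard (a frame at or above `threshold`), then
    `hangover` consecutive quiet frames must follow it.
    """
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0  # any loud frame resets the silence counter
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= hangover:
                return i
    return None


loud = [2000, -2000] * 80
quiet = [10, -10] * 80
print(end_of_speech_frame([quiet, loud, loud, quiet, quiet, quiet, loud]))  # 5
```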
Voice — The character your agent speaks with. Picked from a catalog or cloned via ElevenLabs. See Voices.
W
Webhook — An HTTP POST from Call2Me to your endpoint when an event happens. Real-time event stream. See Webhooks.
White-label — Reselling Call2Me under your own brand, custom domain, custom pricing. See White-Label.
Workspace — One customer account. Has its own agents, calls, KB, wallet, and members.
What's next
- Quickstart — turn the terms into a working agent
- API Reference — the endpoints behind every term here
Frequently asked
Q. What does 'voice agent' mean?
An AI system that holds a real-time spoken conversation. It listens (STT), thinks (LLM), and talks back (TTS) on every turn, fast enough that the back-and-forth feels human.
Q. What's the difference between Voice AI and IVR?
IVR uses recorded prompts and DTMF (touch-tone) input — 'press 1 for billing.' Voice AI uses speech recognition and an LLM — the caller just talks naturally. IVR is a menu; voice AI is a conversation.
Q. What is RAG and why does my agent need it?
Retrieval-Augmented Generation — the agent looks up relevant chunks of your documents before responding. Without it, the agent only knows what's in its system prompt; with it, the agent grounds answers in your actual content (menus, FAQs, policies).
Q. What does BYOC stand for?
Bring Your Own Carrier. Instead of buying a phone number through Call2Me, you keep the number with your existing provider (NetGSM, Twilio, etc.) and route it via SIP.