Give your AI agent a phone: turning chat agents into voice agents
Slack, Gmail, Notion — your agent can already do everything except pick up the phone. Here's how to add a real phone-call tool in 8 lines, and what changes when it can.
Most "AI agents" today live entirely inside text. They read your inbox, write your docs, file your tickets, post in your Slack. The list of tools they can reach grows every week — Composio alone ships hundreds of toolkits, Pipedream chains thousands of apps, every model lab is racing to publish its own.
Look closely at that list and you'll notice something missing: the phone.
Your agent can email a customer. It can DM a customer. It can open a ticket on behalf of a customer. The one thing it cannot do is call a customer — the single action a human assistant does dozens of times a day.
This is the post about closing that gap.
If you just want the code, skip to the bottom — there's a copy-pasteable place_call tool you can add to any agent today. Your first $10 of voice usage is free; no credit card required.
Why "voice" stayed missing
There's a reason Twilio gets to charge what it charges: voice is annoying. To let an agent place a call, something has to:
- Acquire a phone number that works in the destination country
- Actually dial out and ring the recipient
- Hold the audio session open with low-enough latency to feel human
- Run real-time speech-to-text on what the recipient says
- Run an LLM on the running transcript to produce the next response
- Synthesize that response in a believable voice in the right language
- Stream that audio back over the phone, sub-500ms round trip
- Hand back a transcript, recording, and structured outcome at the end
That's not one tool. That's a vendor. Which is why nobody had wired this into the agent ecosystems — until everybody decided to.
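To make the vendor's job concrete, here's a toy sketch of the per-call loop that checklist implies. Everything below is illustrative: the three pipeline stages are stubbed out (in production each is a streaming service with its own latency budget), and none of the function names correspond to a real API.

```python
def speech_to_text(audio_chunks):
    # Stub: pretend each incoming chunk is already a transcribed utterance.
    yield from audio_chunks

def llm_respond(prompt, transcript):
    # Stub: a real platform calls an LLM on the running transcript here.
    return f"(reply to: {transcript[-1]['content']})"

def text_to_speech(text):
    # Stub: real TTS returns audio frames to stream back over the phone.
    return text.encode()

def run_call(recipient_audio, prompt):
    """One pass of the listen -> think -> speak loop, per recipient utterance."""
    transcript = []
    for utterance in speech_to_text(recipient_audio):
        transcript.append({"role": "user", "content": utterance})
        reply = llm_respond(prompt, transcript)
        transcript.append({"role": "assistant", "content": reply})
        text_to_speech(reply)  # streamed back sub-500ms in production
    return transcript  # plus recording + structured outcome in a real platform
```

The hard part is not the loop — it's running every stage concurrently, streaming, inside the round-trip budget, which is exactly why it's a vendor and not a weekend project.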
What changes when an agent can call
A surprisingly large amount.
- Lead follow-up. Your CRM agent finishes scoring an inbound lead, decides it's hot, and places the call right then — instead of writing a follow-up task that nobody clicks.
- Active escalation. Your support agent is mid-chat with a frustrated user. Instead of "I'll have someone call you," it just calls them — same bot, different channel.
- Bulk reach-out. Your operations agent has 500 appointments tomorrow. It runs a campaign overnight: dials each number, confirms attendance, summarizes the outcomes for the dashboard you'll read with coffee.
These aren't speculative. The third one is in production right now for a chain of dental clinics. The first one is what every "AI BDR" startup is selling. The second one is the most underrated of the three — a chat-only agent that escalates to a phone call when the user is upset is unreasonably more reassuring than a chat-only agent that doesn't.
The mental shift: phone is a tool, not a stack
The reason "voice AI" feels like a separate world is that, historically, building voice required wiring up STT/LLM/TTS yourself, plus telephony, plus a real-time audio pipeline. So most teams treated voice as a project, not a feature.
Once voice has a stable REST endpoint, the entire frame inverts. Your agent already has 50 tools. Adding place_call is the 51st. The agent decides when to use it the same way it decides when to send a Slack message — by reading the user's intent, picking the right action, filling in the arguments.
The interesting design question stops being "how do I build a voice pipeline" and becomes "when should my agent reach for the phone vs. the chat vs. the email?" — which is a much more useful question, because it's product-shaped, not plumbing-shaped.
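"The 51st tool" is literal: to the model, place_call is just another schema in the tool list. Here's one way it might be described, following the common JSON-Schema function-tool convention — the field values and descriptions are illustrative, not a published spec:

```python
# Illustrative tool schema for a tool-using LLM. The "when to use it"
# guidance lives in the description — that's where the routing decision
# from the paragraph above actually gets encoded.
PLACE_CALL_TOOL = {
    "name": "place_call",
    "description": (
        "Place a real outbound phone call via a pre-configured voice agent. "
        "Use when the user asks to be called, or when chat/email is too slow."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "phone_number": {"type": "string", "description": "E.164 format, e.g. +15551234567"},
            "voice_agent_id": {"type": "string", "description": "Pre-configured voice agent to run the call"},
            "context": {"type": "string", "description": "Situation summary handed to the voice agent"},
        },
        "required": ["phone_number", "voice_agent_id"],
    },
}
```

Notice that the product question — when should the agent reach for the phone? — ends up as a sentence in the description field, not as code.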
What an agent + phone setup actually looks like
Three actors:
- The orchestrator — your top-level agent (Claude, GPT-4o, your Composio session, your Pipedream workflow). Reads the situation, decides to call.
- The voice agent — a Call2Me agent you've configured once, with a prompt, a voice, a language, optionally a knowledge base. This is the personality on the phone.
- The phone-call tool — the REST endpoint that ties them together. The orchestrator hands off (agent_id, phone_number, optional context); the voice agent runs the call; the platform hands back the transcript and outcome when the call ends.
The voice agent is configured ahead of time, exactly like an email template or a support macro — except it talks. The orchestrator never has to think about voice settings, language detection, or TTS — it just picks a voice_agent_id and a number, and gets back a clean JSON when the call is over.
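For orientation, that handback might look something like the dict below. The field names and values are assumptions for illustration — check the platform's API reference for the documented schema.

```python
# Illustrative shape of the result the orchestrator receives when the call
# ends. Not Call2Me's documented schema — field names are assumptions.
example_result = {
    "call_id": "call_abc123",
    "status": "completed",
    "duration_seconds": 94,
    "outcome": "appointment_confirmed",   # structured outcome for downstream logic
    "transcript": [
        {"role": "agent", "text": "Hi, this is the clinic confirming tomorrow's 9am."},
        {"role": "recipient", "text": "Yes, I'll be there."},
    ],
}
```

The point is the shape, not the fields: a transcript the orchestrator can read, plus a structured outcome it can branch on without re-parsing the conversation.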
The eight-line version
Here's the smallest possible "give your agent a phone" tool, written for an LLM with native tool use:
```python
import os
import requests

# Assumes your key is exported as an environment variable.
CALL2ME_API_KEY = os.environ["CALL2ME_API_KEY"]

def place_call(phone_number: str, voice_agent_id: str, context: str = "") -> dict:
    """Place a real outbound phone call. Returns transcript + outcome on completion."""
    return requests.post(
        "https://api.call2me.app/v1/calls",
        headers={"Authorization": f"Bearer {CALL2ME_API_KEY}"},
        json={"to_number": phone_number, "agent_id": voice_agent_id, "context": context},
        timeout=600,  # calls run for minutes; don't let a short default timeout kill them
    ).json()
```
That's it. Wire that into Claude/GPT/Gemini tool use, or wrap it as a Pipedream action, or register it as a Composio custom tool — the contract is identical because the contract is just an HTTP POST.
The agent now does what every human assistant has always done: when the situation calls for a phone call, it picks up the phone.
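The glue between the model and that function is equally small. Providers differ in exact field names, so the sketch below uses a generic {"name": ..., "arguments": {...}} tool-call shape — adapt it to whatever your framework emits. The place_call here is a canned stand-in so the loop can be demonstrated offline.

```python
def place_call(phone_number, voice_agent_id, context=""):
    # Stand-in for the real HTTP tool above; returns a canned result
    # so the dispatch glue can run without making an actual call.
    return {"status": "completed", "to": phone_number, "context": context}

def handle_tool_call(tool_call, tools):
    # Route a model-emitted tool call to the matching Python function.
    # The tool_call shape is a common convention, not a specific provider's.
    return tools[tool_call["name"]](**tool_call["arguments"])

TOOLS = {"place_call": place_call}  # ...alongside your other tools

result = handle_tool_call(
    {"name": "place_call",
     "arguments": {"phone_number": "+15551234567", "voice_agent_id": "agent_1"}},
    TOOLS,
)
```

Swap the stand-in for the real HTTP version and the same three lines of dispatch give your agent a phone.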
We submitted Call2Me to Composio's tool request board on May 4 — if you'd like a first-class composio.toolkit('call2me') integration instead of the custom tool above, upvote the request. The faster the votes pile up, the faster the official integration ships.
In the meantime, the REST endpoint is live and stable, and Pipedream and any custom-tool framework can hit it today.
What to do with this
If you're building agents and you've never seriously considered adding voice as a tool, this is a good week to try it. Two suggestions:
- Start with one use case. "If the user explicitly asks to be called, call them." That's it. Don't try to design the perfect calling agent before measuring whether your users actually want to be called. (They will. They always do.)
- Use a hosted voice agent for the call itself. Don't try to build the voice pipeline. Use Call2Me, use Vapi, use Retell — pick one that gives you a clean HTTP API and a sub-500ms latency budget. The interesting work is in when to call, not in shaving 50ms off the TTS.
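For the first suggestion, the entire "design" can be a single policy sentence in the orchestrator's system prompt. Wording below is illustrative:

```python
# One-sentence calling policy for the orchestrator's system prompt.
# Deliberately conservative: explicit opt-in only, no proactive dialing.
CALL_POLICY = (
    "If the user explicitly asks to be called, use the place_call tool with "
    "their phone number. Never place a call the user did not ask for."
)
```

Measure how often the tool fires with that policy before loosening it.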
Most of the things "voice AI agents" will become famous for in the next year are not technical breakthroughs. They are the same agents you already have, given a phone. Give yours one this week.
Frequently asked
Q. Do I have to choose between Composio and Pipedream?
No. The phone-call action is just a REST endpoint, so any tool platform that can hit a Bearer-auth API can call it. Composio and Pipedream solve different problems — Composio for hosted toolkits inside your agent runtime, Pipedream for chained workflows around it. Both can call Call2Me.
Q. Will my agent randomly call people?
Only when you ask it to. Phone calls are an explicit tool — the agent has to decide to invoke it, with arguments you describe. In practice the harder problem is making sure your agent does call when it should (e.g., when the user explicitly asks).
Q. What does it cost per call?
Voice agent time is billed per-minute on Call2Me — the platform fee starts at $0.10/min and bundles the LLM, STT and TTS in that price. There's no extra fee for being driven by an external agent vs. the dashboard.
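Back-of-envelope at the quoted $0.10/min platform rate (assuming average call lengths; your calls will vary):

```python
per_minute = 0.10                       # quoted platform rate, LLM/STT/TTS bundled
three_minute_call = 3 * per_minute      # one typical support escalation
campaign_cost = 500 * 2 * per_minute    # the 500-appointment overnight campaign,
                                        # assuming ~2 min per confirmation call
```

So the bulk-confirmation example from earlier in the post runs on the order of $100 a night at those assumptions.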
Q. Can the calling agent be a different model than the answering agent?
Yes — and usually should be. The orchestrator (the one that decides to call) can be GPT-4o or Claude on the agent platform. The voice agent that actually runs the call is a separate Call2Me agent with its own prompt, voice, and language. They communicate through the API.