Give your AI agent a phone: turning chat agents into voice agents
Slack, Gmail, Notion — your agent can already do everything except pick up the phone. Here's how to add a real phone-call tool in 8 lines, and what changes when it can.
Most "AI agents" today live entirely inside text. They read your inbox, write your docs, file your tickets, post in your Slack. The list of tools they can reach grows every week — Composio alone ships hundreds of toolkits, Pipedream chains thousands of apps, every model lab is racing to publish its own.
Look closely at that list and you'll notice something missing: the phone.
Your agent can email a customer. It can DM a customer. It can open a ticket on behalf of a customer. The one thing it cannot do is call a customer — the single action a human assistant does dozens of times a day.
This is the post about closing that gap.
If you just want the code, skip to the bottom — there's a copy-pasteable place_call tool you can add to any agent today. Your first $10 of voice usage is free; no credit card required.
Why "voice" stayed missing
There's a reason Twilio gets to charge what it charges: voice is annoying. To let an agent place a call, something has to:
- Acquire a phone number that works in the destination country
- Actually dial out and ring the recipient
- Hold the audio session open with low-enough latency to feel human
- Run real-time speech-to-text on what the recipient says
- Run an LLM on the running transcript to produce the next response
- Synthesize that response in a believable voice in the right language
- Stream that audio back over the phone, sub-500ms round trip
- Hand back a transcript, recording, and structured outcome at the end
That's not one tool. That's a vendor. Which is why nobody had wired this into the agent ecosystems — until everybody decided to.
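To make the vendor's job concrete, here's a toy sketch of the per-call loop that checklist implies. Everything below is illustrative: the three pipeline stages are stubbed out (in production each is a streaming service with its own latency budget), and none of the function names correspond to a real API.

```python
def speech_to_text(audio_chunks):
    # Stub: pretend each incoming chunk is already a transcribed utterance.
    yield from audio_chunks

def llm_respond(prompt, transcript):
    # Stub: a real platform calls an LLM on the running transcript here.
    return f"(reply to: {transcript[-1]['content']})"

def text_to_speech(text):
    # Stub: real TTS returns audio frames to stream back over the phone.
    return text.encode()

def run_call(recipient_audio, prompt):
    """One pass of the listen -> think -> speak loop, per recipient utterance."""
    transcript = []
    for utterance in speech_to_text(recipient_audio):
        transcript.append({"role": "user", "content": utterance})
        reply = llm_respond(prompt, transcript)
        transcript.append({"role": "assistant", "content": reply})
        text_to_speech(reply)  # streamed back sub-500ms in production
    return transcript  # plus recording + structured outcome in a real platform
```

The hard part is not the loop — it's running every stage concurrently, streaming, inside the round-trip budget, which is exactly why it's a vendor and not a weekend project.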
What changes when an agent can call
A surprisingly large amount.
- Lead follow-up. Your CRM agent finishes scoring an inbound lead, decides it's hot, and places the call right then — instead of writing a follow-up task that nobody clicks.
- Active escalation. Your support agent is mid-chat with a frustrated user. Instead of "I'll have someone call you," it just calls them — same bot, different channel.
- Bulk reach-out. Your operations agent has 500 appointments tomorrow. It runs a campaign overnight: dials each number, confirms attendance, summarizes the outcomes for the dashboard you'll read with coffee.
These aren't speculative. The third one is in production right now for a chain of dental clinics. The first one is what every "AI BDR" startup is selling. The second one is the most underrated of the three — a chat-only agent that escalates to a phone call when the user is upset is unreasonably more reassuring than a chat-only agent that doesn't.
The mental shift: phone is a tool, not a stack
The reason "voice AI" feels like a separate world is that, historically, building voice required wiring up STT/LLM/TTS yourself, plus telephony, plus a real-time audio pipeline. So most teams treated voice as a project, not a feature.
Once voice has a stable REST endpoint, the entire frame inverts. Your agent already has 50 tools. Adding place_call is the 51st. The agent decides when to use it the same way it decides when to send a Slack message — by reading the user's intent, picking the right action, filling in the arguments.
The interesting design question stops being "how do I build a voice pipeline" and becomes "when should my agent reach for the phone vs. the chat vs. the email?" — which is a much more useful question, because it's product-shaped, not plumbing-shaped.
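"The 51st tool" is literal: to the model, place_call is just another schema in the tool list. Here's one way it might be described, following the common JSON-Schema function-tool convention — the field values and descriptions are illustrative, not a published spec:

```python
# Illustrative tool schema for a tool-using LLM. The "when to use it"
# guidance lives in the description — that's where the routing decision
# from the paragraph above actually gets encoded.
PLACE_CALL_TOOL = {
    "name": "place_call",
    "description": (
        "Place a real outbound phone call via a pre-configured voice agent. "
        "Use when the user asks to be called, or when chat/email is too slow."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "phone_number": {"type": "string", "description": "E.164 format, e.g. +15551234567"},
            "voice_agent_id": {"type": "string", "description": "Pre-configured voice agent to run the call"},
            "context": {"type": "string", "description": "Situation summary handed to the voice agent"},
        },
        "required": ["phone_number", "voice_agent_id"],
    },
}
```

Notice that the product question — when should the agent reach for the phone? — ends up as a sentence in the description field, not as code.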
What an agent + phone setup actually looks like
Three actors:
- The orchestrator — your top-level agent (Claude, GPT-4o, your Composio session, your Pipedream workflow). Reads the situation, decides to call.
- The voice agent — a Call2Me agent you've configured once, with a prompt, a voice, a language, optionally a knowledge base. This is the personality on the phone.
- The phone-call tool — the REST endpoint that ties them together. The orchestrator hands off (agent_id, phone_number, optional context); the voice agent runs the call; the platform hands back the transcript and outcome when the call ends.
The voice agent is configured ahead of time, exactly like an email template or a support macro — except it talks. The orchestrator never has to think about voice settings, language detection, or TTS — it just picks a voice_agent_id and a number, and gets back a clean JSON when the call is over.
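For orientation, that handback might look something like the dict below. The field names and values are assumptions for illustration — check the platform's API reference for the documented schema.

```python
# Illustrative shape of the result the orchestrator receives when the call
# ends. Not Call2Me's documented schema — field names are assumptions.
example_result = {
    "call_id": "call_abc123",
    "status": "completed",
    "duration_seconds": 94,
    "outcome": "appointment_confirmed",   # structured outcome for downstream logic
    "transcript": [
        {"role": "agent", "text": "Hi, this is the clinic confirming tomorrow's 9am."},
        {"role": "recipient", "text": "Yes, I'll be there."},
    ],
}
```

The point is the shape, not the fields: a transcript the orchestrator can read, plus a structured outcome it can branch on without re-parsing the conversation.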
The eight-line version
Here's the smallest possible "give your agent a phone" tool, written for an LLM with native tool use:
```python
import os
import requests

# Assumes your key is exported as an environment variable.
CALL2ME_API_KEY = os.environ["CALL2ME_API_KEY"]

def place_call(phone_number: str, voice_agent_id: str, context: str = "") -> dict:
    """Place a real outbound phone call. Returns transcript + outcome on completion."""
    return requests.post(
        "https://api.call2me.app/v1/calls",
        headers={"Authorization": f"Bearer {CALL2ME_API_KEY}"},
        json={"to_number": phone_number, "agent_id": voice_agent_id, "context": context},
        timeout=600,  # calls run for minutes; don't let a short default timeout kill them
    ).json()
```
That's it. Wire that into Claude/GPT/Gemini tool use, or wrap it as a Pipedream action, or register it as a Composio custom tool — the contract is identical because the contract is just an HTTP POST.
The agent now does what every human assistant has always done: when the situation calls for a phone call, it picks up the phone.
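The glue between the model and that function is equally small. Providers differ in exact field names, so the sketch below uses a generic {"name": ..., "arguments": {...}} tool-call shape — adapt it to whatever your framework emits. The place_call here is a canned stand-in so the loop can be demonstrated offline.

```python
def place_call(phone_number, voice_agent_id, context=""):
    # Stand-in for the real HTTP tool above; returns a canned result
    # so the dispatch glue can run without making an actual call.
    return {"status": "completed", "to": phone_number, "context": context}

def handle_tool_call(tool_call, tools):
    # Route a model-emitted tool call to the matching Python function.
    # The tool_call shape is a common convention, not a specific provider's.
    return tools[tool_call["name"]](**tool_call["arguments"])

TOOLS = {"place_call": place_call}  # ...alongside your other tools

result = handle_tool_call(
    {"name": "place_call",
     "arguments": {"phone_number": "+15551234567", "voice_agent_id": "agent_1"}},
    TOOLS,
)
```

Swap the stand-in for the real HTTP version and the same three lines of dispatch give your agent a phone.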
We submitted Call2Me to Composio's tool request board on May 4 — if you'd like a first-class composio.toolkit('call2me') integration instead of the custom tool above, upvote the request. The faster the votes pile up, the faster the official integration ships.
In the meantime, the REST endpoint is live and stable, and Pipedream and any custom-tool framework can hit it today.
What to do with this
If you're building agents and you've never seriously considered adding voice as a tool, this is a good week to try it. Two suggestions:
- Start with one use case. "If the user explicitly asks to be called, call them." That's it. Don't try to design the perfect calling agent before measuring whether your users actually want to be called. (They will. They always do.)
- Use a hosted voice agent for the call itself. Don't try to build the voice pipeline. Use Call2Me, use Vapi, use Retell — pick one that gives you a clean HTTP API and a sub-500ms latency budget. The interesting work is in when to call, not in shaving 50ms off the TTS.
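For the first suggestion, the entire "design" can be a single policy sentence in the orchestrator's system prompt. Wording below is illustrative:

```python
# One-sentence calling policy for the orchestrator's system prompt.
# Deliberately conservative: explicit opt-in only, no proactive dialing.
CALL_POLICY = (
    "If the user explicitly asks to be called, use the place_call tool with "
    "their phone number. Never place a call the user did not ask for."
)
```

Measure how often the tool fires with that policy before loosening it.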
Most of the things "voice AI agents" will become famous for in the next year are not technical breakthroughs. They are the same agents you already have, given a phone. Give yours one this week.
Frequently asked
Q. Do I have to choose between Composio and Pipedream?
No. The phone-call action is just a REST endpoint, so any tool platform that can hit a Bearer-auth API can call it. Composio and Pipedream solve different problems — Composio for hosted toolkits inside your agent runtime, Pipedream for chained workflows around it. Both can call Call2Me.
Q. Will my agent randomly call people?
Only when you ask it to. Phone calls are an explicit tool — the agent has to decide to invoke it, with arguments you describe. In practice the harder problem is making sure your agent does call when it should (e.g., when the user explicitly asks).
Q. What does it cost per call?
Voice agent time is billed per-minute on Call2Me — the platform fee starts at $0.10/min and bundles the LLM, STT and TTS in that price. There's no extra fee for being driven by an external agent vs. the dashboard.
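Back-of-envelope at the quoted $0.10/min platform rate (assuming average call lengths; your calls will vary):

```python
per_minute = 0.10                       # quoted platform rate, LLM/STT/TTS bundled
three_minute_call = 3 * per_minute      # one typical support escalation
campaign_cost = 500 * 2 * per_minute    # the 500-appointment overnight campaign,
                                        # assuming ~2 min per confirmation call
```

So the bulk-confirmation example from earlier in the post runs on the order of $100 a night at those assumptions.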
Q. Can the calling agent be a different model than the answering agent?
Yes — and usually should be. The orchestrator (the one that decides to call) can be GPT-4o or Claude on the agent platform. The voice agent that actually runs the call is a separate Call2Me agent with its own prompt, voice, and language. They communicate through the API.