Prompting the Real World: Why AI Phone Calls Are Harder Than They Look
Building an agent that navigates the chaos of human conversation.
Callomat Product Team Dec 20, 2025
When we tell people Callomat can “call a restaurant and book a table,” the immediate assumption is that we’ve just hooked up a Large Language Model (LLM) to a voice synthesizer. Give it a script, give it a number, and hit go.
But anyone who has ever made a phone call knows it’s never that simple.
In the real world, phone calls are chaotic. Connections crackle. Background noise intervenes. People speak over each other. And most importantly, human conversation is full of ambiguity. A successful AI agent isn’t just a chatbot that reads text aloud; it has to be a highly adaptive negotiation engine.
Here is a high-level look at the product challenges we face in engineering Alex Parker to navigate the complexities of the analog world.
1. The Context Challenge: Grounding the AI in Reality
One of the first hurdles in building a voice agent is “grounding.” An LLM running in a data center has no concept of “here” or “now” unless you explicitly give it one.
If a user in New York asks to book a table for “tomorrow” at a restaurant in Tokyo, the AI needs to understand more than just the calendar date. It needs to understand the business’s reality. Is “tomorrow” already today in Tokyo?
We spend a significant amount of engineering effort on what we call the Pre-Processing Layer. Before a call is ever placed, we have to translate the user’s digital intent into natural, human-readable context. The AI shouldn’t speak in ISO dates (“2025-12-20”); it should speak in relative human terms (“this coming Saturday”). This ensures that when the AI speaks, it sounds like a local human, not a database reader.
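To make the grounding step concrete, here is a minimal Python sketch of the kind of translation described above. It is an illustration under assumptions, not Callomat's actual Pre-Processing Layer: the function names `resolve_tomorrow` and `phrase_for_business` are hypothetical, and the phrasing rules are deliberately simple.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def resolve_tomorrow(now: datetime, user_tz: tzinfo) -> date:
    """Turn the user's "tomorrow" into a concrete calendar date (hypothetical helper)."""
    return now.astimezone(user_tz).date() + timedelta(days=1)


def phrase_for_business(target: date, now: datetime, business_tz: tzinfo) -> str:
    """Describe `target` relative to the calendar day at the business's location."""
    today_there = now.astimezone(business_tz).date()
    delta = (target - today_there).days
    if delta < 0:
        raise ValueError("that date has already passed at the business")
    if delta == 0:
        return "today"
    if delta == 1:
        return "tomorrow"
    weekday = target.strftime("%A")
    if delta < 7:
        return f"this coming {weekday}"
    # Further out, fall back to an explicit but still spoken-style date.
    return f"{weekday}, {target.strftime('%B')} {target.day}"
```

With a user at 8:00 PM on December 19 in New York, `resolve_tomorrow` yields December 20, and `phrase_for_business` returns "today" for a Tokyo restaurant, because it is already the morning of December 20 there. Real code would likely pass `ZoneInfo("Asia/Tokyo")` from the standard `zoneinfo` module rather than a fixed offset, so that daylight saving time is handled correctly.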
2. The Negotiation Dance
The biggest misconception about AI calls is that they are transactional. “I want X.” “Here is X.”
In reality, booking a service is a negotiation. Humans rarely have a single, rigid requirement. We have preferences.
“I’d like 7:00 PM, but I can do 7:30.”
“If you don’t have a table inside, the patio is fine.”
A rigid script fails the moment a host says, “We’re full at 7, but how about 5:30?”
We learned early on that our prompts needed to move beyond simple instruction following and into Intent Understanding. The AI needs to know how flexible the user is. Is this a specific anniversary dinner that must be at 7:00 PM? Or is this a casual search for any table tonight?
Depending on the user’s intent, the AI changes its posture. It can be firm and specific, politely declining alternatives if the user needs a precise slot. Or it can be open and collaborative, actively hunting for the “first yes” if the user just wants to get in the door.
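One way to picture the two postures is as a time window attached to the request. The sketch below is a simplified illustration, not how Callomat represents intent internally: `BookingIntent`, `posture`, and `respond_to_offer` are hypothetical names, and a real agent would carry this information in its prompt context rather than in a hard-coded rule.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class BookingIntent:
    """The user's preferred slot plus the window they would still accept."""
    preferred: time
    earliest: time
    latest: time

    @property
    def posture(self) -> str:
        # A window that collapses to a single slot means the agent stays firm.
        is_fixed = self.earliest == self.latest == self.preferred
        return "firm" if is_fixed else "flexible"


def respond_to_offer(intent: BookingIntent, offered: time) -> str:
    """Decide how to answer a counter-offer such as "how about 5:30?"."""
    if intent.earliest <= offered <= intent.latest:
        return "accept"
    return "decline"


# The anniversary dinner: 7:00 PM or nothing.
anniversary = BookingIntent(time(19, 0), time(19, 0), time(19, 0))
# The casual search: any table between 5:00 PM and 9:00 PM will do.
casual = BookingIntent(time(19, 0), time(17, 0), time(21, 0))
```

When the host says “We’re full at 7, but how about 5:30?”, `respond_to_offer(anniversary, time(17, 30))` declines, while the same offer against `casual` is accepted.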
3. Surviving the “Unhappy Path”
In software, the “happy path” is when everything goes right. In phone calls, the happy path is the exception.
What happens if the restaurant is “Walk-in Only”? What if the person who answers speaks a different language? What if the line is bad?
We treat our prompts like decision trees for chaos. We maintain dozens of Scenario Maps: specific instructions for common off-ramps.
Language Switching: Our agents are designed to listen for language cues. We seed the context with the country’s likely language, but the AI must be ready to switch instantly from English to German or Spanish if the human on the other end does.
Graceful Exits: Sometimes, the answer is “no.” The restaurant refuses to take a booking, or they are closed. The AI must know how to accept defeat gracefully, end the call professionally, and report the specific reason back to the user, rather than getting stuck in a loop of trying to convince a closed business to open its doors.
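As a rough illustration of the idea, a scenario map can be thought of as a lookup from a detected situation to the next action, plus the reason reported back to the user. The Python sketch below is an assumption-laden simplification: the scenario names, the `next_step` function, and the `MAX_REPEATS` limit are all hypothetical, and detecting the scenario in the first place (for example, noticing a language switch) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scenario(Enum):
    WALK_IN_ONLY = "walk-in only"
    CLOSED = "closed"
    LANGUAGE_MISMATCH = "language mismatch"
    BAD_LINE = "bad line"


class Action(Enum):
    END_CALL = "end_call"
    SWITCH_LANGUAGE = "switch_language"
    ASK_TO_REPEAT = "ask_to_repeat"


@dataclass(frozen=True)
class Step:
    action: Action
    report: Optional[str] = None  # reason passed back to the user when the call ends


SCENARIO_MAP = {
    Scenario.WALK_IN_ONLY: Step(Action.END_CALL, "The restaurant is walk-in only."),
    Scenario.CLOSED: Step(Action.END_CALL, "The restaurant is closed."),
    Scenario.LANGUAGE_MISMATCH: Step(Action.SWITCH_LANGUAGE),
    Scenario.BAD_LINE: Step(Action.ASK_TO_REPEAT),
}

MAX_REPEATS = 2  # hypothetical limit so the agent never loops forever


def next_step(scenario: Scenario, repeats_so_far: int = 0) -> Step:
    """Look up the off-ramp for a scenario, exiting gracefully after too many retries."""
    step = SCENARIO_MAP[scenario]
    if step.action is Action.ASK_TO_REPEAT and repeats_so_far >= MAX_REPEATS:
        return Step(Action.END_CALL, "The line was too poor to complete the call.")
    return step
```

The retry cap is the important design choice here: every path either makes progress or ends the call with a specific reason, so the agent cannot get stuck arguing with a closed business or a broken connection.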
Bridging the Gap
Building Callomat isn’t just about better prompts; it’s about building a bridge between digital intent and physical reality. It requires a deep respect for the nuances of human interaction.
We are building a system that doesn’t just “talk,” but understands the goal of the conversation. It respects the constraints of time and space. And most importantly, it acts as a reliable proxy for you, handling the awkwardness and the hold music so you don’t have to.





