Build Your Own AI Voice Agent: A Guide to Migrating from Retell AI to LiveKit

By 10xdev team August 03, 2025

In this article, we'll explore how you can replicate the exact same AI agent you've built on drag-and-drop platforms like Retell AI or Synthflow using the open-source LiveKit framework. This approach grants you full ownership and control over your agent's functionality, allows you to deploy it on your own servers, and frees you from the per-minute costs that many commercial platforms charge.

Why Move Away from Drag-and-Drop Platforms?

Platforms like Retell AI are excellent for building an MVP quickly. They abstract away the complexity and let you get a functional agent up and running in minutes. However, they typically charge on a per-minute basis. For instance, Retell's pricing is around 7 to 12 cents per minute, and this rate can change depending on the LLM you choose. While this is fine for initial testing, the costs can escalate if you plan to handle hundreds or thousands of calls for a serious business application.
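To make the per-minute pricing concrete, here is a quick back-of-envelope estimate. The call volume and average duration below are illustrative assumptions, not figures from any platform:

```python
# Back-of-envelope cost estimate (assumed figures, not vendor quotes)
PER_MINUTE_RATE = 0.10      # mid-range of the 7-12 cents quoted above
calls_per_month = 1_000     # illustrative call volume
avg_call_minutes = 5        # illustrative average call length

platform_cost = PER_MINUTE_RATE * calls_per_month * avg_call_minutes
print(f"Managed platform: ${platform_cost:,.0f}/month")  # → Managed platform: $500/month
```

With self-hosting you still pay for the underlying STT, LLM, and TTS usage plus server time, but you drop the platform's per-minute margin, which is where the savings come from at scale.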

This is where an open-source solution like LiveKit becomes a powerful alternative.

Introducing LiveKit: The Open-Source Alternative

LiveKit provides the foundational infrastructure for building real-time voice and video applications. The interesting part is that platforms like Retell AI are actually built on top of LiveKit. They use LiveKit as their backend, build a user-friendly platform around it, and offer it to you as a service.

By using LiveKit directly, you are essentially going to the source. It's like buying one or two items from a retail shop versus placing a bulk order of thousands of pieces directly from the factory outlet—the latter is always more cost-efficient and gives you more control.

A quick search confirms this relationship, with sources stating that companies have built their conversational AI products on LiveKit. It's also worth noting that there are other open-source alternatives. For example, Vapi uses a framework called Pipecat. Both LiveKit and Pipecat are fully open-source, allowing you to host them on your own servers. This is a significant advantage for data privacy and for complying with regulations like those in the EU.

A Practical Demonstration: Dental Clinic Agent

To illustrate the similarity, let's look at a simple receptionist agent for a dental clinic. This agent is designed to book appointments, take messages, and answer general queries.

First, we'll test the agent running on a Retell-like platform.

Interaction 1: The Retell AI Agent

Here is a sample conversation with the agent hosted on the commercial platform.

Agent: Hello, welcome to New York Dental Clinic. How can I help you today?

User: Uh, hey, what is your name?

Agent: My name is Maria and I'm here to assist you with your dental care needs at New York Dental Clinic. How can I help you today? May I have your name, please?

User: Uh, can you tell me what you can do?

Agent: Of course. I can help you with the following. Book, cancel, or reschedule dental appointments. Answer general questions about it.

Replicating the Agent with LiveKit

Now, let's run the exact same agent using a local LiveKit server. To get it running, you would typically use a command in your terminal.

python dental_agent.py start
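Before that command will do anything, the environment needs the SDK and provider credentials. Here is a minimal setup sketch, assuming LiveKit's Python SDK and its documented environment variables (all values shown are placeholders):

```shell
# Install the LiveKit agents SDK with the plugins used in this article
pip install "livekit-agents[deepgram,openai,elevenlabs,silero]"

# Credentials the agent reads at startup (placeholder values)
cat > .env <<'EOF'
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
DEEPGRAM_API_KEY=your-deepgram-key
OPENAI_API_KEY=your-openai-key
ELEVEN_API_KEY=your-elevenlabs-key
EOF
```

During development you can also run the agent with the `dev` subcommand instead of `start`, which enables hot reloading.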

The agent is connected to a Twilio phone number. Here is the interaction with the self-hosted version.

Interaction 2: The LiveKit Agent

Agent: Hello, welcome to New York Dental Clinic. How can I help you today?

User: Uh, can you tell me what is your name?

Agent: My name is Maria. Could you please tell me your name?

User: Uh, yes. So, and also tell me what you can do.

Agent: Sure. I can help you book, cancel or reschedule an appointment.

As you can see, the behavior is nearly identical. The slight variations in response are due to using a different LLM—in this case, an OpenAI model—but the core capability is the same. This demonstrates that you can achieve the same results with an open-source setup.

A common question is whether you can replicate all the features of Retell on LiveKit. The answer is absolutely yes. Since Retell is built on LiveKit, any feature available in Retell is fundamentally achievable in a direct LiveKit implementation.

During the call, LiveKit provides a detailed log of the interaction, including transcriptions and events, which you can send to a webhook for post-call processing, just like you would with Retell.

Example Log:

Incoming call from +1-XXX-XXX-XXXX
...
Transcription item from user: "can you tell me what is your name?"
...
Transcription item from agent: "My name is Maria."
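Once those events reach your webhook, post-call processing is ordinary data handling. The sketch below assumes a simplified event shape for illustration; LiveKit's actual payload schema differs, so treat the field names as hypothetical:

```python
# Hypothetical post-call processing: flatten transcription events into a
# readable call log. The event shape here is an illustrative assumption,
# not LiveKit's real webhook schema.
def extract_transcript(events: list[dict]) -> str:
    lines = []
    for event in events:
        if event.get("type") == "transcription":
            lines.append(f'{event["role"]}: {event["text"]}')
    return "\n".join(lines)

events = [
    {"type": "call_started", "from": "+1-XXX-XXX-XXXX"},
    {"type": "transcription", "role": "user", "text": "can you tell me what is your name?"},
    {"type": "transcription", "role": "agent", "text": "My name is Maria."},
]
print(extract_transcript(events))
```

From here you could store the transcript, generate a summary with an LLM, or push the record into a CRM, exactly the kind of post-call workflow Retell offers as a built-in feature.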

Anatomy of a LiveKit Agent: Code Breakdown

Let's look at the structure of the Python code for the agent. It's surprisingly straightforward. For deployment, you can use services like AWS or Fly.io, which is a reliable and fast option for hosting LiveKit agents.

# 1. Import the framework and plugins
from livekit import agents
from livekit.agents import Agent, AgentSession, RunContext, WorkerOptions, cli, function_tool
from livekit.plugins import deepgram, elevenlabs, openai, silero

# Define the agent's personality and instructions
PROMPT = """
Your name is Maria. You are a helpful receptionist for the New York Dental Clinic.
You can book, cancel, or reschedule appointments. You can also answer general questions.
Be polite and professional.
"""

# 2. Define the agent and expose tools, like checking appointment availability
class DentalAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions=PROMPT)

    @function_tool
    async def check_availability(self, context: RunContext, date: str, time: str) -> str:
        """Check whether an appointment slot is free."""
        # Replace this stub with a call to your real booking system
        print(f"Checking availability for {date} at {time}")
        return "The selected time slot is available."

# 3. The entrypoint runs for each call: it wires Speech-to-Text, the LLM,
# and Text-to-Speech together and processes the conversation in real time
async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),  # model choice is illustrative
        tts=elevenlabs.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(room=ctx.room, agent=DentalAgent())
    await session.generate_reply(
        instructions="Greet the caller and welcome them to New York Dental Clinic."
    )

    # 4. Handle the end of the session (e.g., trigger post-call processing)
    async def on_shutdown():
        print("Call ended; send the transcript and summary to your webhook here")

    ctx.add_shutdown_callback(on_shutdown)

# 5. Run the agent application
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

This example outlines the core components: importing modules, defining the agent's logic and tools, processing the conversation in real-time, and handling post-call actions. By building directly on LiveKit, you gain full control and can create highly customized, cost-effective voice agents for any application.
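Since the article mentions Fly.io as a hosting option, here is a rough deployment sketch using the Fly.io CLI. The app name and all secret values are placeholders:

```shell
# Illustrative Fly.io deployment (app name and secrets are placeholders)
fly launch --name dental-agent --no-deploy   # generate fly.toml without deploying
fly secrets set LIVEKIT_URL=wss://your-project.livekit.cloud \
    LIVEKIT_API_KEY=your-api-key LIVEKIT_API_SECRET=your-api-secret \
    DEEPGRAM_API_KEY=your-deepgram-key OPENAI_API_KEY=your-openai-key \
    ELEVEN_API_KEY=your-elevenlabs-key
fly deploy
```

Setting credentials as Fly secrets rather than baking them into the image keeps keys out of version control and lets you rotate them without redeploying code.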

Join the 10xdev Community

Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.
