Build Your Own AI Assistant In Java, Explained In 10 Minutes

By 10xdev team, August 09, 2025

Welcome to this practical, hands-on guide to building your own AI assistant in Java. We'll start with a little theory to ensure we're all on the same page, so no prior experience with AI-powered applications is necessary. If you're already familiar with the concepts, feel free to skip to the coding sections. However, reviewing the initial theory is recommended to ensure a shared understanding of the terminology.

The Power and Problem of Modern AI

A personal experience highlighted AI's tremendous potential to do good. After receiving a pet's CT scan report filled with dense medical jargon in a hard-to-read format (a photo of a printed document), turning to a tool like ChatGPT provided a breakthrough: instructed to act as a knowledgeable radiologist capable of clear communication, the AI quickly summarized the report into an easy-to-digest format with headings and bullet points, alleviating hours of worry.

However, anyone who has used tools like ChatGPT has likely encountered situations where it struggles with seemingly simple tasks. This reveals a core challenge when building AI-powered applications for specific business or project needs: Large Language Models (LLMs) are generic, but real-world problems are not. To make an AI truly useful, we must provide it with our specific context and the ability to interact with our systems.

In many business applications, there are numerous inefficient workflows. AI can streamline these processes, significantly improving the daily work lives of users. This raises the question: what tools are available for Java developers to build these applications and leverage existing models without delving deep into the models themselves?

Our Project: An AI Airline Assistant

The application we will build is a simulated customer support agent for an airline. The interface will feature a chat window for interacting with the AI and a live view of a database containing flight reservations.

The goal is to enable the AI assistant to:

  • Answer questions about company policies, such as cancellation rules.
  • Fetch booking details for customers.
  • Modify and cancel bookings when permitted by the terms of service.

This project distills real-world system interaction into a simple, understandable application perfect for learning.

Understanding AI Application Architecture

A helpful way to visualize an AI-powered application is through a computer architecture diagram.

  • The LLM (The CPU): The LLM, or the AI, is like the Central Processing Unit. It's a powerful but generic tool that is of limited use in our specific context unless we give it help.
  • The Context Window (The RAM): At a minimum, we need a context window, which acts as the working memory (like RAM). This is where we hold the ongoing conversation and any information the AI needs to process.
  • Vector Store (The Hard Drive): Soon, we'll need to store information more permanently. A vector store allows us to save and retrieve the most relevant information needed to answer a specific question.
  • Tools (The Programs): We also want the AI to run tools or programs, giving it the ability to perform actions on our behalf.
  • Model Chaining (The Peripherals): Sometimes, we might want to connect other models to our primary LLM. For instance, when ChatGPT generates an image, one LLM is creating a prompt for a separate diffusion model that generates the image. Different models are designed for different tasks.

This article is a hands-on, engineering-focused guide. We won't delve into the complex math behind AI models. Instead, we will use the APIs available to Java developers to build on the work of others. You'll see that thanks to the available tools, creating powerful AI applications is surprisingly straightforward without needing to understand all the internal workings of an AI.

The Tools for the Job

We will work with two main tools in addition to Spring Boot, which forms the base of our application:

  1. LangChain4j: Initially a Java port of the popular Python LangChain project, LangChain4j has evolved into a popular and rapidly developing library in the Java ecosystem for building AI applications.
  2. Hilla: An open-source, full-stack framework from Vaadin that allows us to build a front-end UI for our application, enabling live interaction as we build.

The Spectrum of Agent Autonomy

We can structure our development along an axis of agent autonomy:

  1. Chatbot: The lowest level of autonomy. This is like the original ChatGPT, answering questions based only on its training data. It has no access to external or personal information.
  2. Retrieval Augmented Generation (RAG): A significant step up. Here, we provide the AI with relevant context to answer a question, making it far more effective for specific domains.
  3. Co-pilot: This is the stage we will reach. We give the AI access to contextual information and the ability to use tools—in our case, fetching, changing, and canceling airline bookings.
  4. Fully Autonomous Agent: The highest level, where the AI handles an entire task from start to finish without human intervention. While such systems will emerge, for business applications, a co-pilot model where the human remains in control is often more comfortable and secure.

Stage 1: Building a Basic Chatbot

In our architecture analogy, a chatbot consists of the LLM (CPU) and a context window (working memory). The context window must hold everything relevant to the current interaction:

  • The System Prompt: Instructions on how the AI should behave (e.g., "Act as a friendly airline customer support agent").
  • Chat History: The entire back-and-forth conversation, as the LLM is stateless and needs this history to understand the context.
  • The Current Prompt: The user's latest message.
  • Relevant Documents (for RAG): Any external information needed to answer the question.
  • Room for the Answer: The context window must also have space for the AI's response.

Context windows have grown from a few thousand tokens to over a million, but since we are often charged by token usage, it's more efficient to provide only the most relevant information rather than flooding the prompt.
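
To make the "working memory" idea concrete, here is a minimal, framework-free sketch of what gets packed into a single request. The class and field names are illustrative, not part of LangChain4j.

import java.util.List;

// Illustrative only: a framework-free view of what one LLM request carries.
record ContextWindow(
        String systemPrompt,          // how the AI should behave
        List<String> chatHistory,     // the full back-and-forth so far (the LLM itself is stateless)
        List<String> retrievedDocs,   // relevant segments pulled in by RAG (empty for a plain chatbot)
        String currentUserMessage) {  // the user's latest message

    // Everything assembled here must fit within the model's token limit,
    // leaving enough room for the answer the model will generate.
    String assemble() {
        StringBuilder prompt = new StringBuilder(systemPrompt).append("\n\n");
        chatHistory.forEach(turn -> prompt.append(turn).append("\n"));
        retrievedDocs.forEach(doc -> prompt.append("Context: ").append(doc).append("\n"));
        return prompt.append("User: ").append(currentUserMessage).toString();
    }
}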

Stage 2: Retrieval Augmented Generation (RAG)

This stage adds a persistent storage component to our architecture, allowing us to pull in relevant information. An LLM knows two things: its training data and the information we provide in the context window.

To teach an AI new things, we have several options:

  • Train a new model: Prohibitively expensive and time-consuming.
  • Fine-tune an existing model: A good option for teaching an AI about static business context.
  • Retrieval Augmented Generation (RAG): The cheapest and easiest method. We add relevant information directly into the context window when asking a question. This is like giving the LLM an open-book exam.

How RAG Works:

  1. Document Segmentation: We take our source documents (e.g., a Terms of Service document) and split them into smaller, meaningful segments.
  2. Embedding: We run each text segment through an embedding model. This model converts the semantic meaning of the text into a numerical vector. Think of it like a color picker, where similar colors have similar RGB vector values. Similarly, texts with similar meanings will have similar vector representations, though these vectors have thousands of dimensions instead of just three.
  3. Vector Store: We store these vectors in a vector store, a database optimized for vector math.
  4. Retrieval: When a user asks a question (e.g., "What's the cancellation policy?"), we convert the question into a vector using the same embedding model.
  5. Search: We then search the vector store for the document vectors that are most similar to the question's vector.
  6. Augmentation: We take the most relevant document segments found and add them to the prompt, instructing the LLM to use this information to answer the user's question.
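
To make steps 4 and 5 concrete, here is a deliberately naive sketch of the similarity search a vector store performs. In practice the embedding store handles this (and far more efficiently); the brute-force cosine similarity below is only for illustration.

import java.util.Comparator;
import java.util.List;
import java.util.Map;

class NaiveVectorSearch {

    // Cosine similarity: close to 1.0 means the two texts have very similar meaning.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the stored segments whose embeddings are closest to the question's embedding.
    static List<String> mostRelevant(float[] questionVector, Map<String, float[]> store, int maxResults) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> -cosine(questionVector, e.getValue())))
                .limit(maxResults)
                .map(Map.Entry::getKey)
                .toList();
    }
}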

Stage 3: The Co-pilot with Tools

The final step is to give our AI agent the ability to run tools. When we make a request to the LLM, we can include metadata describing functions it can call.

For example, if a user says, "My name is John Doe, booking number 123, can you pull up my booking?", we also send the LLM a description of a getBookingDetails(bookingNumber, firstName, lastName) function. The LLM, recognizing it doesn't have the booking information, will extract the parameters from the user's message and ask our application to execute the function. Our application runs the function and returns the result to the LLM, which then has everything it needs to answer the user's request.
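
As we'll see in Step 4, LangChain4j generates this metadata for us from annotated Java methods, but it helps to know roughly what the model receives. The sketch below shows the general shape of an OpenAI-style tool definition, held in a Java text block; the exact field names the library emits may differ by version, and the parameter names simply mirror our example function.

class ToolMetadataExample {

    // Roughly the shape of the function description sent alongside the chat request (illustrative only).
    static final String GET_BOOKING_DETAILS_TOOL = """
        {
          "type": "function",
          "function": {
            "name": "getBookingDetails",
            "description": "Get booking details by booking number, first name, and last name",
            "parameters": {
              "type": "object",
              "properties": {
                "bookingNumber": { "type": "string" },
                "firstName":     { "type": "string" },
                "lastName":      { "type": "string" }
              },
              "required": ["bookingNumber", "firstName", "lastName"]
            }
          }
        }
        """;
}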

Building the UI with Hilla

Hilla is a full-stack React framework for Spring Boot. It provides a large library of customizable UI components, from buttons to data grids.

  • File-System Routing: Hilla uses file-system-based routing similar to Next.js. An index.tsx file maps to the root URL, about.tsx maps to /about, and so on.
  • Backend Integration: What makes Hilla special is its seamless backend integration. We create Spring beans and annotate them with @BrowserCallable. This makes them directly callable from the React front-end code with full type safety.

Example of a Hilla backend service:

@BrowserCallable
@AnonymousAllowed
public class ContactService {
    public List<Contact> findAll() {
        // ... implementation
    }
}

Example of calling it from React:

const contacts = await ContactService.findAll();
// 'contacts' is a fully typed array of Contact objects

If the Java backend changes, we get a compile-time error in our React code, preventing runtime failures.

Let's Start Coding

Now, let's move to the implementation. We'll start with a basic Spring Boot project with LangChain4j and Hilla dependencies.

Initial Setup: Our application.properties file contains the OpenAI API key, the desired model (gpt-4o), and the embedding model. We also set the temperature to 0 for less random responses.
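
For reference, the configuration looks roughly like the snippet below. It assumes the LangChain4j OpenAI Spring Boot starter and its langchain4j.open-ai.* property prefix; the embedding model name is illustrative, so check the starter's documentation for the exact keys and values in your version.

# application.properties (sketch; property names depend on the LangChain4j starter version)
langchain4j.open-ai.chat-model.api-key=${OPENAI_API_KEY}
langchain4j.open-ai.chat-model.model-name=gpt-4o
langchain4j.open-ai.chat-model.temperature=0.0
langchain4j.open-ai.embedding-model.api-key=${OPENAI_API_KEY}
langchain4j.open-ai.embedding-model.model-name=text-embedding-3-small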

The project includes:

  • A FlightService for CRUD operations on bookings.
  • An AssistantService which is @BrowserCallable and will contain our chat logic.
  • A React view (index.tsx) with a chat interface and a grid to display booking data.

Step 1: Creating the Basic Assistant

First, we define an interface for our AI assistant. LangChain4j will provide the implementation, similar to how Spring Data creates JPA repositories.

public interface MyAssistant {
    @SystemMessage("""
        You are a customer chat support agent for 'FunAir'.
        You are friendly, helpful and joyous.
        You are interacting with customers through an online chat system.
        Before providing information about a booking, you must have the booking number, customer's first name, and last name.
        Before changing a booking, you must ensure it is permitted by the terms. If there is a charge, you must get consent before proceeding.
        Today's date is {{current_date}}.
    """)
    TokenStream chat(@MemoryId String chatId, @UserMessage String userMessage);
}

We inject this MyAssistant into our AssistantService. Initially, the assistant is generic. By adding the @SystemMessage annotation, we give it a personality and context-specific instructions.
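
The article doesn't show the AssistantService itself, but a minimal sketch could look like the following. It assumes we adapt LangChain4j's TokenStream to a Reactor Flux so Hilla can stream tokens to the browser; the TokenStream callback names (onNext, onComplete, onError) and the Hilla package names vary between versions, so treat this as a sketch rather than copy-paste code.

import com.vaadin.flow.server.auth.AnonymousAllowed;
import com.vaadin.hilla.BrowserCallable; // dev.hilla.BrowserCallable in older Hilla versions
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

@BrowserCallable
@AnonymousAllowed
public class AssistantService {

    private final MyAssistant assistant;

    AssistantService(MyAssistant assistant) {
        this.assistant = assistant;
    }

    // Streams the assistant's reply token by token to the React front end.
    public Flux<String> chat(String chatId, String userMessage) {
        Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();
        assistant.chat(chatId, userMessage)
                .onNext(sink::tryEmitNext)                  // each generated token
                .onComplete(response -> sink.tryEmitComplete())
                .onError(sink::tryEmitError)
                .start();
        return sink.asFlux();
    }
}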

Step 2: Adding Memory

Without memory, the assistant forgets the conversation history with each new message. To fix this, we configure a ChatMemoryProvider bean in a LangChain4jConfig class.

@Configuration
class LangChain4jConfig {
    @Bean
    ChatMemoryProvider chatMemoryProvider(Tokenizer tokenizer) {
        return chatId -> TokenWindowChatMemory.withMaxTokens(1000, tokenizer);
    }
}

This bean tells LangChain4j to maintain a history of the last 1000 tokens for each chat session, allowing for continuous conversations.

Step 3: Implementing RAG

To make the assistant aware of our business rules, we implement RAG.

  1. Define an Embedding Store: We create a bean for an EmbeddingStore. For this demo, an InMemoryEmbeddingStore is sufficient.

@Bean
EmbeddingStore<TextSegment> embeddingStore() {
    return new InMemoryEmbeddingStore<>();
}

  2. Ingest Documents: We create an ApplicationRunner bean that runs on startup. This runner loads our terms-of-service.txt document, splits it into segments, creates embeddings for each segment using our configured embedding model, and stores them in the EmbeddingStore. (A sketch of this runner appears after this list.)

  3. Create a Content Retriever: We define a ContentRetriever bean. This retriever will take a user's query, create an embedding for it, and find the most relevant text segments from the EmbeddingStore.

@Bean
ContentRetriever contentRetriever(EmbeddingStore<TextSegment> embeddingStore, EmbeddingModel embeddingModel) {
    return EmbeddingStoreContentRetriever.builder()
            .embeddingStore(embeddingStore)
            .embeddingModel(embeddingModel)
            .maxResults(2)
            .minScore(0.6)
            .build();
}

With this in place, our assistant can now accurately answer questions about the cancellation policy based on the provided document.
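
Step 2 above describes the ingestion runner without showing it. Here is a minimal sketch, assuming it lives in the LangChain4jConfig class and uses LangChain4j's EmbeddingStoreIngestor; the splitter settings (300-character segments with 30 characters of overlap) and the ClassPathResource loading are illustrative assumptions, not taken from the original project.

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.ClassPathResource;
import java.nio.charset.StandardCharsets;

@Bean
ApplicationRunner ingestTermsOfService(EmbeddingStore<TextSegment> embeddingStore, EmbeddingModel embeddingModel) {
    return args -> {
        // Load the terms-of-service document from the classpath.
        String text = new String(
                new ClassPathResource("terms-of-service.txt").getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);

        // Split the document into segments, embed them, and store the embeddings.
        EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 30))
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build()
                .ingest(Document.from(text));
    };
}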

Step 4: Adding Tools

Finally, we give the assistant tools to interact with our system. We create a BookingTools Spring component with methods for getting, changing, and canceling bookings.

@Component
class BookingTools {

    private final FlightService flightService;

    // constructor injection

    @Tool("Get booking details by booking number, first name, and last name")
    public BookingDetails getBookingDetails(String bookingNumber, String firstName, String lastName) {
        return flightService.findBooking(bookingNumber, firstName, lastName).orElse(null);
    }

    @Tool("Change the departure or arrival airport for a booking")
    public BookingDetails changeBooking(String bookingNumber, String firstName, String lastName, String departure, String arrival) {
        // ... implementation
    }

    @Tool("Cancel a booking")
    public void cancelBooking(String bookingNumber, String firstName, String lastName) {
        // ... implementation
    }
}

The @Tool annotation makes these methods available to the LLM. LangChain4j automatically handles the process of the LLM deciding to call a tool, extracting parameters, and executing the corresponding Java method.
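
The article doesn't show how the assistant, the chat memory, the content retriever, and these tools are wired together. If the LangChain4j Spring Boot starter isn't doing that automatically in your setup, one common way is an AiServices builder bean like the sketch below (class and method names such as StreamingChatLanguageModel reflect pre-1.0 LangChain4j releases and have since been renamed in some versions).

import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.service.AiServices;
import org.springframework.context.annotation.Bean;

@Bean
MyAssistant myAssistant(StreamingChatLanguageModel model,
                        ChatMemoryProvider chatMemoryProvider,
                        ContentRetriever contentRetriever,
                        BookingTools bookingTools) {
    return AiServices.builder(MyAssistant.class)
            .streamingChatLanguageModel(model)       // the LLM, streamed so TokenStream works
            .chatMemoryProvider(chatMemoryProvider)  // per-chat conversation memory
            .contentRetriever(contentRetriever)      // RAG over the terms of service
            .tools(bookingTools)                     // the @Tool methods above
            .build();
}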

With these tools, the assistant can now handle complex requests like, "Hi, my name is Robert Taylor, my booking number is 105. Can you please change my flight to Helsinki instead of Frankfurt on the same day?" It can look up the booking, confirm the change, state any applicable fees, and upon confirmation, execute the change in the database.

Conclusion

In a remarkably short amount of time, we have built a meaningful AI application capable of interacting with our system's data, performing operations, and using custom context to ground its responses in our specific reality. This demonstrates the power of modern frameworks like LangChain4j and Hilla in making advanced AI development accessible to all Java developers.

