Google's Gemini CLI: The Open-Source AI Agent Explained in 10 Minutes |

In the world of development, a new open-source AI agent has arrived: Gemini CLI. This article will explore what it is, test it on a local system, and see what it’s like to use the power of Gemini APIs directly from the command line.

While there are other solutions like the closed-source Cloud Code from Anthropic and the open-source Codec CLI from OpenAI, Gemini CLI has seen remarkable traction. Released just a few days ago, it has already garnered about 40k stars on GitHub, surpassing the 30k stars of its competitor which has been available for several months. Let's dive into what the buzz is all about, install it locally, and put it to the test.

The Developer's Home: The Command Line

For developers, the command line interface isn't just a tool; it's home. The terminal's efficiency, ubiquity, and portability make it the go-to utility for getting work done. As reliance on the terminal endures, so does the demand for integrated AI assistance.

This is why Google has introduced Gemini CLI, an open-source AI agent that brings the power of Gemini directly into your terminal. It provides lightweight, direct access from your prompts to the models. While it excels at coding, Gemini CLI was built to do much more. It's a versatile local utility for a wide range of tasks, from content generation and problem-solving to deep research and task management.

Google has also integrated Gemini CLI with its AI coding assistant, Gemini Code Assist. This means developers on standard, free, or enterprise Code Assist plans get a prompt-driven approach to AI coding in both VS Code and the Gemini CLI.

Usage Limits and Key Features

The usage limits for individuals are quite generous: - 1,000 model requests per day - 60 model requests per minute

This free license gives you access to Gemini 1.5 Pro and its massive 1 million token context window, all at no charge.

Key Capabilities: - Ground prompts with Google Search: Fetch and process information from web pages. - Extend capabilities with Model Context Protocols (MCPs): Integrate with external services like GitHub. - Customize prompts and instructions: Tailor Gemini for your specific needs and workflows. - Automate tasks: Integrate with your existing workflows by invoking Gemini CLI non-interactively within scripts. - Open Source: Licensed under Apache 2.0, allowing you to contribute and clone the code.

Installation and Setup in 5 Steps

Let's get started with the installation.

Step 1: Install Node.js

First, you need Node.js installed on your system. It is recommended to use an LTS version. You can download the appropriate installer for your operating system and follow the installation steps. To verify the installation, open a command prompt and type:

node --version

You should see a version number, for example, 22.16.0.

Step 2: Install Gemini CLI

Next, execute the following command in your terminal to install the CLI globally:

npm install -g @google/gemini-cli

Step 3: Start Gemini CLI

Once the installation is complete, you can start the agent by simply typing:

gemini

Step 4: Choose a Theme

The first time you run it, you'll be prompted to choose a color scheme. You can also invoke this manually later with the /themes command. A popular choice is the GitHub Dark mode.

Step 5: Authenticate Your Account

You will also need to authenticate. You can trigger this with the /auth command, which presents three options: 1. Login with Google (Recommended) 2. Using a Gemini API key from AI Studio 3. Using Vertex AI

If you exhaust the 1,000 daily requests with the default Google login, you can provide your own Gemini API key to continue. For now, select "Login with Google" and follow the sign-in process in your browser. Once authorized, you can return to your terminal.

Exploring the Interface and Commands

Gemini CLI offers a variety of commands to manage your interaction. You can type /help to see a list of available options.

Here are some of the most useful commands: - /about: Displays the CLI version and authentication details. - /docs: Opens the full official documentation in your browser. - /stats: Shows session statistics, including input/output tokens and remaining context window. - /editor: Choose your preferred editor (e.g., VS Code, Cursor, Vim). - /mcp list: Lists all configured Model Context Protocol servers. - /quit: Exits the Gemini CLI, showing a summary of token usage.

You can also interact with your local file system by referencing files with @ (e.g., @main.py) or run shell commands directly using ! (e.g., !npm run dev).

Architecture Overview

The Gemini CLI consists of a client-side application (packages/cli) that communicates with a local server (packages/core).

Here’s a typical interaction flow: 1. User Input: The user types a prompt into the terminal (managed by packages/cli). 2. Request to Core: The CLI sends the request to the local server (packages/core). 3. Prompt Construction: The core package constructs a prompt for the Gemini API, including conversation history and available tool definitions. 4. API Response: The Gemini API processes the prompt and returns a response, which might be a direct answer or a request to use a tool. 5. Tool Execution: If a tool is requested (e.g., to modify the file system), the user is prompted for approval. Once confirmed, the core package executes the action and sends the result back to the Gemini API. 6. Final Response: The API processes the tool's result and generates a final response, which is sent back to the CLI and displayed to the user.

Practical Examples

Generating a Code Function

Let's ask Gemini to write a simple function.

Prompt:

write me a complete function to calculate the cube root of a number

The CLI will read any referenced files and then propose the code change. You can allow it once or always.

# main.py
import math

def cube_root(number):
  """Calculates the cube root of a number."""
  return number**(1/3)

Follow-up Prompt:

try to run this for the number 7896

Gemini will outline its strategy, which involves running a shell command. After you approve it, it executes python main.py.

Output:

The cube root of 7896 is approximately 19.91.

Analyzing an Existing Codebase

You can also use Gemini CLI to understand an existing project. After navigating to a project directory and starting gemini, you can ask:

read the entire codebase and give me the summary

The agent will read all relevant files and provide a detailed breakdown of the application's architecture, dependencies, and purpose.

Extending Gemini with MCPs: The GitHub Example

One of the most powerful features is the Model Context Protocol (MCP), which allows Gemini to connect to external services. Let's set up a GitHub MCP server.

Step 1: Create a GitHub Access Token

Go to your GitHub Settings > Developer settings > Personal access tokens > Fine-grained tokens.
Generate a new token. Give it a name and grant it access to public repositories.

Step 2: Install and Run Docker

The MCP server will run in a Docker container, so you need Docker Desktop installed and running on your system.

Step 3: Configure the MCP Server

You need to add the server configuration to the settings.json file. To find this file, you can determine your user home directory (e.g., by running whoami on Linux/macOS or checking C:\Users\<YourUsername> on Windows). The file is located at <UserHome>/.gemini/settings.json.

Open settings.json and add the following configuration, replacing <YOUR_GITHUB_TOKEN> with the token you generated.

"mcpServers": [
  {
    "name": "GitHub",
    "command": "docker",
    "args": [
      "run",
      "--rm",
      "-i",
      "-e",
      "GITHUB_TOKEN",
      "ghcr.io/google/gemini-cli/mcp-github:latest"
    ],
    "environment": {
      "GITHUB_TOKEN": "<YOUR_GITHUB_TOKEN>"
    }
  }
]

Step 4: Restart and Verify

Restart Gemini CLI. Now, you can press Ctrl+T to view the status of your MCP servers. You should see the GitHub MCP server running and ready with over 50+ available tools.

Now you can use this MCP to interact with GitHub.

Prompt:

find the most updated repository on my GitHub account

The agent will ask for your username, construct a query for the GitHub MCP server, and ask for permission to execute it. Once you allow it, it will fetch and display your most recently updated repository.

This new Gemini CLI is an incredibly powerful and versatile tool. With its seamless integration, extensive features, and the ability to extend its functionality, it is set to become an indispensable part of the modern developer's toolkit.