MCP integration#

Integrating code-index with the Model Context Protocol (MCP) enables LLMs and agents to efficiently query and retrieve code definitions, references, and other relevant information from indexed projects.

MCP Protocol#

The Model Context Protocol (MCP) is an open standard that enables AI applications to securely connect to external data sources, tools, and services. It follows a client-server architecture where an AI application (the MCP host) establishes connections to one or more MCP servers through dedicated MCP clients.

Key concepts of MCP include:

  • MCP Host: The AI application that coordinates multiple MCP clients (e.g., Claude Desktop, Visual Studio Code)

  • MCP Client: A component that maintains a connection to an MCP server and retrieves context for the host

  • MCP Server: A program that provides context, tools, and resources to MCP clients

MCP operates on two layers:

  • Data Layer: Implements JSON-RPC 2.0 protocol for message exchange, lifecycle management, and core primitives like tools, resources, and prompts

  • Transport Layer: Manages communication channels and authentication, supporting both stdio (local processes) and HTTP (remote servers) transports
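For instance, a data-layer exchange for invoking a tool is a pair of JSON-RPC 2.0 messages; the tool name and arguments below are hypothetical, shown only to illustrate the message shape:

```python
import json

# Hypothetical JSON-RPC 2.0 request an MCP client might send to invoke a tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_symbol",          # hypothetical tool name
        "arguments": {"name": "main"},   # hypothetical arguments
    },
}

# Hypothetical response the server would return for the same request id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "1 definition found"}]},
}

# On the wire, each message is serialized as JSON text.
wire_request = json.dumps(request)
print(wire_request)
```

Note that the response carries the same `id` as the request, which is how the client matches replies to outstanding calls.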

This architecture allows AI applications to extend their capabilities beyond static training data by accessing real-time information, executing functions, and interacting with various systems in a standardized way.

For detailed information about the MCP architecture, see the official documentation at https://modelcontextprotocol.io/docs/learn/architecture

MCP Server Process Model#

MCP servers operate as independent processes, separate from the AI applications that use them. This separation provides several benefits including process isolation, language independence, and deployment flexibility.

Process Architecture#

  • Separate Process Execution: MCP servers run as standalone processes, either locally on the same machine as the AI application or remotely on different servers

  • No Direct Integration: There is no direct code-level integration between MCP servers and AI applications/agents

  • Process Isolation: Each MCP server runs in its own memory space, providing fault tolerance and security boundaries

Communication Protocols#

MCP servers communicate with AI applications through text-based protocols over standard communication channels:

  • STDIO Transport: For local servers, communication occurs through standard input/output streams (stdin/stdout)

  • HTTP Transport: For remote servers, communication uses HTTP requests and responses with JSON payloads

  • JSON-RPC 2.0: All communication follows the JSON-RPC 2.0 specification for message formatting and protocol semantics

Text-Based Interface#

  • Input: MCP servers receive JSON-RPC requests as text through their configured transport layer

  • Processing: The server processes the request using its internal logic and data sources

  • Output: Results are returned as JSON-RPC responses in text format through the same transport channel

  • Stateless Operation: Each request/response cycle is independent, allowing for scalable and reliable operation
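The input/processing/output cycle above can be sketched as a minimal stdio loop; `handle_request` stands in for a real server's dispatch logic and is purely illustrative:

```python
import json
import sys

def handle_request(req: dict) -> dict:
    """Illustrative dispatcher: echo the method back as a text result."""
    return {
        "jsonrpc": "2.0",
        "id": req.get("id"),
        "result": {"content": [{"type": "text", "text": f"handled {req['method']}"}]},
    }

def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # Read one JSON-RPC message per line from stdin, write the response to stdout.
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        response = handle_request(json.loads(line))
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

Because the interface is just text over stdin/stdout, the same loop could be implemented in any language.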

This design allows MCP servers to be implemented in any programming language, deployed anywhere, and integrated with any MCP-compatible AI application without requiring specific runtime dependencies or direct code coupling.

Code Repository Analysis with MCP#

The code-index MCP server provides LLMs and AI agents with capabilities for analyzing and exploring codebases. By encapsulating code indexing and querying functionality within the MCP protocol, these tools become portable across different AI platforms and can be integrated into various workflows.

Security Analysis Capabilities#

  • Vulnerability Detection: Inspect each function to find potential security vulnerabilities

  • Data Flow Analysis: Trace how data flows through functions and modules to identify potential exploit vectors or information leakage paths

  • Attack Surface Mapping: Identify entry points, external interfaces, and user input handling functions that could be targeted

  • Dependency Analysis: Examine how external libraries and dependencies are used throughout the codebase

Code Exploration and Understanding#

  • Definition Discovery: Locate function, class, and variable definitions across multiple files and projects

  • Reference Tracking: Follow how code elements are used and referenced throughout the codebase

  • Call Graph Analysis: Understand the relationships and dependencies between different parts of the code

  • Pattern Recognition: Search for specific coding patterns, architectural decisions, or implementation approaches

MCP Portability Advantage#

The key benefit of wrapping these capabilities in MCP is portability and reusability. Once implemented as an MCP server, these code analysis tools can be:

  • Injected into any MCP-compatible AI agent (Claude Desktop, custom agents, etc.)

  • Used across different development environments without reimplementation

  • Shared between teams and projects with consistent interfaces

  • Combined with other MCP servers to create comprehensive analysis pipelines

  • Integrated into existing workflows without platform-specific adaptations

This standardized approach means that a security analyst using Claude Desktop, a developer using a custom AI agent, or an automated CI/CD pipeline can all leverage the same code analysis capabilities through their respective MCP-enabled environments.

MCP Server Implementation#

The CodeIndex MCP Server lets LLMs run code indexing on projects and provides an interface for querying indexed symbols, fetching source code, and other code-related operations. These capabilities can be used by any MCP-compatible AI application or agent.

There is a usage example in the examples/code_index_agent.py file, which uses the LangChain framework to build an agent with MCP tools.

FastMCP server implementation for CodeIndex.

This module implements the main MCP server using the FastMCP framework. It provides tools and resources for code repository analysis, including:

  • Code indexing and symbol querying through CodeIndexService

  • Source code fetching with multiple access patterns (full file, line ranges, byte ranges)

  • File path resolution utilities for repository navigation

The server exposes MCP resources for:
  • Full source code retrieval: sourcecode://{file_path}

The server exposes MCP tools for:
  • Repository setup and indexing

  • Symbol querying with flexible search criteria

  • File path resolution within repositories

  • Fetching source code snippets by line or byte ranges

Example

Run the server directly:

python -m code_index.mcp_server.server

Or run via fastmcp CLI:

uv run fastmcp run code_index/mcp_server/server.py:mcp --project .

Note

The server uses stdio transport and is designed to be used with MCP-compatible AI models and tools.
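For example, an MCP host that launches stdio servers (such as Claude Desktop) could be pointed at this server with a configuration entry along these lines; the command and the environment it runs in are assumptions about your installation:

```python
import json

# Hypothetical host configuration (e.g., an "mcpServers" block).
# Assumes the code_index package is importable by the launched Python.
config = {
    "mcpServers": {
        "code-index": {
            "command": "python",
            "args": ["-m", "code_index.mcp_server.server"],
        }
    }
}

print(json.dumps(config, indent=2))
```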

code_index.mcp_server.server.mcp = FastMCP('CodeIndexService')#

FastMCP server instance for CodeIndexService.

This instance can be exported and run via the FastMCP CLI.

code_index.mcp_server.server.resolve_file_path(repo_path: Path, file_path: Path) Path[source]#

Resolve the full path of a file within a repository.

Parameters:
  • repo_path – The path to the repository.

  • file_path – Either an absolute path, or a path relative to the repository.

Returns:

The full path to the file.
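Based on the description above, the resolution logic can be sketched as follows (an assumption about the behavior, not the actual source):

```python
from pathlib import Path

def resolve_file_path(repo_path: Path, file_path: Path) -> Path:
    # Absolute paths are returned unchanged; relative paths are joined
    # onto the repository root.
    if file_path.is_absolute():
        return file_path
    return repo_path / file_path
```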

async code_index.mcp_server.server.fetch_source_code(file_path: str) str[source]#

Fetch the full source code of a file.

Parameters:

file_path – The path to the file to fetch, in the format ‘sourcecode://{file_path}’.

Note

Because relative paths may not resolve correctly, it is recommended to obtain an absolute path with resolve_file_path before calling this function.

Returns:

The content of the file as a string.

async code_index.mcp_server.server.fetch_source_code_by_lineno_range(file_path: Path, start_line: int, end_line: int, ctx: Context) str[source]#

Fetch a snippet of source code from a file by line range.

Parameters:
  • file_path – The path to the file to fetch.

  • start_line – The starting line number (1-based, inclusive).

  • end_line – The ending line number (1-based, inclusive).

  • ctx – FastMCP context

Note

Because relative paths may not resolve correctly, it is recommended to obtain an absolute path with resolve_file_path before calling this function.

Returns:

The content of the specified lines as a string.
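The 1-based, inclusive line semantics translate to Python slicing as follows (an illustrative helper, not the server's actual code):

```python
def slice_by_lineno(text: str, start_line: int, end_line: int) -> str:
    # start_line and end_line are 1-based and both inclusive, so convert
    # to Python's 0-based, end-exclusive slicing.
    lines = text.splitlines(keepends=True)
    return "".join(lines[start_line - 1:end_line])
```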

async code_index.mcp_server.server.fetch_source_code_by_byte_range(file_path: Path, start_byte: int, end_byte: int, ctx: Context) str[source]#

Fetch a snippet of source code from a file by byte range.

Parameters:
  • file_path – The path to the file to fetch.

  • start_byte – The starting byte offset (0-based, inclusive).

  • end_byte – The ending byte offset (0-based, exclusive).

  • ctx – FastMCP context

Note

Because relative paths may not resolve correctly, it is recommended to obtain an absolute path with resolve_file_path before calling this function.

Returns:

The content of the specified byte range as a string.
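The 0-based, inclusive-start, exclusive-end byte semantics match Python slicing directly (an illustrative helper; UTF-8 decoding of the slice is an assumption):

```python
def slice_by_byte_range(data: bytes, start_byte: int, end_byte: int) -> str:
    # start_byte is inclusive and end_byte exclusive, exactly like a
    # Python slice over the raw bytes.
    return data[start_byte:end_byte].decode("utf-8")
```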

code_index.mcp_server.server.setup_repo_index(repo_path: Path, language: Literal['python', 'c', 'cpp'], strategy: Literal['json', 'sqlite', 'auto'] = 'auto') str[source]#

Set up the indexer for a repository.

This initializes the indexer with the specified language processor and then indexes the repository. If cached index data exists, it is loaded into the indexer.

Parameters:
  • repo_path – The path to the repository to index.

  • language – The programming language of the repository (e.g., ‘python’, ‘c’, ‘cpp’).

  • strategy

    The persistence strategy for the index data (‘json’, ‘sqlite’, or ‘auto’). This will determine in which format the index data is stored to or loaded from cache.

    • ’auto’: Select the strategy according to the format of the cached index data. If no cached data exists, default to ‘sqlite’.

    • ’json’: Use JSON format for the index data.

    • ’sqlite’: Use SQLite format for the index data.
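The ‘auto’ selection described above could look like the following sketch; the cache file names are hypothetical, and the ‘sqlite’ fallback follows this docstring:

```python
from pathlib import Path

def pick_strategy(cache_dir: Path) -> str:
    # Hypothetical cache file names; the real layout may differ.
    if (cache_dir / "index.json").exists():
        return "json"
    if (cache_dir / "index.sqlite").exists():
        return "sqlite"
    # No cached data: fall back to the default described above.
    return "sqlite"
```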

async code_index.mcp_server.server.query_symbol(query: QueryByKey | QueryByName | QueryByNameRegex | QueryFullDefinition, ctx: Context) CodeQueryResponse[source]#

Query the index for symbols matching the given query.

A symbol here refers to a function-like entity: anything with a definition or call site, such as a function, class constructor, or method. There are multiple ways to query symbols, for example by exact name or by name regex.

Parameters:
  • query – The query object containing search parameters.

  • ctx – FastMCP context

Returns:

A response object containing the results of the query. There can be multiple results, each containing the location of the symbol, its name, and other relevant information.

code_index.mcp_server.server.get_all_symbols() AllSymbolsResponse[source]#

Get a sorted list of all unique symbols in the index.

Returns:

A response object containing a sorted list of all symbol names.

code_index.mcp_server.server.setup_describe_definitions_todolist() str[source]#

Set up the todo list of definitions to examine.

Before calling this, make sure the repository index has already been set up.

Returns:

A success message.

Return type:

str

code_index.mcp_server.server.get_one_describe_definition_task() SymbolDefinition | None[source]#

Get an arbitrary definition task from the todo list.

Returns:

If any definition task is still pending, return it; it contains the location of the definition and the corresponding symbol. If all tasks are done, return None.

code_index.mcp_server.server.get_full_definition(symbol_definition: SymbolDefinition) Definition | None[source]#

Get the full definition info for a specific symbol definition.

Parameters:

symbol_definition – The symbol definition to retrieve.

Returns:

The full Definition if it exists, otherwise None.

code_index.mcp_server.server.submit_definition_task(symbol_definition: SymbolDefinition, note: LLMNote) str[source]#

Submit a definition task for review.

Parameters:
  • symbol_definition – The symbol definition to submit.

  • note – The LLM note containing the description and potential vulnerabilities.

Returns:

A success message indicating the task has been submitted.
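Taken together, the todo-list tools above form a simple work queue that an agent can drain: pull a task, describe the definition, submit a note, repeat. The sketch below uses in-memory stand-ins for the MCP tools; the task data and note text are purely illustrative:

```python
# Illustrative in-memory stand-ins for the MCP todo-list tools.
pending = [{"symbol": "parse_header"}, {"symbol": "read_chunk"}]
done: list[dict] = []

def get_one_describe_definition_task():
    # Return any pending task, or None when everything is done.
    return pending[0] if pending else None

def submit_definition_task(task, note: str) -> str:
    # Mark the task as done and record the note.
    pending.remove(task)
    done.append({**task, "note": note})
    return "submitted"

# The agent loop: pull a task, describe it, submit, repeat until empty.
while (task := get_one_describe_definition_task()) is not None:
    note = f"description of {task['symbol']}"  # an LLM would produce this
    submit_definition_task(task, note)
```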

code_index.mcp_server.server.describe_tasks_stats() str[source]#

Get statistics about the description tasks.

Returns:

A string summarizing the current state of the description tasks.

code_index.mcp_server.server.get_pending_describe_tasks(n: int) list[SymbolDefinition][source]#

Get a list of pending description tasks from the todolist.

Parameters:

n – Maximum number of pending tasks to return.

Returns:

List of SymbolDefinition objects that are pending description, limited to n items.

code_index.mcp_server.server.main()[source]#

Main entry point for the MCP server.

Starts the FastMCP server using stdio transport for communication with MCP clients.

Services#

Services that act as backend for MCP requests, providing access to indexed code data and source code fetching.

class code_index.mcp_server.services.CodeIndexService[source]#

Bases: object

MCP service backend for code-index.

static get_instance() CodeIndexService[source]#

Get the singleton instance of CodeIndexService.
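The accessor suggests a standard lazy singleton; a minimal sketch (not the actual source) is:

```python
class CodeIndexService:
    _instance = None  # class-level cache for the single shared instance

    @staticmethod
    def get_instance() -> "CodeIndexService":
        # Lazily create the instance on first access, then reuse it.
        if CodeIndexService._instance is None:
            CodeIndexService._instance = CodeIndexService()
        return CodeIndexService._instance
```

This keeps one indexer state shared across all MCP tool invocations in the server process.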

assert_initialized(msg: str | None = None) None[source]#

Assert that the service is initialized with a valid indexer.

Parameters:

msg – Optional message to include in the assertion error.

Raises:

RuntimeError – If the indexer service is not initialized.

property indexer: CodeIndexer#

Get the current indexer instance.

property index: CrossRefIndex#

Get the current index instance.

_clear_indexer() None[source]#

Clear the current indexer instance.

static _get_cache_config(repo_path: Path, strategy: Literal['json', 'sqlite', 'auto']) tuple[Path, PersistStrategy][source]#

Get the cache file path for a repository.

log_calling(func_name: str, *args, **kwargs) None[source]#

Capture the calling of a function for logging purposes.

The log will be saved in a file under the .code_index.cache directory.

Parameters:

func_name – The name of the function being called.


setup_repo_index(repo_path: Path, language: Literal['python', 'c', 'cpp'], strategy: Literal['json', 'sqlite', 'auto'] = 'auto') str[source]#

Set up the indexer for a repository.

This initializes the indexer with the specified language processor and then indexes the repository. If cached index data exists, it is loaded into the indexer.

Parameters:
  • repo_path – The path to the repository to index.

  • language – The programming language of the repository (e.g., ‘python’, ‘c’, ‘cpp’).

  • strategy

    The persistence strategy for the index data (‘json’, ‘sqlite’, or ‘auto’). This will determine in which format the index data is stored to or loaded from cache.

    • ’auto’: Select the strategy according to the format of the cached index data. If no cached data exists, default to ‘json’.

    • ’json’: Use JSON format for the index data.

    • ’sqlite’: Use SQLite format for the index data.

query_symbol(query: QueryByKey | QueryByName | QueryByNameRegex | QueryFullDefinition) CodeQueryResponse[source]#

Query the index for symbols matching the given query.

A symbol here refers to a function-like entity: anything with a definition or call site, such as a function, class constructor, or method. There are multiple ways to query symbols, for example by exact name or by name regex.

Parameters:

query – The query object containing search parameters.

Returns:

A response object containing the results of the query. There can be multiple results, each containing the location of the symbol, its name, and other relevant information.

get_all_symbols() AllSymbolsResponse[source]#

Get a sorted list of all unique symbols in the index.

Returns:

A response object containing a sorted list of all symbol names.

persist() str[source]#

Persist the current index data to the configured cache file.

This method saves the current state of the indexer to the cache file specified in the setup. It uses the persistence strategy defined during the setup.

Returns:

A success message indicating that the index data has been persisted.

class code_index.mcp_server.services.SourceCodeFetchService[source]#

Bases: object

MCP service backend for source code fetch.

Use async methods to fetch source code from repositories.

static get_instance() SourceCodeFetchService[source]#

Get the singleton instance of SourceCodeFetchService.

async _fetch_bytes(file_path: Path) bytes[source]#

Fetch the content of a file as bytes.

This method is cached to improve performance for repeated requests for the same file.

Parameters:

file_path – The path to the file to fetch.

Returns:

The content of the file as bytes.

Raises:

async fetch_bytes(file_path: Path) bytes[source]#

Fetch the full content of a file as bytes.

Parameters:

file_path – The path to the file to fetch.

Returns:

The content of the file as bytes.

Raises:

async fetch_full_source_code(file_path: Path) str[source]#

Fetch the full source code of a file.

Parameters:

file_path – The path to the file to fetch.

Returns:

The content of the file as a string.

Raises:

async fetch_by_lineno_range(file_path: Path, start_line: int, end_line: int, ctx: Context) str[source]#

Fetch a snippet of source code from a file by line range.

Parameters:
  • file_path – The path to the file to fetch.

  • start_line – The starting line number (1-based, inclusive).

  • end_line – The ending line number (1-based, inclusive).

  • ctx – FastMCP context

Returns:

The content of the specified lines as a string.

Raises:

async fetch_by_byte_range(file_path: Path, start_byte: int, end_byte: int, ctx: Context)[source]#

Fetch a snippet of source code from a file by byte range.

Parameters:
  • file_path – The path to the file to fetch.

  • start_byte – The starting byte offset (0-based, inclusive).

  • end_byte – The ending byte offset (0-based, exclusive).

  • ctx – FastMCP context

Returns:

The content of the specified byte range as a string.

Raises: