Core API Reference

This section contains the main components of the code-index library.

Main Classes#

CodeIndexer#

class code_index.indexer.CodeIndexer(processor: LanguageProcessor, index: BaseIndex | None = None, store_relative_paths: bool = True)[source]#

Bases: object

A repository-level source code indexer for analyzing and indexing code symbols.

This class provides functionality to parse source code files using tree-sitter and create an index of function and method definitions and references. It supports multiple programming languages through configurable language processors.

The indexer can process individual files or entire project directories, extracting symbol information and storing it in a configurable index backend for later retrieval and analysis.

__init__(processor: LanguageProcessor, index: BaseIndex | None = None, store_relative_paths: bool = True)[source]#

Initializes the CodeIndexer with the specified configuration.

Parameters:
  • processor – The language processor instance used to parse source code files. This processor defines which programming language(s) will be supported and how the parsing will be performed.

  • index – The index backend for storing symbol information. If None, defaults to SimpleIndex. This allows for different storage strategies (in-memory, database, etc.).

  • store_relative_paths – Whether to store file paths relative to the project root directory. If True (default), paths are stored relative to the project root. If False, absolute paths are used.

Note

The processor’s supported file extensions determine which files will be processed during indexing operations.

Example

Here is a basic example of how to use the CodeIndexer to index a project, find definitions and references of a function, and persist the index to a JSON file.

from pathlib import Path
from code_index import CodeIndexer
from code_index.language_processor import PythonProcessor
from code_index.index.persist import SingleJsonFilePersistStrategy

# Initialize the indexer with a Python language processor
indexer = CodeIndexer(PythonProcessor())

# Index a project directory
indexer.index_project(Path("/path/to/project"))

# Find definitions of a specific function
definitions = indexer.find_definitions("my_function")
for defn in definitions:
    print(
        f"Found definition at {defn.location.file_path}:{defn.location.start_lineno}"
    )

# Find references to a specific function
references = indexer.find_references("my_function")
for ref in references:
    print(
        f"Found reference at {ref.location.file_path}:{ref.location.start_lineno}"
    )

# Save the index to a JSON file
indexer.dump_index(Path("index.json"), SingleJsonFilePersistStrategy())
__str__()[source]#

Returns a string representation of the CodeIndexer instance.

Returns:

A formatted string containing the processor, index, and configuration details of this CodeIndexer instance.

property processor: LanguageProcessor#

Gets the language processor used by this indexer.

Returns:

The LanguageProcessor instance configured for this indexer.

property index#

Gets the index backend used by this indexer.

Returns:

The BaseIndex instance used for storing and retrieving symbol information.

_process_definitions(tree: Tree, source_bytes: bytes, file_path: Path, processor: LanguageProcessor | None = None)[source]#

Processes and indexes all function and method definitions in a parsed AST.

This method extracts function and method definitions from the abstract syntax tree and adds them to the index. It handles both standalone functions and class methods.

Parameters:
  • tree – The parsed abstract syntax tree from tree-sitter.

  • source_bytes – The raw source code as bytes, used for extracting symbol text and position information.

  • file_path – The path to the source file being processed.

  • processor – Optional language processor to use. If None, uses the indexer’s default processor.

Note

This is an internal method that processes definition nodes identified by the language processor and adds them to the index storage.

_process_references(tree: Tree, source_bytes: bytes, file_path: Path, processor: LanguageProcessor | None = None)[source]#

Processes and indexes all function and method references in a parsed AST.

This method extracts function and method call sites from the abstract syntax tree and adds them to the index. It identifies where functions and methods are being invoked or referenced in the code.

Parameters:
  • tree – The parsed abstract syntax tree from tree-sitter.

  • source_bytes – The raw source code as bytes, used for extracting reference text and position information.

  • file_path – The path to the source file being processed.

  • processor – Optional language processor to use. If None, uses the indexer’s default processor.

Note

This is an internal method that processes reference nodes identified by the language processor and adds them to the index storage.

index_file(file_path: Path, project_path: Path, processor: LanguageProcessor | None = None)[source]#

Parses and indexes a single source code file.

This method processes a single file, extracting function and method definitions and references. It will attempt to parse files even if their extension is not in the processor’s supported extension list, logging a warning in such cases.

Parameters:
  • file_path – The path to the source file to be indexed. Must be a valid file.

  • project_path – The root path of the project, used for calculating relative paths when store_relative_paths is True.

  • processor – Optional language processor to use for this file. If None, uses the indexer’s default processor.

Note

If the file cannot be read due to I/O errors, the operation will be skipped with an error log. Non-file paths are also skipped with a warning.

Example

indexer.index_file(Path("src/main.py"), Path("src/"))
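The role of project_path when store_relative_paths is True can be sketched with pathlib alone. The helper below, stored_path, is a hypothetical name introduced for illustration and is not part of the library. Falling back to an absolute path for files outside the project root is also an assumption, not documented behavior.

```python
from pathlib import Path


def stored_path(file_path: Path, project_path: Path, store_relative_paths: bool = True) -> Path:
    """Return the path form that would be recorded for a file (illustrative only)."""
    if not store_relative_paths:
        return file_path.absolute()
    try:
        # Strip the project root prefix, e.g. /proj/src/a.py -> src/a.py
        return file_path.relative_to(project_path)
    except ValueError:
        # Assumption: a file outside the project root keeps its absolute form.
        return file_path.absolute()


project = Path("/path/to/project")
relative = stored_path(project / "src" / "main.py", project)
absolute = stored_path(project / "src" / "main.py", project, store_relative_paths=False)
print(relative)
print(absolute)
```

Relative paths keep a saved index portable when the repository is checked out at a different location.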
index_project(project_path: Path)[source]#

Recursively indexes all supported files in a project directory.

This method walks through the entire project directory tree and indexes all files with extensions supported by the configured language processor. Only files matching the processor’s supported extensions are processed.

Parameters:

project_path – The root directory path of the project to be indexed. All subdirectories will be recursively processed.

Note

Files with unsupported extensions are automatically skipped. The indexing progress is logged at info level with start and completion messages.

Example

indexer.index_project(Path("/path/to/project"))
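The recursive walk with extension filtering described above might look roughly like the following. This is a minimal sketch using only the standard library; iter_supported_files is a hypothetical helper, and the real traversal order and filtering logic of index_project may differ.

```python
import tempfile
from pathlib import Path


def iter_supported_files(project_path: Path, extensions: set[str]):
    """Yield files under project_path whose suffix is in extensions, in sorted order."""
    for path in sorted(project_path.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            yield path


# Build a small throwaway project to walk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "a.py").write_text("def f(): pass\n")
    (root / "pkg" / "notes.txt").write_text("not source code\n")
    (root / "b.py").write_text("def g(): pass\n")
    found = [p.relative_to(root).as_posix() for p in iter_supported_files(root, {".py"})]

print(found)
```

Only the two .py files are yielded; notes.txt is skipped because its suffix is not in the supported set.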
find_definitions(name: str) → list[Definition][source]#

Finds all definitions of functions or methods with the given name.

Searches the index for all definition locations of functions or methods that match the specified name. This includes both standalone functions and class methods.

Parameters:

name – The name of the function or method to search for.

Returns:

A list of Definition objects containing location and context information for each found definition. Returns an empty list if no definitions are found.

Example

definitions = indexer.find_definitions("calculate_total")
for defn in definitions:
    print(
        f"Found definition at {defn.location.file_path}:{defn.location.start_lineno}"
    )
# Output: Found definition at src/utils.py:15
find_references(name: str) → list[Reference][source]#

Finds all references to functions or methods with the given name.

Searches the index for all locations where functions or methods with the specified name are called or referenced. This includes function calls, method invocations, and other forms of symbol references.

Parameters:

name – The name of the function or method to search for.

Returns:

A list of Reference objects containing location and context information for each found reference. Returns an empty list if no references are found.

Example

references = indexer.find_references("calculate_total")
for ref in references:
    print(
        f"Found reference at {ref.location.file_path}:{ref.location.start_lineno}"
    )
# Output: Found reference at src/main.py:42
dump_index(output_path: Path, persist_strategy: PersistStrategy)[source]#

Persists the current index data to a file using the specified strategy.

Saves all indexed symbol information to persistent storage. The format and structure of the saved data depends on the persistence strategy used.

Parameters:
  • output_path – The file path where the index data should be saved.

  • persist_strategy – The persistence strategy that defines how the data should be serialized and stored (e.g., JSON, SQLite, etc.).

Raises:

IOError – If the file cannot be written due to permission or disk issues.

Example

from code_index.index.persist import SingleJsonFilePersistStrategy

indexer.dump_index(Path("index.json"), SingleJsonFilePersistStrategy())
load_index(input_path: Path, persist_strategy: PersistStrategy)[source]#

Loads index data from a file using the specified strategy.

Replaces the current index with data loaded from persistent storage. The format and structure of the loaded data depends on the persistence strategy used, which should match the strategy used when saving.

Parameters:
  • input_path – The file path from which to load the index data.

  • persist_strategy – The persistence strategy that defines how the data should be deserialized and loaded (e.g., JSON, SQLite, etc.).

Raises:
  • IOError – If the file cannot be read due to permission or existence issues.

  • ValueError – If the file format is invalid or incompatible.

Note

This operation completely replaces the current index. Any unsaved indexing work will be lost.

Example

from code_index.index.persist import SingleJsonFilePersistStrategy

indexer.load_index(Path("index.json"), SingleJsonFilePersistStrategy())
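The dump/load pairing follows a strategy pattern: the indexer hands its data to whichever strategy object it is given. The sketch below illustrates the idea with toy classes. ToyPersistStrategy, ToyJsonStrategy, and their save/load methods are hypothetical names, and the actual PersistStrategy interface is not shown in this reference.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class ToyPersistStrategy(Protocol):          # hypothetical interface
    def save(self, data: dict, path: Path) -> None: ...
    def load(self, path: Path) -> dict: ...


class ToyJsonStrategy:                       # hypothetical JSON-backed strategy
    def save(self, data: dict, path: Path) -> None:
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def load(self, path: Path) -> dict:
        return json.loads(path.read_text(encoding="utf-8"))


def round_trip(data: dict, strategy: ToyPersistStrategy) -> dict:
    """Save data with the strategy, then load it back with the same strategy."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "index.json"
        strategy.save(data, target)
        return strategy.load(target)


original = {"calculate_total": {"definitions": 1, "references": 3}}
loaded = round_trip(original, ToyJsonStrategy())
print(loaded == original)
```

In this toy version a malformed file surfaces as json.JSONDecodeError, which is a subclass of ValueError.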
get_function_info(func_like: Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')]) → FunctionLikeInfo | None[source]#

Retrieves comprehensive information about a specific function or method.

Gets detailed information about a function or method, including its definitions, references, and other metadata stored in the index.

Parameters:

func_like – A Symbol object (Function or Method) representing the symbol to retrieve information for.

Returns:

A FunctionLikeInfo object containing comprehensive information about the symbol, including all its definitions and references. Returns None if the symbol is not found in the index.

Example

from code_index.models import Function

func = Function(name="calculate_total")
info = indexer.get_function_info(func)
if info:
    print(f"Function has {len(info.definitions)} definitions")
    print(f"Function has {len(info.references)} references")
get_all_functions() → list[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')]][source]#

Retrieves all functions and methods stored in the index.

Returns a list of all Symbol objects (Functions and Methods) that have been indexed. This provides a complete overview of all symbols tracked by the indexer.

Returns:

A list of Symbol objects representing all indexed functions and methods. Returns an empty list if no symbols have been indexed.

Example

all_functions = indexer.get_all_functions()
print(f"Index contains {len(all_functions)} functions/methods")
# Output: Index contains 42 functions/methods
for func in all_functions:
    print(f"- {func.name}")
# Output:
# - calculate_total
# - process_data
clear_index()[source]#

Clears all indexed data and resets the index to an empty state.

Removes all definitions, references, and other symbol information from the index. This operation cannot be undone unless the index data has been previously saved using dump_index().

Note

This creates a new instance of the same index class, ensuring a completely clean state while maintaining the same index configuration.

Example

indexer.clear_index()
print(f"Index now contains {len(indexer.get_all_functions())} functions")
# Output: Index now contains 0 functions
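The note about creating a new instance of the same index class can be illustrated with a toy model. ToyIndex, ToyIndexer, and CustomIndex are hypothetical stand-ins, and the sketch assumes the index class can be constructed without arguments.

```python
class ToyIndex:                       # hypothetical index backend
    def __init__(self) -> None:
        self.symbols: dict[str, list[int]] = {}


class ToyIndexer:                     # hypothetical stand-in for CodeIndexer
    def __init__(self, index: ToyIndex) -> None:
        self._index = index

    def clear_index(self) -> None:
        # Re-create the backend from its own class, so a subclass stays a subclass.
        self._index = type(self._index)()


class CustomIndex(ToyIndex):          # hypothetical custom backend
    pass


toy_indexer = ToyIndexer(CustomIndex())
toy_indexer._index.symbols["calculate_total"] = [15]
old_index = toy_indexer._index
toy_indexer.clear_index()
print(type(toy_indexer._index).__name__, toy_indexer._index.symbols)
```

Replacing the object rather than emptying it in place means no stale state survives, while the backend type chosen at construction time is preserved.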

Command Line Interface#

The code-index library provides a command-line interface for indexing source code repositories.

code_index.__main__.main()[source]#

Command-line interface for the code-index tool.

This function provides a command-line interface for indexing source code repositories and exporting the results in various formats. It supports multiple programming languages and persistence strategies.

For detailed usage information, run:

uv run -m code_index --help

Note

This function is designed to be called from the command line via the entry point defined in pyproject.toml.

Command Usage#

Use the command-line tool to index source code repositories:

# View all available options
uv run -m code_index --help

# Basic usage example
code-index /path/to/repository --language python --dump-type json

Data Models#

Pydantic models for representing code elements and their relationships.

This module defines the core data structures used to model functions, methods, references, definitions, and their relationships in a codebase.

class code_index.models.CodeLocation(*, file_path: Path, start_lineno: int, start_col: int, end_lineno: int, end_col: int, start_byte: int, end_byte: int)[source]#

Bases: BaseModel

Represents a specific location in source code.

Contains precise position information including line numbers, column positions, and byte offsets for a code element within a source file.

file_path: Path#

Path to the source file containing this location.

start_lineno: int#

Starting line number (1-based).

start_col: int#

Starting column number (0-based).

end_lineno: int#

Ending line number (1-based).

end_col: int#

Ending column number (0-based).

start_byte: int#

Byte offset in the file where the location starts (inclusive).

end_byte: int#

Byte offset in the file where the location ends (exclusive).

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

__str__() → str[source]#

Return a string representation of the code location.
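The position conventions above (1-based lines, 0-based columns, half-open byte range) can be checked on a small buffer. This sketch uses plain bytes operations rather than the library. It measures the column in bytes, which is an assumption, since the reference does not say whether columns count bytes or characters.

```python
source_bytes = "def greet():\n    return 'héllo'\n\ngreet()\n".encode("utf-8")

# Locate the call `greet()` on the last line (rindex skips the one inside `def greet():`).
start_byte = source_bytes.rindex(b"greet()")
end_byte = start_byte + len(b"greet()")   # exclusive end

# A half-open slice recovers exactly the referenced text.
text = source_bytes[start_byte:end_byte].decode("utf-8")

# 1-based line number: count newlines before the start, then add one.
start_lineno = source_bytes.count(b"\n", 0, start_byte) + 1

# 0-based column: distance from the start of that line (in bytes here).
line_start = source_bytes.rfind(b"\n", 0, start_byte) + 1
start_col = start_byte - line_start

print(text, start_lineno, start_col)
```

Because the end offset is exclusive, end_byte - start_byte is the length of the span in bytes.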

class code_index.models.Function(*, type: Literal[SymbolType.FUNCTION] = SymbolType.FUNCTION, name: str)[source]#

Bases: BaseSymbol

Represents a standalone function in the codebase.

A function is a callable code block that is not bound to any class. This includes module-level functions, nested functions, and lambda functions.

type: Literal[SymbolType.FUNCTION]#

Type discriminator for function.

name: str#

The name of the function.

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class code_index.models.Method(*, type: Literal[SymbolType.METHOD] = SymbolType.METHOD, name: str, class_name: str | None)[source]#

Bases: BaseSymbol

Represents a method bound to a class in the codebase.

A method is a function that belongs to a class. The class_name may be None for method calls where the class context cannot be determined statically.

type: Literal[SymbolType.METHOD]#

Type discriminator for method.

name: str#

The name of the method.

class_name: str | None#

The name of the class the method belongs to. May be None for calls where the class context is not accessible or determinable.

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

code_index.models.Symbol#

Represents a function or method in the codebase. A discriminated union type that can be either a Function or a Method.

The discriminator field ‘type’ is used to determine which variant to deserialize to. Pydantic automatically handles serialization and deserialization based on this field.

Example usage:

from pydantic import TypeAdapter

from code_index.models import Symbol

# For standalone validation of Symbol objects
funclike_adapter = TypeAdapter(Symbol)

# Validate from dict
func_data = {"type": "function", "name": "my_func"}
function_obj = funclike_adapter.validate_python(func_data)

# Validate from JSON
method_json = '{"type": "method", "name": "my_method", "class_name": "MyClass"}'
method_obj = funclike_adapter.validate_json(method_json)

alias of Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')]

class code_index.models.SymbolReference(*, symbol: Function | Method, reference: PureReference)[source]#

Bases: BaseModel

Represents a reference to a function-like entity with context.

This combines a function or method symbol with the specific reference location, providing full context about where and what is being referenced.

symbol: Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')]#

The function or method being referenced.

reference: PureReference#

The reference information including location.

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class code_index.models.PureReference(*, location: CodeLocation)[source]#

Bases: BaseModel

A minimal, hashable fingerprint of a function or method reference.

This class serves as a unique identifier containing only the essential location information needed to distinguish one reference from another. It acts as a fingerprint for the more comprehensive Reference class and is designed to be used as a dictionary key for fast lookups.

The “Pure” designation indicates this contains only the core, immutable identity of a reference without any additional contextual information.

location: CodeLocation#

The code location where the reference occurs.

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
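Why a frozen, location-only object works as a dictionary key can be shown with standard-library dataclasses. Location and Fingerprint below are simplified, hypothetical stand-ins for CodeLocation and PureReference. The real classes are Pydantic models with more fields.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Location:          # simplified stand-in for CodeLocation
    file_path: str
    start_byte: int
    end_byte: int


@dataclass(frozen=True)
class Fingerprint:       # simplified stand-in for PureReference
    location: Location


contexts: dict[Fingerprint, list[str]] = {}

first = Fingerprint(Location("src/main.py", 120, 137))
same_site = Fingerprint(Location("src/main.py", 120, 137))

# Two fingerprints built from equal locations hash to the same key,
# so context collected at different times lands in one entry.
contexts.setdefault(first, []).append("caller: main")
contexts.setdefault(same_site, []).append("caller: <lambda>")

print(len(contexts), contexts[first])
```

Freezing matters because a key whose fields could change after insertion would no longer match its stored hash.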

class code_index.models.Definition(*, location: CodeLocation, doc: str | None = None, calls: list[SymbolReference] = <factory>, llm_note: LLMNote | None = None, source_code: str | None = None)[source]#

Bases: BaseModel

Extended definition information with additional contextual data.

This class builds on PureDefinition, which serves as its fingerprint (see to_pure()), and adds comprehensive information about the definition’s context and behavior. The design is extensible, allowing future additions of more definition-related data without breaking the core identification mechanism provided by PureDefinition.

A definition is where a function or method is declared/implemented, including the location and any function calls made within its body.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

location: CodeLocation#

The code location where the definition occurs.

doc: str | None#

Optional documentation string for the definition, if available.

calls: list[SymbolReference]#

List of function/method calls made within this definition.

This may include calls inside any wrapped functions, closures, or lambdas inside the definition.

llm_note: LLMNote | None#

Optional note generated by an LLM about this definition.

to_pure() → PureDefinition[source]#

Extract the pure fingerprint of this definition for use as a dictionary key.

Returns:

A PureDefinition containing only the location information, suitable for hashing and fast lookups.

classmethod from_pure(pure_def: PureDefinition) → Definition[source]#

Create a Definition instance from a PureDefinition.

This method allows creating a Definition instance with an empty context (no calls) from a PureDefinition, which is useful for initializing definitions before additional context is added.

Parameters:

pure_def (PureDefinition) – The pure definition to convert.

Returns:

A new Definition instance with the same location as the PureDefinition.

Return type:

Definition

add_callee(callee: SymbolReference) → Definition[source]#

Add a callee reference to this definition.

This method allows adding a reference to a function or method that is called within this definition. It ensures that the callee is added only if it is not already present.

Parameters:

callee (SymbolReference) – The callee reference to add.

Returns:

The updated Definition instance with the new callee(s) added.

Return type:

Definition

set_note(note: LLMNote) → Definition[source]#

Add or update the LLM-generated note for this definition.

Parameters:

note (LLMNote) – The LLM-generated note to add or update.

Returns:

The updated Definition instance with the new note.

Return type:

Definition

merge(other: Definition) → None[source]#

Merge information about the same PureDefinition.

This method allows merging additional contextual information from another Definition into this one, such as additional calls made within the definition.

Parameters:

other (Definition) – Another Definition instance with additional context to merge.

Raises:

ValueError – If the other definition does not have the same PureDefinition.
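The interplay of add_callee (skip duplicates) and merge (reject a different fingerprint) can be sketched with a toy class. Def is a hypothetical stand-in, not the library's Definition. It uses a plain tuple as the fingerprint and strings as callees to stay self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Def:                       # hypothetical stand-in for Definition
    location: tuple[str, int]    # (file, start byte) plays the fingerprint role
    calls: list[str] = field(default_factory=list)

    def add_callee(self, callee: str) -> "Def":
        if callee not in self.calls:      # add only if not already present
            self.calls.append(callee)
        return self

    def merge(self, other: "Def") -> None:
        if other.location != self.location:
            raise ValueError("cannot merge definitions with different fingerprints")
        for callee in other.calls:
            self.add_callee(callee)


a = Def(("src/utils.py", 40), ["load"])
b = Def(("src/utils.py", 40), ["load", "save"])
a.merge(b)
print(a.calls)
```

Merging through add_callee keeps the call list free of duplicates even when both sides recorded the same callee.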

class code_index.models.Reference(*, location: CodeLocation, called_by: list[SymbolDefinition] = <factory>)[source]#

Bases: BaseModel

Extended reference information with additional contextual data.

This class builds on PureReference, which serves as its fingerprint (see to_pure()), and adds optional contextual information about the reference. The design is extensible, allowing future additions of more reference-related data without breaking the core identification mechanism provided by PureReference.

A reference occurs when a function or method is called, passed as an argument, or otherwise used in the code (but not where it’s defined).

location: CodeLocation#

The code location where the reference occurs.

called_by: list[SymbolDefinition]#

The definitions that call this reference, if applicable.

Note that this answers “which definitions contain this call site?” rather than “which symbols call this reference?”. A call site normally has a single caller, but the field is a list because lambdas, closures, and wrapped functions can place one call site inside the scope of several definitions.
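A call site nested inside a closure sits within the scope of more than one definition. The sketch below demonstrates this with Python's standard ast module rather than tree-sitter. enclosing_definitions is a hypothetical helper, and whether the library records every enclosing definition in exactly this way is an assumption.

```python
import ast

source = '''
def outer():
    def inner():
        helper()
    return inner
'''


def enclosing_definitions(tree: ast.AST, call_name: str) -> list[str]:
    """Return the names of the functions enclosing each call to call_name, outermost first."""
    found: list[str] = []

    def visit(node: ast.AST, stack: list[str]) -> None:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            stack = stack + [node.name]   # new list, so sibling branches are unaffected
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == call_name
        ):
            found.extend(stack)
        for child in ast.iter_child_nodes(node):
            visit(child, stack)

    visit(tree, [])
    return found


callers = enclosing_definitions(ast.parse(source), "helper")
print(callers)
```

The single call to helper() is reported under both outer and inner, which is why a list is needed instead of a single caller.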

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

to_pure() → PureReference[source]#

Extract the pure fingerprint of this reference for use as a dictionary key.

Returns:

A PureReference containing only the location information, suitable for hashing and fast lookups.

classmethod from_pure(pure_ref: PureReference) → Reference[source]#

Create a Reference instance from a PureReference.

This method allows creating a Reference instance with an empty context (no callers) from a PureReference, which is useful for initializing references before additional context is added.

Parameters:

pure_ref (PureReference) – The pure reference to convert.

Returns:

A new Reference instance with the same location as the PureReference.

Return type:

Reference

add_caller(caller: SymbolDefinition) → Reference[source]#

Add a caller definition to this reference.

This method allows adding a definition that calls this reference. It ensures that the caller is added only if it is not already present.

Parameters:

caller (SymbolDefinition) – The caller definition to add.

Returns:

The updated Reference instance with the new caller(s) added.

Return type:

Reference

merge(other: Reference) → None[source]#

Merge information about the same PureReference.

This method allows merging additional contextual information from another Reference into this one, such as additional call sites or definitions that reference this location.

Parameters:

other (Reference) – Another Reference instance with additional context to merge.

Raises:

ValueError – If the other reference does not have the same PureReference.

class code_index.models.SymbolDefinition(*, symbol: Function | Method, definition: PureDefinition)[source]#

Bases: BaseModel

Represents a definition of a function-like entity with context.

This combines a function or method symbol with the specific definition information, providing full context about where and what is being defined.

symbol: Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')]#

The function or method being defined.

definition: PureDefinition#

The definition information including location.

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class code_index.models.PureDefinition(*, location: CodeLocation)[source]#

Bases: BaseModel

A minimal, hashable fingerprint of a function or method definition.

This class serves as a unique identifier containing only the essential location information needed to distinguish one definition from another. It acts as a fingerprint for the more comprehensive Definition class and is designed to be used as a dictionary key for fast lookups.

The “Pure” designation indicates this contains only the core, immutable identity of a definition without any additional contextual information.

location: CodeLocation#

The code location where the definition occurs.

model_config: ClassVar[ConfigDict] = {'frozen': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class code_index.models.FunctionLikeInfo(*, definitions: list[Definition] = <factory>, references: list[Reference] = <factory>)[source]#

Bases: BaseModel

Contains comprehensive information about a function or method.

Aggregates all known information about a function or method, including all its definitions (in case of overloads or multiple declarations) and all references to it throughout the codebase.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

definitions: list[Definition]#

List of all definition locations for this symbol.

references: list[Reference]#

List of all reference locations for this symbol.

class code_index.models.IndexDataEntry(*, symbol: Function | Method, info: FunctionLikeInfo)[source]#

Bases: BaseModel

Represents a single entry in the serialized index data.

Each entry associates a function or method symbol with its complete information including definitions and references.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

symbol: Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')]#

The function or method symbol.

info: FunctionLikeInfo#

Complete information about the symbol.

class code_index.models.IndexData(*, type: str, data: list[IndexDataEntry] = <factory>, metadata: dict[Any, Any] | None = None)[source]#

Bases: BaseModel

Represents the complete index data in a serializable format.

This is the top-level container for all indexed information about functions and methods in a codebase. Used for persistence and data exchange between different index implementations.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

type: str#

String identifier indicating the index type (e.g., “simple_index”).

data: list[IndexDataEntry]#

List of all indexed symbol entries.

metadata: dict[Any, Any] | None#

Optional metadata about the index, such as the indexer version, creation timestamp, etc.
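Putting the models together, the nesting of IndexData, IndexDataEntry, FunctionLikeInfo, Definition, and CodeLocation can be pictured as a plain dictionary. The field names below come from the signatures in this section, but the values are made up. The exact JSON a persistence strategy writes (for example, how paths are encoded) is an assumption and may differ.

```python
import json

# Illustrative shape only: one indexed function with one definition and no references.
index_data = {
    "type": "simple_index",
    "data": [
        {
            "symbol": {"type": "function", "name": "calculate_total"},
            "info": {
                "definitions": [
                    {
                        "location": {
                            "file_path": "src/utils.py",
                            "start_lineno": 15, "start_col": 0,
                            "end_lineno": 18, "end_col": 16,
                            "start_byte": 312, "end_byte": 401,
                        },
                        "doc": None,
                        "calls": [],
                        "llm_note": None,
                        "source_code": None,
                    }
                ],
                "references": [],
            },
        }
    ],
    "metadata": None,
}

# The structure contains only JSON-compatible values, so it survives a round trip.
restored = json.loads(json.dumps(index_data))
names = [entry["symbol"]["name"] for entry in restored["data"]]
print(names)
```

The symbol's type field is the discriminator that tells a loader whether to rebuild a Function or a Method.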

Configuration#