Language Processors#
Language processors are responsible for parsing source code files using tree-sitter and extracting function and method definitions and references.
Base Classes#
Base classes and protocols for language-specific code processing.
This module defines the core interfaces and base implementations for processing source code in different programming languages. It provides:
- QueryContext: Container for query execution context
- LanguageProcessor: Protocol defining the interface for language processors
- BaseLanguageProcessor: Base implementation with common functionality
Language processors use tree-sitter for parsing and analyzing source code to extract function/method definitions and references.
- class code_index.language_processor.base.QueryContext(file_path: Path, source_bytes: bytes)
Bases: object
Context information needed for executing tree-sitter queries.
Contains the necessary context for processing source code, including file path and raw source bytes for accurate node extraction.
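The following minimal sketch illustrates why the context carries raw bytes rather than a decoded string. It assumes that tree-sitter reports node positions as byte offsets; QueryContextSketch and node_text are hypothetical stand-ins written for this example, not the library's own classes.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class QueryContextSketch:
    """Hypothetical stand-in mirroring the two documented fields."""

    file_path: Path
    source_bytes: bytes


def node_text(ctx: QueryContextSketch, start_byte: int, end_byte: int) -> str:
    # Slice the raw bytes with byte offsets, then decode the slice.
    return ctx.source_bytes[start_byte:end_byte].decode("utf-8")


# The non-ASCII character on the first line occupies two bytes in UTF-8,
# so byte offsets and str indices disagree for everything after it.
source = 'x = "é"\ndef greet():\n    pass\n'
ctx = QueryContextSketch(Path("example.py"), source.encode("utf-8"))

start = ctx.source_bytes.index(b"greet")
end = start + len(b"greet")

name_from_bytes = node_text(ctx, start, end)  # correct slice
name_from_str = source[start:end]             # shifted by one character
```

Slicing the decoded string with the same offsets returns the wrong span, which is the kind of error that keeping source_bytes in the context avoids.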
- class code_index.language_processor.base.LanguageProcessor(*args, **kwargs)
Bases: Protocol
Protocol defining the interface for language-specific code processors.
This protocol establishes the contract that all language processors must implement to provide consistent functionality for parsing and analyzing source code across different programming languages.
Language processors are responsible for:
- Providing language-specific configuration (extensions, queries)
- Parsing source code using tree-sitter
- Extracting function/method definitions and references
- Converting syntax tree nodes to semantic models
- property extensions: list[str]
List of file extensions supported by this processor (e.g., ['.py']).
- get_definition_query() -> Query
Get the tree-sitter query for finding function/method definitions.
- get_reference_query() -> Query
Get the tree-sitter query for finding function/method references.
- get_definition_nodes(node: Node) -> Iterable[Node]
Extract all definition nodes from a syntax tree node.
- Parameters:
node – The root node to search within.
- Returns:
An iterable of nodes representing function/method definitions.
- get_reference_nodes(node: Node) -> Iterable[Node]
Extract all reference nodes from a syntax tree node.
- Parameters:
node – The root node to search within.
- Returns:
An iterable of nodes representing function/method calls.
- handle_definition(node: Node, ctx: QueryContext) -> tuple[Function | Method, Definition] | None
Process a function/method definition node.
- Parameters:
node – The syntax tree node representing a definition.
ctx – Context information for the query.
- Returns:
A tuple of (symbol, definition) if successful, None if the node cannot be processed or doesn’t match expected format.
- handle_reference(node: Node, ctx: QueryContext) -> tuple[Function | Method, Reference] | None
Process a function/method reference node.
- Parameters:
node – The syntax tree node representing a reference/call.
ctx – Context information for the query.
- Returns:
A tuple of (symbol, reference) if successful, None if the node cannot be processed or doesn’t match expected format.
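Because LanguageProcessor is a Protocol, conformance is structural: a class satisfies it by providing the right members, without inheriting from it. The sketch below shows the idea with a trimmed-down, hypothetical protocol (ProcessorLike) and toy classes. None of these names come from the library, and the real protocol has more members than are shown here.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class ProcessorLike(Protocol):
    """Hypothetical, reduced stand-in for the LanguageProcessor protocol."""

    @property
    def extensions(self) -> list[str]: ...

    def get_definition_nodes(self, node: object) -> Iterable[object]: ...

    def get_reference_nodes(self, node: object) -> Iterable[object]: ...


class ToyProcessor:
    """Provides every member; note that it does not inherit from ProcessorLike."""

    extensions = [".toy"]

    def get_definition_nodes(self, node: object) -> Iterable[object]:
        return []

    def get_reference_nodes(self, node: object) -> Iterable[object]:
        return []


class NotAProcessor:
    """Has the attribute but none of the methods."""

    extensions = [".txt"]


# runtime_checkable only checks that the members exist, not their signatures.
toy_conforms = isinstance(ToyProcessor(), ProcessorLike)
other_conforms = isinstance(NotAProcessor(), ProcessorLike)
```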
- class code_index.language_processor.base.BaseLanguageProcessor(name: str, language: Language, extensions: list[str], def_query_str: str, ref_query_str: str)
Bases: LanguageProcessor
Base implementation of LanguageProcessor with common functionality.
This class provides a concrete implementation that encapsulates shared logic across all language processors. It handles:
- Tree-sitter setup (parser, queries)
- Common query execution patterns
- Property management
Subclasses need only implement the language-specific logic for handling individual definition and reference nodes.
- __init__(name: str, language: Language, extensions: list[str], def_query_str: str, ref_query_str: str)
Initialize the base language processor.
- Parameters:
name – The name of the programming language.
language – The tree-sitter Language object.
extensions – List of supported file extensions.
def_query_str – Tree-sitter query string for finding definitions.
ref_query_str – Tree-sitter query string for finding references.
- property extensions: list[str]
List of file extensions supported by this processor (e.g., ['.py']).
- get_definition_query() -> Query
Get the tree-sitter query for finding function/method definitions.
- get_reference_query() -> Query
Get the tree-sitter query for finding function/method references.
- get_definition_nodes(node: Node) -> Iterable[Node]
Extract definition nodes using the configured definition query.
- Parameters:
node – The root node to search within.
- Returns:
An iterable of nodes representing function and method definitions.
- get_reference_nodes(node: Node) -> Iterable[Node]
Extract reference nodes using the configured reference query.
- Parameters:
node – The root node to search within.
- Returns:
An iterable of nodes representing function and method calls.
- handle_definition(node: Node, ctx: QueryContext) -> tuple[Function | Method, Definition] | None
Handle a definition node - must be implemented by subclasses.
- Parameters:
node – The syntax tree node representing a definition.
ctx – Context information for the query.
- Returns:
A tuple of (symbol, definition) if successful.
- Raises:
NotImplementedError – If not implemented by subclass.
- handle_reference(node, ctx: QueryContext) -> tuple[Function | Method, Reference] | None
Handle a reference node - must be implemented by subclasses.
- Parameters:
node – The syntax tree node representing a reference/call.
ctx – Context information for the query.
- Returns:
A tuple of (symbol, reference) if successful.
- Raises:
NotImplementedError – If not implemented by subclass.
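The split between shared plumbing in the base class and language-specific hooks in subclasses can be sketched as below. This is an illustration only: plain text lines stand in for syntax tree nodes, and BaseProcessorSketch, ToyPythonProcessor, and collect_definitions are hypothetical names, not part of the library.

```python
from typing import Iterable, Optional


class BaseProcessorSketch:
    """Hypothetical stand-in: shared logic lives here, hooks are left to subclasses."""

    def __init__(self, name: str, extensions: list[str]) -> None:
        self.name = name
        self._extensions = extensions

    @property
    def extensions(self) -> list[str]:
        return self._extensions

    def get_definition_nodes(self, lines: Iterable[str]) -> Iterable[str]:
        # Stand-in for running the definition query: a "node" is just a line.
        return [line for line in lines if line.lstrip().startswith("def ")]

    def handle_definition(self, node: str) -> Optional[tuple[str, str]]:
        raise NotImplementedError("subclasses must implement handle_definition")

    def collect_definitions(self, lines: Iterable[str]) -> list[tuple[str, str]]:
        # Hypothetical driver: call the hook per node and skip None results.
        results = []
        for node in self.get_definition_nodes(lines):
            handled = self.handle_definition(node)
            if handled is not None:
                results.append(handled)
        return results


class ToyPythonProcessor(BaseProcessorSketch):
    def __init__(self) -> None:
        super().__init__("toy-python", [".py"])

    def handle_definition(self, node: str) -> Optional[tuple[str, str]]:
        header = node.strip()[len("def "):]
        name = header.split("(", 1)[0].strip()
        if not name.isidentifier():
            return None  # the name cannot be extracted
        return (name, header)


source_lines = ["def alpha(x):", "    return x", "def (broken):", "    def beta():"]
found = ToyPythonProcessor().collect_definitions(source_lines)
```

Returning None for nodes that cannot be processed lets the caller skip them without special error handling, which matches the documented return contract of the handle_* methods.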
Implementations#
Python Processor#
Python language processor implementation.
This module provides a concrete implementation of the LanguageProcessor protocol for Python source code. It handles Python-specific syntax for function and method definitions, as well as function and method calls using tree-sitter.
The processor supports:
- Function definitions (standalone functions)
- Method definitions (class-bound functions)
- Function calls with identifier names
- Method calls with attribute access (obj.method())
- Decorated functions and methods
- class code_index.language_processor.impl_python.PythonProcessor
Bases: BaseLanguageProcessor
Language processor for Python source code.
Handles parsing and analysis of Python function and method definitions, as well as function and method calls. Supports Python-specific features like decorators, class methods, and attribute-based method calls.
The processor distinguishes between:
- Functions: Standalone callable definitions
- Methods: Functions defined within a class
- Function calls: Direct function invocation by name
- Method calls: Attribute-based method invocation (obj.method())
- handle_definition(node: Node, ctx: QueryContext) -> tuple[Function | Method, Definition] | None
Process a Python function or method definition.
Handles function_definition nodes and determines whether they represent standalone functions or class methods based on their context. Also analyzes function calls within the definition body and extracts docstrings.
- Parameters:
node – A function_definition syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (symbol, definition) where symbol is either a Function or Method depending on the definition context, None if the function name cannot be extracted.
- _get_definition_range_node(function_node: Node) -> Node
Get the node to use for definition range, including decorators if present.
- Parameters:
function_node – The function_definition node.
- Returns:
The decorated_definition node if the function has decorators, otherwise the function_definition node itself.
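A minimal sketch of the decorator-aware range selection described above. It assumes that in the Python grammar a decorated function is wrapped in a decorated_definition parent node. FakeNode is a hypothetical stand-in that exposes only the type and parent attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a syntax tree node."""

    type: str
    parent: Optional["FakeNode"] = None


def definition_range_node(function_node: FakeNode) -> FakeNode:
    # Widen the range to the wrapper so that decorators are included.
    parent = function_node.parent
    if parent is not None and parent.type == "decorated_definition":
        return parent
    return function_node


module = FakeNode("module")
decorated = FakeNode("decorated_definition", parent=module)
wrapped_fn = FakeNode("function_definition", parent=decorated)
plain_fn = FakeNode("function_definition", parent=module)
```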
- _is_method_definition(node: Node) -> bool
Check if a function definition is inside a class (i.e., is a method).
- Parameters:
node – The function_definition node to check.
- Returns:
True if the function is defined within a class, False otherwise.
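One plausible way to implement such a check is to walk up the parent chain, as sketched below with a hypothetical FakeNode stand-in. Treating a function nested inside another function as a plain function is an assumption of this sketch; the actual implementation may decide that case differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing only `type` and `parent`."""

    type: str
    parent: Optional["FakeNode"] = None


def is_method_definition(node: FakeNode) -> bool:
    current = node.parent
    while current is not None:
        if current.type == "class_definition":
            return True
        if current.type == "function_definition":
            # Assumption: a function nested in another function is not a method.
            return False
        current = current.parent
    return False


module = FakeNode("module")
cls = FakeNode("class_definition", parent=module)
cls_body = FakeNode("block", parent=cls)
method = FakeNode("function_definition", parent=cls_body)
method_body = FakeNode("block", parent=method)
nested = FakeNode("function_definition", parent=method_body)
top_level = FakeNode("function_definition", parent=module)
```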
- _get_class_name_for_method(node: Node, ctx: QueryContext) -> str | None
Get the name of the class that contains this method.
- Parameters:
node – The function_definition node representing a method.
ctx – Query context for accessing source bytes.
- Returns:
The class name as a string, or None if not found.
- handle_reference(node, ctx: QueryContext) -> tuple[Function | Method, Reference] | None
Process a Python function or method call.
Handles call nodes and determines whether they represent function calls (direct identifier calls) or method calls (attribute access calls). Uses the entire call node range including arguments for location tracking.
- Parameters:
node – A call syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (symbol, reference) where symbol is either a Function or Method depending on the call type, None if the call cannot be processed.
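The distinction between identifier calls and attribute-access calls can be illustrated with Python's standard ast module. This swaps in a different parser purely for demonstration: the processor itself works on tree-sitter call nodes, and classify_calls is a hypothetical helper.

```python
import ast


def classify_calls(source: str) -> list[tuple[str, str]]:
    """Label each call as a function call (name) or a method call (attribute)."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        if isinstance(node.func, ast.Name):
            calls.append(("function", node.func.id))
        elif isinstance(node.func, ast.Attribute):
            calls.append(("method", node.func.attr))
        # Other callee shapes, such as f()(), are skipped in this sketch.
    return sorted(calls)


calls = classify_calls("print(len(items))\nitems.append(1)\n")
```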
C Processor#
C language processor implementation.
This module provides a concrete implementation of the LanguageProcessor protocol for C source code. It handles C-specific syntax for function definitions and function calls using tree-sitter.
The processor supports:
- Function definitions with various declaration patterns
- Function calls and references
- Handling of function pointers and complex declarators
- class code_index.language_processor.impl_c.CProcessor
Bases: BaseLanguageProcessor
Language processor for C source code.
Handles parsing and analysis of C function definitions and calls. Supports various C function declaration patterns including:
- Simple function definitions
- Functions with storage class specifiers
- Functions returning pointers
- Function pointer declarations
- handle_definition(node: Node, ctx: QueryContext) -> tuple[Function | Method, Definition] | None
Process a C function definition node.
Handles function_definition nodes with various declaration patterns:
1. Simple: primitive_type -> function_declarator -> compound_statement
2. With modifiers: storage_class_specifier -> primitive_type -> function_declarator -> compound_statement
3. Pointer return: storage_class_specifier -> primitive_type -> pointer_declarator -> compound_statement
- Parameters:
node – A function_definition syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (Function, Definition) if successful, None if the function name cannot be extracted or the definition format is not recognized.
- _extract_function_name(function_def_node: Node, ctx: QueryContext) -> str | None
Extract function name from a function_definition node.
Handles various C function declaration patterns by traversing the declarator field which may be either a function_declarator or pointer_declarator containing a function_declarator.
- Parameters:
function_def_node – The function_definition node to process.
ctx – Query context for accessing source bytes.
- Returns:
The function name as a string, or None if extraction fails.
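The declarator traversal can be sketched as follows. FakeNode is a hypothetical stand-in that mimics the child_by_field_name lookup of a syntax tree node and stores its text directly, whereas the real processor would read the text from the context's source bytes. The field name declarator follows the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a syntax tree node with named fields."""

    type: str
    text: bytes = b""
    fields: dict = field(default_factory=dict)

    def child_by_field_name(self, name: str) -> Optional["FakeNode"]:
        return self.fields.get(name)


def extract_function_name(function_def: FakeNode) -> Optional[str]:
    declarator = function_def.child_by_field_name("declarator")
    # Unwrap pointer_declarator layers, as in `char *dup(void)`.
    while declarator is not None and declarator.type == "pointer_declarator":
        declarator = declarator.child_by_field_name("declarator")
    if declarator is None or declarator.type != "function_declarator":
        return None
    name_node = declarator.child_by_field_name("declarator")
    if name_node is None or name_node.type != "identifier":
        return None
    return name_node.text.decode("utf-8")


def make_function_declarator(name: bytes) -> FakeNode:
    identifier = FakeNode("identifier", text=name)
    return FakeNode("function_declarator", fields={"declarator": identifier})


simple_def = FakeNode(
    "function_definition", fields={"declarator": make_function_declarator(b"main")}
)
pointer = FakeNode(
    "pointer_declarator", fields={"declarator": make_function_declarator(b"dup")}
)
pointer_def = FakeNode("function_definition", fields={"declarator": pointer})
malformed_def = FakeNode("function_definition")
```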
- handle_reference(node: Node, ctx: QueryContext) -> tuple[Function | Method, Reference] | None
Process a C function call expression.
Handles call_expression nodes to extract the called function name. Uses the entire call_expression range including function name, parentheses, and arguments for accurate location tracking.
- Parameters:
node – A call_expression syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (Function, PureReference) if successful, None if the call expression doesn’t have a recognizable function identifier.
- _extract_preceding_comment(node: Node, ctx: QueryContext) -> str | None
Extract the preceding comment/documentation for a C function definition.
Looks for comment nodes that appear immediately before the function definition. Handles both single-line (//) and multi-line (/* */) comment styles.
- Parameters:
node – A function_definition syntax tree node.
ctx – Query context containing file information.
- Returns:
The comment text as a string, or None if not present.
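A rough sketch of how preceding comments might be gathered by walking backwards over sibling nodes. FakeNode is a hypothetical stand-in. Joining consecutive comment nodes with newlines, and ignoring blank lines between the comment and the function, are assumptions of this example rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing `type`, `text`, and `prev_sibling`."""

    type: str
    text: str = ""
    prev_sibling: Optional["FakeNode"] = None


def extract_preceding_comment(node: FakeNode) -> Optional[str]:
    comments = []
    current = node.prev_sibling
    # Collect the run of comment nodes directly before the definition.
    while current is not None and current.type == "comment":
        comments.append(current.text)
        current = current.prev_sibling
    if not comments:
        return None
    # Siblings were visited bottom-up, so restore source order.
    return "\n".join(reversed(comments))


line_one = FakeNode("comment", "// Adds two numbers.")
line_two = FakeNode("comment", "// Returns the sum.", prev_sibling=line_one)
documented_fn = FakeNode("function_definition", prev_sibling=line_two)

block_comment = FakeNode("comment", "/* Entry point. */")
block_documented_fn = FakeNode("function_definition", prev_sibling=block_comment)

declaration = FakeNode("declaration", "int counter;")
undocumented_fn = FakeNode("function_definition", prev_sibling=declaration)
```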
C++ Processor#
C++ language processor implementation.
This module provides a concrete implementation of the LanguageProcessor protocol for C++ source code. It handles C++ specific syntax for function definitions and function calls using tree-sitter.
TODO: Method calls (member function calls with object.method() or ptr->method() syntax) are not yet implemented. Currently only handles standalone function calls.
- class code_index.language_processor.impl_cpp.CppProcessor
Bases: BaseLanguageProcessor
Language processor for C++ source code.
Handles parsing and analysis of C++ function definitions and calls. Supports standard C++ function syntax including function declarations, definitions, and basic function calls.
TODO: Method calls and member function analysis not yet implemented. Currently focuses on standalone functions only.
- _handle_function_definition(node: Node, ctx: QueryContext) -> tuple[Function, Definition] | None
Handle a C++ function definition node.
Processes function_definition nodes to extract the function name and analyze function calls within the definition body. Also extracts preceding documentation comments.
- Parameters:
node – A function_definition syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (Function, Definition) if successful, None if the node cannot be processed (e.g., malformed function definition).
- handle_definition(node: Node, ctx: QueryContext) -> tuple[Function | Method, Definition] | None
Process a definition node based on its type.
- Parameters:
node – The syntax tree node representing a definition.
ctx – Query context containing file information.
- Returns:
A tuple of (symbol, definition) if the node represents a supported definition type, None otherwise.
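Dispatching on the node type can be sketched with a small handler table. Dictionaries stand in for syntax tree nodes here, and the table layout and function names are hypothetical. Only the function_definition node type is taken from the descriptions above.

```python
from typing import Callable, Optional


def handle_function_definition(node: dict) -> Optional[tuple[str, str]]:
    name = node.get("name")
    if not name:
        return None  # malformed definition
    return ("function", name)


# Hypothetical dispatch table: node type -> handler.
DEFINITION_HANDLERS: dict[str, Callable[[dict], Optional[tuple[str, str]]]] = {
    "function_definition": handle_function_definition,
}


def handle_definition(node: dict) -> Optional[tuple[str, str]]:
    handler = DEFINITION_HANDLERS.get(node["type"])
    if handler is None:
        return None  # unsupported definition type
    return handler(node)


supported = handle_definition({"type": "function_definition", "name": "compute"})
unsupported = handle_definition({"type": "class_specifier", "name": "Widget"})
malformed = handle_definition({"type": "function_definition"})
```

A table like this keeps the entry point unchanged when support for further definition types is added later.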
- _handle_function_call(node: Node, ctx: QueryContext) -> tuple[Function, Reference] | None
Handle a C++ function call expression.
Processes call_expression nodes to extract the function name being called. Currently handles simple function calls with identifier names.
TODO: Handle method calls (obj.method() or ptr->method() syntax).
- Parameters:
node – A call_expression syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (Function, PureReference) if successful, None if the call expression cannot be processed.
- handle_reference(node: Node, ctx: QueryContext) -> tuple[Function | Method, Reference] | None
Process a function or method call reference.
Analyzes call_expression nodes to identify the called function or method. Currently supports simple function calls with identifier names.
TODO: Add support for method calls including:
- Member function calls (obj.method())
- Member function calls through a pointer (ptr->method())
- Static member function calls (Class::method())
- Parameters:
node – A call_expression syntax tree node.
ctx – Query context containing file information.
- Returns:
A tuple of (symbol, reference) if the call can be processed, None if the call expression format is not supported.
- _extract_preceding_comment(node: Node, ctx: QueryContext) -> str | None
Extract the preceding comment/documentation for a C++ function definition.
Looks for comment nodes that appear immediately before the function definition. Handles both single-line (//) and multi-line (/* */) comment styles.
- Parameters:
node – A function_definition syntax tree node.
ctx – Query context containing file information.
- Returns:
The comment text as a string, or None if not present.