Language Processors

Language processors are responsible for parsing source code files using tree-sitter and extracting function and method definitions and references.

Base Classes

Base classes and protocols for language-specific code processing.

This module defines the core interfaces and base implementations for processing source code in different programming languages. It provides:

  • QueryContext: Container for query execution context

  • LanguageProcessor: Protocol defining the interface for language processors

  • BaseLanguageProcessor: Base implementation with common functionality

Language processors use tree-sitter for parsing and analyzing source code to extract function/method definitions and references.

class code_index.language_processor.base.QueryContext(file_path: Path, source_bytes: bytes)

Bases: object

Context information needed for executing tree-sitter queries.

Contains the necessary context for processing source code, including file path and raw source bytes for accurate node extraction.

file_path: Path

Path to the source file being processed.

source_bytes: bytes

Raw bytes of the source file content.
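Tree-sitter reports node positions as byte offsets, which is presumably why the context carries raw bytes rather than decoded text. The following is a minimal sketch of that idea. The dataclass is a stand-in that mirrors the two documented fields, not the library's own class, and `node_text` is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class QueryContext:
    """Stand-in with the two documented fields; not the library's own class."""

    file_path: Path
    source_bytes: bytes


def node_text(ctx: QueryContext, start_byte: int, end_byte: int) -> str:
    """Hypothetical helper: decode the bytes covered by a node's byte range."""
    return ctx.source_bytes[start_byte:end_byte].decode("utf-8")


source = "def café(): pass\n"
ctx = QueryContext(Path("example.py"), source.encode("utf-8"))

# "café" is 4 characters but 5 bytes in UTF-8, so byte offsets and
# character offsets diverge as soon as the source contains non-ASCII text.
name = node_text(ctx, 4, 9)
```

Slicing the decoded string with the same offsets would pick up one character too many, which is the kind of error that keeping the raw bytes avoids.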

class code_index.language_processor.base.LanguageProcessor(*args, **kwargs)

Bases: Protocol

Protocol defining the interface for language-specific code processors.

This protocol establishes the contract that all language processors must implement to provide consistent functionality for parsing and analyzing source code across different programming languages.

Language processors are responsible for:

  • Providing language-specific configuration (extensions, queries)

  • Parsing source code using tree-sitter

  • Extracting function/method definitions and references

  • Converting syntax tree nodes to semantic models

property name: str

The name of the programming language (e.g., ‘python’, ‘cpp’).

property extensions: list[str]

List of file extensions supported by this processor (e.g., [‘.py’]).

property language: Language

The tree-sitter Language object for parsing.

property parser: Parser

The tree-sitter Parser object configured for this language.

get_definition_query() -> Query

Get the tree-sitter query for finding function/method definitions.

get_reference_query() -> Query

Get the tree-sitter query for finding function/method references.

get_definition_nodes(node: Node) -> Iterable[Node]

Extract all definition nodes from a syntax tree node.

Parameters:

node – The root node to search within.

Returns:

An iterable of nodes representing function/method definitions.

get_reference_nodes(node: Node) -> Iterable[Node]

Extract all reference nodes from a syntax tree node.

Parameters:

node – The root node to search within.

Returns:

An iterable of nodes representing function/method calls.

handle_definition(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Definition] | None

Process a function/method definition node.

Parameters:
  • node – The syntax tree node representing a definition.

  • ctx – Context information for the query.

Returns:

A tuple of (symbol, definition) if successful, None if the node cannot be processed or doesn’t match expected format.

handle_reference(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Reference] | None

Process a function/method reference node.

Parameters:
  • node – The syntax tree node representing a reference/call.

  • ctx – Context information for the query.

Returns:

A tuple of (symbol, reference) if successful, None if the node cannot be processed or doesn’t match expected format.

class code_index.language_processor.base.BaseLanguageProcessor(name: str, language: Language, extensions: list[str], def_query_str: str, ref_query_str: str)

Bases: LanguageProcessor

Base implementation of LanguageProcessor with common functionality.

This class provides a concrete implementation that encapsulates shared logic across all language processors. It handles:

  • Tree-sitter setup (parser, queries)

  • Common query execution patterns

  • Property management

Subclasses need only implement the language-specific logic for handling individual definition and reference nodes.
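The split between shared logic and language-specific hooks can be pictured with a small, dependency-free sketch. Everything here is an assumption made for illustration: `SketchBaseProcessor`, `ToyProcessor`, and `collect_definitions` are hypothetical names, plain dictionaries stand in for tree-sitter nodes, and a simple filter stands in for query execution.

```python
from typing import Iterable, Optional


class SketchBaseProcessor:
    """Stand-in for the base class: shared driver logic plus an overridable hook."""

    def __init__(self, name: str, extensions: list[str]):
        self.name = name
        self.extensions = extensions

    def get_definition_nodes(self, root: dict) -> Iterable[dict]:
        # The real class runs a tree-sitter query; a plain filter stands in here.
        return [n for n in root["children"] if n["type"] == "function_definition"]

    def handle_definition(self, node: dict) -> Optional[tuple[str, dict]]:
        raise NotImplementedError("subclasses implement language-specific handling")

    def collect_definitions(self, root: dict) -> list[tuple[str, dict]]:
        # Hypothetical driver: call the hook for each node and skip None results.
        results = []
        for node in self.get_definition_nodes(root):
            handled = self.handle_definition(node)
            if handled is not None:
                results.append(handled)
        return results


class ToyProcessor(SketchBaseProcessor):
    """A subclass only supplies its configuration and the per-node hook."""

    def __init__(self):
        super().__init__("toy", [".toy"])

    def handle_definition(self, node: dict) -> Optional[tuple[str, dict]]:
        name = node.get("name")
        if name is None:
            return None  # the node does not match the expected format
        return name, {"line": node["line"]}


root = {
    "children": [
        {"type": "function_definition", "name": "main", "line": 1},
        {"type": "comment"},
        {"type": "function_definition", "line": 7},  # malformed: no name
    ]
}
definitions = ToyProcessor().collect_definitions(root)
```

Returning None from the hook, rather than raising, lets the driver skip nodes it cannot interpret without aborting the rest of the file.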

__init__(name: str, language: Language, extensions: list[str], def_query_str: str, ref_query_str: str)

Initialize the base language processor.

Parameters:
  • name – The name of the programming language.

  • language – The tree-sitter Language object.

  • extensions – List of supported file extensions.

  • def_query_str – Tree-sitter query string for finding definitions.

  • ref_query_str – Tree-sitter query string for finding references.

property name: str

The name of the programming language (e.g., ‘python’, ‘cpp’).

property extensions: list[str]

List of file extensions supported by this processor (e.g., [‘.py’]).

property language: Language

The tree-sitter Language object for parsing.

property parser: Parser

The tree-sitter Parser object configured for this language.

get_definition_query() -> Query

Get the tree-sitter query for finding function/method definitions.

get_reference_query() -> Query

Get the tree-sitter query for finding function/method references.

get_definition_nodes(node: Node) -> Iterable[Node]

Extract definition nodes using the configured definition query.

Parameters:

node – The root node to search within.

Returns:

An iterable of nodes representing function and method definitions.

get_reference_nodes(node: Node) -> Iterable[Node]

Extract reference nodes using the configured reference query.

Parameters:

node – The root node to search within.

Returns:

An iterable of nodes representing function and method calls.

handle_definition(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Definition] | None

Handle a definition node - must be implemented by subclasses.

Parameters:
  • node – The syntax tree node representing a definition.

  • ctx – Context information for the query.

Returns:

A tuple of (symbol, definition) if successful.

Raises:

NotImplementedError – If not implemented by subclass.

handle_reference(node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Reference] | None

Handle a reference node - must be implemented by subclasses.

Parameters:
  • node – The syntax tree node representing a reference/call.

  • ctx – Context information for the query.

Returns:

A tuple of (symbol, reference) if successful.

Raises:

NotImplementedError – If not implemented by subclass.

Implementations

Python Processor

Python language processor implementation.

This module provides a concrete implementation of the LanguageProcessor protocol for Python source code. It handles Python-specific syntax for function and method definitions, as well as function and method calls using tree-sitter.

The processor supports:

  • Function definitions (standalone functions)

  • Method definitions (class-bound functions)

  • Function calls with identifier names

  • Method calls with attribute access (obj.method())

  • Decorated functions and methods

class code_index.language_processor.impl_python.PythonProcessor

Bases: BaseLanguageProcessor

Language processor for Python source code.

Handles parsing and analysis of Python function and method definitions, as well as function and method calls. Supports Python-specific features like decorators, class methods, and attribute-based method calls.

The processor distinguishes between:

  • Functions: Standalone callable definitions

  • Methods: Functions defined within a class

  • Function calls: Direct function invocation by name

  • Method calls: Attribute-based method invocation (obj.method())

__init__()

Initialize the Python processor with language-specific configuration.

handle_definition(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Definition] | None

Process a Python function or method definition.

Handles function_definition nodes and determines whether they represent standalone functions or class methods based on their context. Also analyzes function calls within the definition body and extracts docstrings.

Parameters:
  • node – A function_definition syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (symbol, definition) where symbol is either a Function or Method depending on the definition context, None if the function name cannot be extracted.

_get_definition_range_node(function_node: Node) -> Node

Get the node to use for definition range, including decorators if present.

Parameters:

function_node – The function_definition node.

Returns:

The decorated_definition node if the function has decorators, otherwise the function_definition node itself.
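In the tree-sitter Python grammar, a decorated function is wrapped in a decorated_definition node whose child is the function_definition. A minimal sketch of the selection described above follows; `FakeNode` and `definition_range_node` are hypothetical stand-ins so the example runs without tree-sitter installed, and the real method may perform additional checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in exposing only `type` and `parent`, which tree-sitter nodes also provide."""

    type: str
    parent: Optional["FakeNode"] = None


def definition_range_node(function_node: FakeNode) -> FakeNode:
    """Return the wrapping decorated_definition if there is one, else the node itself."""
    parent = function_node.parent
    if parent is not None and parent.type == "decorated_definition":
        return parent
    return function_node


module = FakeNode("module")
decorated = FakeNode("decorated_definition", parent=module)
wrapped = FakeNode("function_definition", parent=decorated)
plain = FakeNode("function_definition", parent=module)
```

Using the wrapper node means the reported range starts at the first decorator line instead of at the `def` keyword.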

_is_method_definition(node: Node) -> bool

Check if a function definition is inside a class (i.e., is a method).

Parameters:

node – The function_definition node to check.

Returns:

True if the function is defined within a class, False otherwise.
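One plausible way to make this check is to walk up the parent chain until a class_definition is found. The sketch below uses a hypothetical `FakeNode` stand-in instead of real tree-sitter nodes. Treating a function nested inside another function as a non-method is an assumption of this sketch; the source does not say how the library handles that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in exposing only `type` and `parent`."""

    type: str
    parent: Optional["FakeNode"] = None


def is_method_definition(node: FakeNode) -> bool:
    """Walk up the ancestors and report whether a class encloses the function."""
    current = node.parent
    while current is not None:
        if current.type == "class_definition":
            return True
        if current.type == "function_definition":
            # Assumption: a function nested in another function is not a method,
            # even when the outer function itself lives inside a class.
            return False
        current = current.parent
    return False


module = FakeNode("module")
cls = FakeNode("class_definition", module)
cls_body = FakeNode("block", cls)
method = FakeNode("function_definition", cls_body)
method_body = FakeNode("block", method)
nested = FakeNode("function_definition", method_body)
top_level = FakeNode("function_definition", module)
```

Because the walk skips intermediate nodes such as block or decorated_definition, a decorated method is still recognized as a method.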

_get_class_name_for_method(node: Node, ctx: QueryContext) -> str | None

Get the name of the class that contains this method.

Parameters:
  • node – The function_definition node representing a method.

  • ctx – Query context for accessing source bytes.

Returns:

The class name as a string, or None if not found.

handle_reference(node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Reference] | None

Process a Python function or method call.

Handles call nodes and determines whether they represent function calls (direct identifier calls) or method calls (attribute access calls). Uses the entire call node range including arguments for location tracking.

Parameters:
  • node – A call syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (symbol, reference) where symbol is either a Function or Method depending on the call type, None if the call cannot be processed.
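In the tree-sitter Python grammar, a call node has a `function` field that is an identifier for direct calls and an attribute node for calls such as obj.method(). The distinction can be sketched as below. `FakeNode` and `classify_call` are hypothetical names, the stand-in stores text as `str` for brevity, and the returned pair of strings only illustrates the branching, not the library's Function, Method, or Reference models.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in with `type`, `text` and named fields, loosely modeled on tree-sitter nodes."""

    type: str
    text: str = ""
    fields: dict = field(default_factory=dict)

    def child_by_field_name(self, name: str) -> Optional["FakeNode"]:
        return self.fields.get(name)


def classify_call(call: FakeNode) -> Optional[tuple[str, str]]:
    """Return ("function", name) or ("method", name), or None if unsupported."""
    callee = call.child_by_field_name("function")
    if callee is None:
        return None
    if callee.type == "identifier":
        return "function", callee.text
    if callee.type == "attribute":
        attr = callee.child_by_field_name("attribute")
        return ("method", attr.text) if attr is not None else None
    return None  # e.g. calling the result of a subscript or another call


direct = FakeNode("call", fields={"function": FakeNode("identifier", "print")})
via_attribute = FakeNode(
    "call",
    fields={
        "function": FakeNode(
            "attribute", "obj.run", {"attribute": FakeNode("identifier", "run")}
        )
    },
)
```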

_extract_docstring(node: Node, ctx: QueryContext) -> str | None

Extract the docstring from a function or method definition.

Parameters:
  • node – A function_definition syntax tree node.

  • ctx – Query context containing file information.

Returns:

The docstring as a string, or None if not present.

_clean_python_docstring(raw_docstring: str) -> str

Clean up a Python docstring by removing quotes and normalizing whitespace.

Parameters:

raw_docstring – The raw docstring text including quotes.

Returns:

The cleaned docstring text.
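As a rough illustration of this kind of cleanup, the sketch below strips an optional string prefix and the surrounding quotes, then relies on the standard library's `inspect.cleandoc` to remove common indentation. This is an assumed approach, not necessarily what the library does; `clean_python_docstring` here is a free-standing hypothetical function.

```python
import inspect

# Longest delimiters first, so triple quotes are not mistaken for single ones.
_QUOTES = ('"""', "'''", '"', "'")


def clean_python_docstring(raw: str) -> str:
    """Strip string prefix and quotes, then normalize indentation."""
    text = raw.strip()

    # Skip string prefixes such as r or u, but only if a quote follows them.
    i = 0
    while i < len(text) and text[i] in "rRuUbBfF":
        i += 1
    if i < len(text) and text[i] in "\"'":
        text = text[i:]

    for quote in _QUOTES:
        if (
            len(text) >= 2 * len(quote)
            and text.startswith(quote)
            and text.endswith(quote)
        ):
            text = text[len(quote):-len(quote)]
            break

    # cleandoc removes common indentation and blank lines at both ends.
    return inspect.cleandoc(text).strip()
```

For example, a raw docstring whose body is indented by four spaces comes back with the indentation removed and the paragraph break preserved.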

C Processor

C language processor implementation.

This module provides a concrete implementation of the LanguageProcessor protocol for C source code. It handles C-specific syntax for function definitions and function calls using tree-sitter.

The processor supports:

  • Function definitions with various declaration patterns

  • Function calls and references

  • Handling of function pointers and complex declarators

class code_index.language_processor.impl_c.CProcessor

Bases: BaseLanguageProcessor

Language processor for C source code.

Handles parsing and analysis of C function definitions and calls. Supports various C function declaration patterns including:

  • Simple function definitions

  • Functions with storage class specifiers

  • Functions returning pointers

  • Function pointer declarations

__init__()

Initialize the C processor with language-specific configuration.

handle_definition(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Definition] | None

Process a C function definition node.

Handles function_definition nodes with various declaration patterns:

  1. Simple: primitive_type -> function_declarator -> compound_statement

  2. With modifiers: storage_class_specifier -> primitive_type -> function_declarator -> compound_statement

  3. Pointer return: storage_class_specifier -> primitive_type -> pointer_declarator -> compound_statement

Parameters:
  • node – A function_definition syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (Function, Definition) if successful, None if the function name cannot be extracted or the definition format is not recognized.

_extract_function_name(function_def_node: Node, ctx: QueryContext) -> str | None

Extract function name from a function_definition node.

Handles various C function declaration patterns by traversing the declarator field which may be either a function_declarator or pointer_declarator containing a function_declarator.

Parameters:
  • function_def_node – The function_definition node to process.

  • ctx – Query context for accessing source bytes.

Returns:

The function name as a string, or None if extraction fails.
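The traversal described above can be sketched as a loop that follows the `declarator` field until it reaches an identifier. In the tree-sitter C grammar both function_declarator and pointer_declarator nest their inner node under a field with that name. `FakeNode` and `extract_function_name` are hypothetical stand-ins so the example runs without tree-sitter, and the real method reads the name from the source bytes in the query context rather than from a `text` attribute.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in with `type`, `text` and named fields."""

    type: str
    text: str = ""
    fields: dict = field(default_factory=dict)

    def child_by_field_name(self, name: str) -> Optional["FakeNode"]:
        return self.fields.get(name)


def extract_function_name(function_def: FakeNode) -> Optional[str]:
    """Follow nested `declarator` fields until an identifier is found."""
    current = function_def.child_by_field_name("declarator")
    while current is not None:
        if current.type == "identifier":
            return current.text
        # function_declarator and pointer_declarator both wrap another declarator.
        current = current.child_by_field_name("declarator")
    return None  # unrecognized shape


# Models `char *name(void) { ... }` and `int name(void) { ... }`.
ident = FakeNode("identifier", "name")
func_decl = FakeNode("function_declarator", fields={"declarator": ident})
ptr_decl = FakeNode("pointer_declarator", fields={"declarator": func_decl})
pointer_returning = FakeNode("function_definition", fields={"declarator": ptr_decl})
simple = FakeNode("function_definition", fields={"declarator": func_decl})
```

A loop handles any depth of pointer nesting (for example a function returning `char **`) without special-casing each pattern.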

handle_reference(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Reference] | None

Process a C function call expression.

Handles call_expression nodes to extract the called function name. Uses the entire call_expression range including function name, parentheses, and arguments for accurate location tracking.

Parameters:
  • node – A call_expression syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (Function, PureReference) if successful, None if the call expression doesn’t have a recognizable function identifier.

_extract_preceding_comment(node: Node, ctx: QueryContext) -> str | None

Extract the preceding comment/documentation for a C function definition.

Looks for comment nodes that appear immediately before the function definition. Handles both single-line (//) and multi-line (/* */) comment styles.

Parameters:
  • node – A function_definition syntax tree node.

  • ctx – Query context containing file information.

Returns:

The comment text as a string, or None if not present.
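A simple way to picture the lookup is to walk backwards over sibling nodes while they are comments. The sketch below does exactly that with a hypothetical `FakeNode` stand-in. It does not check for blank lines between the comment and the function, so how strictly the library interprets "immediately before" is left open here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Stand-in with `type`, `text` and a `prev_sibling` link."""

    type: str
    text: str = ""
    prev_sibling: Optional["FakeNode"] = None


def preceding_comment(node: FakeNode) -> Optional[str]:
    """Collect the run of comment siblings directly before `node`, in source order."""
    parts = []
    sibling = node.prev_sibling
    while sibling is not None and sibling.type == "comment":
        parts.append(sibling.text)
        sibling = sibling.prev_sibling
    if not parts:
        return None
    # Siblings were visited bottom-up, so restore top-down order.
    return "\n".join(reversed(parts))


decl = FakeNode("declaration", "int counter;")
first = FakeNode("comment", "// Adds two numbers.", prev_sibling=decl)
second = FakeNode("comment", "// Returns the sum.", prev_sibling=first)
func = FakeNode("function_definition", "int add(int a, int b) { ... }", prev_sibling=second)
```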

_clean_c_comment(raw_comment: str) -> str

Clean up a C comment by removing comment delimiters and normalizing whitespace.

Parameters:

raw_comment – The raw comment text including delimiters.

Returns:

The cleaned comment text.
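The cleanup might look roughly like the following sketch, which handles both comment styles. The exact normalization rules are assumptions: this version removes the delimiters and the decorative leading asterisks of block comments, trims each line, and drops empty lines. `clean_c_comment` is a free-standing hypothetical function, not the library's method.

```python
import re


def clean_c_comment(raw: str) -> str:
    """Remove // or /* */ delimiters and tidy the remaining lines."""
    text = raw.strip()
    if len(text) >= 4 and text.startswith("/*") and text.endswith("*/"):
        body = text[2:-2]
        # Drop the leading "*" that conventionally decorates block comment lines.
        lines = [re.sub(r"^\s*\*+ ?", "", line) for line in body.splitlines()]
    else:
        # One or more consecutive line comments.
        lines = [re.sub(r"^\s*//+ ?", "", line) for line in text.splitlines()]
    cleaned = [line.strip() for line in lines]
    return "\n".join(line for line in cleaned if line)
```

With these rules a Doxygen-style block comment and the equivalent run of `//` lines yield the same text.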

C++ Processor

C++ language processor implementation.

This module provides a concrete implementation of the LanguageProcessor protocol for C++ source code. It handles C++ specific syntax for function definitions and function calls using tree-sitter.

TODO: Method calls (member function calls with object.method() or ptr->method() syntax) are not yet implemented. Currently only handles standalone function calls.

class code_index.language_processor.impl_cpp.CppProcessor

Bases: BaseLanguageProcessor

Language processor for C++ source code.

Handles parsing and analysis of C++ function definitions and calls. Supports standard C++ function syntax including function declarations, definitions, and basic function calls.

TODO: Method calls and member function analysis not yet implemented. Currently focuses on standalone functions only.

__init__()

Initialize the C++ processor with language-specific configuration.

_handle_function_definition(node: Node, ctx: QueryContext) -> tuple[Function, Definition] | None

Handle a C++ function definition node.

Processes function_definition nodes to extract the function name and analyze function calls within the definition body. Also extracts preceding documentation comments.

Parameters:
  • node – A function_definition syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (Function, Definition) if successful, None if the node cannot be processed (e.g., malformed function definition).

handle_definition(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Definition] | None

Process a definition node based on its type.

Parameters:
  • node – The syntax tree node representing a definition.

  • ctx – Query context containing file information.

Returns:

A tuple of (symbol, definition) if the node represents a supported definition type, None otherwise.

_handle_function_call(node: Node, ctx: QueryContext) -> tuple[Function, Reference] | None

Handle a C++ function call expression.

Processes call_expression nodes to extract the function name being called. Currently handles simple function calls with identifier names.

TODO: Handle method calls (obj.method() or ptr->method() syntax).

Parameters:
  • node – A call_expression syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (Function, PureReference) if successful, None if the call expression cannot be processed.

handle_reference(node: Node, ctx: QueryContext) -> tuple[Annotated[Function | Method, FieldInfo(annotation=NoneType, required=True, description='Discriminated union for function-like entities.', discriminator='type')], Reference] | None

Process a function or method call reference.

Analyzes call_expression nodes to identify the called function or method. Currently supports simple function calls with identifier names.

TODO: Add support for method calls including:

  • Member function calls on an object (obj.method())

  • Member function calls through a pointer (ptr->method())

  • Static member function calls (Class::method())

Parameters:
  • node – A call_expression syntax tree node.

  • ctx – Query context containing file information.

Returns:

A tuple of (symbol, reference) if the call can be processed, None if the call expression format is not supported.

_extract_preceding_comment(node: Node, ctx: QueryContext) -> str | None

Extract the preceding comment/documentation for a C++ function definition.

Looks for comment nodes that appear immediately before the function definition. Handles both single-line (//) and multi-line (/* */) comment styles.

Parameters:
  • node – A function_definition syntax tree node.

  • ctx – Query context containing file information.

Returns:

The comment text as a string, or None if not present.

_clean_cpp_comment(raw_comment: str) -> str

Clean up a C++ comment by removing comment delimiters and normalizing whitespace.

Parameters:

raw_comment – The raw comment text including delimiters.

Returns:

The cleaned comment text.