Utilities

Helper modules and utilities used throughout the code-index library.

Custom JSON

Custom JSON serialization utilities for code indexer data structures.

This module provides enhanced JSON encoding and decoding capabilities for handling complex data structures used in the code indexer, including dataclasses, Path objects, and custom type registration for serialization.

The module supports:

Automatic dataclass serialization/deserialization
Path object handling (automatic conversion to/from strings)
Type registration system for custom classes
Strict/non-strict deserialization modes

Classes:

EnhancedJSONEncoder: Custom JSON encoder for handling non-standard types.

Functions:

register_json_type: Decorator for registering dataclasses for JSON serialization.
custom_json_decoder: Custom JSON decoder for reconstructing objects.
dump_index_to_json: Utility function for saving index data to JSON files.
load_index_from_json: Utility function for loading index data from JSON files.

class code_index.utils.custom_json.EnhancedJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

Enhanced JSON encoder for handling non-standard Python types.

This encoder extends the standard JSONEncoder to automatically handle:

pathlib.Path objects (converted to strings)
dataclass objects (converted to dictionaries with type information)

The encoder preserves type information by adding a special “__class__” field to serialized dataclass objects, enabling proper reconstruction during deserialization.
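The behavior described above can be illustrated with a minimal sketch. It is not the library's implementation: `SketchEncoder` and the `Location` dataclass are hypothetical names, and using the bare class name as the "__class__" value is an assumption based on the examples in this page.

```python
import dataclasses
import json
from pathlib import Path


class SketchEncoder(json.JSONEncoder):
    """Illustrative stand-in for EnhancedJSONEncoder (hypothetical, not the library's code)."""

    def default(self, o):
        if isinstance(o, Path):
            # Path objects become plain strings.
            return str(o)
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            # Shallow copy of the fields; nested values are handled by later default() calls.
            payload = {f.name: getattr(o, f.name) for f in dataclasses.fields(o)}
            # Assumption: the type tag is the bare class name.
            payload["__class__"] = type(o).__name__
            return payload
        # Anything else is unsupported; the base class raises TypeError.
        return super().default(o)


@dataclasses.dataclass
class Location:  # hypothetical dataclass used only for this example
    file: Path
    line: int


encoded = json.dumps(Location(Path("main.py"), 12), cls=SketchEncoder, sort_keys=True)
print(encoded)
```

Returning a shallow dictionary rather than calling `dataclasses.asdict` keeps nested dataclasses intact until the encoder reaches them, so each one can receive its own "__class__" tag.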

default(o)

Serialize objects that are not natively JSON serializable.

Parameters: o – The object to serialize.
Returns: A JSON-serializable representation of the object.
Raises: TypeError – If the object type is not supported by this encoder.

code_index.utils.custom_json.JSON_TYPE_REGISTRY: dict[str, Type[Any]] = {}

Global registry mapping class names to their types for JSON deserialization.

This dictionary is automatically populated when classes are decorated with @register_json_type and is used by custom_json_decoder to reconstruct the correct object types during JSON deserialization.

The registry maps class names (strings) to their corresponding type objects, enabling the decoder to instantiate the proper classes when encountering serialized dataclass objects in JSON data.

code_index.utils.custom_json.register_json_type(cls: Type[T]) → Type[T]

This decorator registers a dataclass in the global type registry, enabling automatic serialization and deserialization through the custom JSON utilities. Only dataclasses can be registered.

Parameters: cls – The dataclass type to register. Must be a dataclass.
Returns: The same class (unmodified), allowing use as a decorator.
Raises: ValueError – If the provided class is not a dataclass.

Example

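>>> from dataclasses import dataclass
>>> from code_index.utils.custom_json import register_json_type
>>> @register_json_type
... @dataclass
... class MyData:
...     value: int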

>>> # MyData is now registered and can be serialized/deserialized
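A registration decorator of this kind can be sketched as follows. `SKETCH_REGISTRY` and `register_sketch` are hypothetical stand-ins for `JSON_TYPE_REGISTRY` and `register_json_type`; keying the registry by the bare class name is an assumption drawn from the "MyData" example in this page.

```python
from dataclasses import dataclass, is_dataclass
from typing import Any, Type, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for JSON_TYPE_REGISTRY.
SKETCH_REGISTRY: dict[str, Type[Any]] = {}


def register_sketch(cls: Type[T]) -> Type[T]:
    """Record a dataclass under its class name and return it unchanged."""
    if not is_dataclass(cls):
        raise ValueError(f"{cls.__name__} is not a dataclass")
    SKETCH_REGISTRY[cls.__name__] = cls
    return cls


@register_sketch
@dataclass
class MyData:
    value: int


class NotADataclass:
    pass


try:
    register_sketch(NotADataclass)
except ValueError as exc:
    rejected = str(exc)  # registration of a plain class is refused
```

Note the decorator order: `@dataclass` must be applied first (written closest to the class), otherwise the dataclass check runs on a plain class and fails.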

code_index.utils.custom_json.custom_json_decoder(dct: Dict, strict=False) → object

Custom JSON decoder for reconstructing objects from dictionaries.

This decoder handles the reconstruction of registered dataclass objects and automatic Path object conversion during JSON deserialization.

Parameters:

dct – Dictionary containing serialized object data.
strict – If True, raises exceptions when encountering unregistered classes. If False, returns the dictionary unchanged for unregistered types.

Returns:

The reconstructed object if type information is available and registered, otherwise the original dictionary.

Raises:

ValueError – If strict=True and an unregistered class is encountered.

Example

>>> data = {"value": 42, "__class__": "MyData"}
>>> obj = custom_json_decoder(data)
>>> isinstance(obj, MyData)
True
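The strict and non-strict modes can be sketched with a plain `object_hook` function. This is a hedged illustration, not the library's decoder: `REGISTRY` and `sketch_decoder` are hypothetical names, and Path conversion is left out because this page does not say how path fields are identified.

```python
import json
from dataclasses import dataclass
from functools import partial


@dataclass
class MyData:
    value: int


# Hypothetical stand-in for the global type registry.
REGISTRY = {"MyData": MyData}


def sketch_decoder(dct, strict=False):
    """Rebuild a registered dataclass from a dict carrying a "__class__" tag."""
    class_name = dct.get("__class__")
    if class_name is None:
        return dct  # ordinary dictionary, nothing to reconstruct
    cls = REGISTRY.get(class_name)
    if cls is None:
        if strict:
            raise ValueError(f"Unregistered class: {class_name}")
        return dct  # non-strict: leave the dictionary unchanged
    fields = {key: value for key, value in dct.items() if key != "__class__"}
    return cls(**fields)


text = '{"item": {"value": 42, "__class__": "MyData"}, "other": {"__class__": "Unknown"}}'

# Non-strict mode keeps the unknown entry as a dictionary.
lenient = json.loads(text, object_hook=sketch_decoder)

# Strict mode rejects the same input.
try:
    json.loads(text, object_hook=partial(sketch_decoder, strict=True))
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

Because `json.loads` calls the hook on inner objects first, nested dataclasses are already reconstructed by the time their parent dictionary is processed.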

code_index.utils.custom_json.dump_index_to_json(index: dict, output_path: Path)

Save index data to a JSON file with enhanced encoding.

This function serializes index data to JSON format using the EnhancedJSONEncoder to handle complex data types like dataclasses and Path objects.

Parameters:

index – The index data dictionary to serialize.
output_path – Path where the JSON file should be written.

Raises:

IOError – If the file cannot be written due to permissions or disk issues.

Example

>>> index_data = {"functions": [some_function_data]}
>>> dump_index_to_json(index_data, Path("index.json"))

code_index.utils.custom_json.load_index_from_json(input_path: Path, strict=False)

Load index data from a JSON file with custom decoding.

This function deserializes index data from JSON format using the custom decoder to reconstruct dataclass objects and handle Path conversion.

Parameters:

input_path – Path to the JSON file to load.
strict – If True, raises exceptions for unregistered classes during deserialization. If False, leaves unregistered objects as dictionaries.

Returns:

The deserialized index data with proper object types reconstructed.

Raises:

IOError – If the file cannot be read.
ValueError – If strict=True and unregistered classes are encountered.
json.JSONDecodeError – If the file contains invalid JSON.

Example

>>> data = load_index_from_json(Path("index.json"))
>>> # Returns properly typed objects based on registry
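When handling the three documented exceptions, the order of the `except` clauses matters: in Python 3, `json.JSONDecodeError` is a subclass of `ValueError`, and `IOError` is an alias of `OSError`. The sketch below shows one way to tell the cases apart. `load_sketch` is a hypothetical stand-in that uses plain `json.load` without the custom decoder, so the `ValueError` branch is shown only for structure.

```python
import json
import tempfile
from pathlib import Path


def load_sketch(input_path: Path):
    # Hypothetical stand-in for load_index_from_json (no custom decoding).
    with open(input_path, encoding="utf-8") as handle:
        return json.load(handle)


def describe_load(input_path: Path) -> str:
    """Return a short description of how loading the file went."""
    try:
        load_sketch(input_path)
    except json.JSONDecodeError as exc:
        # Must precede ValueError, because JSONDecodeError is a ValueError subclass.
        return f"invalid JSON at line {exc.lineno}"
    except ValueError as exc:
        # With the real loader and strict=True, unregistered classes would land here.
        return f"unregistered class: {exc}"
    except OSError as exc:
        # IOError is an alias of OSError in Python 3.
        return f"cannot read file: {exc.strerror}"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "index.json"
    good.write_text('{"functions": []}', encoding="utf-8")
    bad = Path(tmp) / "broken.json"
    bad.write_text('{"functions": ', encoding="utf-8")
    missing = Path(tmp) / "missing.json"
    results = [describe_load(good), describe_load(bad), describe_load(missing)]
```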

Logger

Logging configuration module for the code indexer.

This module provides a centralized logging setup using loguru, with configuration support through environment variables. The logger is automatically configured when the module is imported and provides colored console output with detailed formatting.

Environment Variables:

CODE_INDEX_LOG_LEVEL: Sets the logging level (default: INFO). Valid values: DEBUG, INFO, WARNING, ERROR, CRITICAL.

code_index.utils.logger.logger

The configured loguru logger instance used throughout the application. This logger provides structured logging with colored output, backtrace support, and detailed context information including time, level, module, function, and line number.

Example

>>> from code_index.utils.logger import logger
>>> logger.info("Processing started")
>>> logger.debug("Detailed debug information")
>>> logger.error("An error occurred")
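Reading the level from CODE_INDEX_LOG_LEVEL can be sketched with the standard library alone. `resolve_log_level` is a hypothetical helper, not part of the module; accepting lowercase values and falling back to INFO for unknown values are both assumptions, since this page does not describe how invalid input is handled.

```python
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def resolve_log_level(environ=os.environ) -> str:
    """Pick the log level from CODE_INDEX_LOG_LEVEL, defaulting to INFO."""
    raw = environ.get("CODE_INDEX_LOG_LEVEL", "INFO")
    level = raw.strip().upper()  # assumption: case-insensitive matching
    # Assumption: unrecognized values fall back to the documented default.
    return level if level in VALID_LEVELS else "INFO"


default_level = resolve_log_level({})
debug_level = resolve_log_level({"CODE_INDEX_LOG_LEVEL": "debug"})
fallback_level = resolve_log_level({"CODE_INDEX_LOG_LEVEL": "verbose"})
```

Since the logger is configured at import time, the variable has to be set before `code_index.utils.logger` is first imported for it to take effect.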

Test Utilities

Testing utilities for code indexer data comparison and validation.

This module provides specialized utilities for testing code indexer functionality, particularly for comparing complex data structures like IndexData objects that contain nested dataclasses, lists, and Path objects.

The utilities handle normalization of data structures to enable reliable comparison by sorting lists, normalizing paths, and converting dataclasses to comparable formats while preserving semantic meaning.

Functions:

normalize_path: Standardize path strings for comparison.
normalize_dataclass_for_comparison: Convert dataclass objects to comparable format.
normalize_index_data_for_comparison: Normalize IndexData for testing comparison.
compare_index_data: Compare two IndexData objects with detailed diff reporting.
assert_index_data_equal: Assertion function for IndexData equality testing.

code_index.utils.test.normalize_path(path: Path | str) → str

Normalize path strings for reliable cross-platform comparison.

Converts path objects to resolved absolute path strings to ensure consistent comparison regardless of the original path format or current working directory.

Parameters: path – Path object or string to normalize.
Returns: Normalized absolute path string.

Example

>>> normalize_path("./src/../src/main.py")
'/absolute/path/to/src/main.py'
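Resolution to an absolute path string can be sketched with `pathlib`. `normalize_path_sketch` is a hypothetical name, and using `Path.resolve()` is an assumption about how the normalization is done.

```python
from pathlib import Path


def normalize_path_sketch(path) -> str:
    """Return the resolved absolute form of a path as a string."""
    # resolve() makes the path absolute and collapses "." and ".." components.
    return str(Path(path).resolve())


# Two spellings of the same location normalize to the same string.
a = normalize_path_sketch("./src/../src/main.py")
b = normalize_path_sketch(Path("src") / "main.py")
```

The result depends on the current working directory at call time, so both sides of a comparison should be normalized within the same process.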

code_index.utils.test.normalize_dataclass_for_comparison(obj: Any) → Any

Convert dataclass objects to comparable format with recursive processing.

This function recursively processes complex data structures containing dataclasses, dictionaries, lists, and other types to create a normalized representation suitable for equality comparison in tests.

The normalization process:

Converts dataclasses to dictionaries
Recursively processes nested structures
Sorts lists and tuples when possible (for order-independent comparison)
Normalizes Path objects to strings

Parameters: obj – The object to normalize (can be any type).
Returns: Normalized representation of the object suitable for comparison.

Example

>>> @dataclass
... class TestData:
...     items: list[str]
>>> obj = TestData(items=["b", "a"])
>>> normalized = normalize_dataclass_for_comparison(obj)
>>> normalized["items"]  # Sorted for consistent comparison
['a', 'b']
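The four normalization steps listed above can be sketched as one recursive function. This is an illustration under stated assumptions: `normalize_sketch` and `Sample` are hypothetical names, and leaving unsortable sequences in their original order is a guess at what "when possible" means.

```python
from dataclasses import dataclass, fields, is_dataclass
from pathlib import Path
from typing import Any


def normalize_sketch(obj: Any) -> Any:
    """Recursively convert an object into plain, comparable values."""
    if is_dataclass(obj) and not isinstance(obj, type):
        # Dataclass instances become dictionaries of normalized fields.
        return {f.name: normalize_sketch(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, dict):
        return {key: normalize_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        items = [normalize_sketch(item) for item in obj]
        try:
            return sorted(items)
        except TypeError:
            # Unorderable items (for example dictionaries) keep their original order.
            return items
    if isinstance(obj, Path):
        return str(obj)
    return obj


@dataclass
class Sample:  # hypothetical dataclass used only for this example
    items: list
    source: Path


normalized = normalize_sketch(Sample(items=["b", "a"], source=Path("main.py")))
unsortable = normalize_sketch([{"b": 2}, {"a": 1}])
```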

code_index.utils.test.normalize_index_data_for_comparison(data: IndexData) → dict[str, Any]

Normalize IndexData objects for reliable test comparison.

Converts IndexData to a standardized dictionary format with consistent ordering of nested structures. This enables reliable equality testing by eliminating order dependencies that don’t affect semantic meaning.

The normalization process:

Converts the entire IndexData to a dictionary
Sorts data entries by symbol name and type
Sorts definitions by file path and line number
Sorts references by file path and line number
Sorts function calls within definitions

Parameters: data – IndexData object to normalize.
Returns: Normalized dictionary representation suitable for comparison.

Example

>>> index_data = IndexData(type="simple", data=[...])
>>> normalized = normalize_index_data_for_comparison(index_data)
>>> # All nested lists are now consistently sorted
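Sorting by file path and then line number amounts to sorting on a tuple key. The sketch below shows the idea on plain dictionaries; the `path` and `line` key names are hypothetical, since the field names of the real definition and reference records are not given here.

```python
# Hypothetical records; the real field names may differ.
definitions = [
    {"path": "b.py", "line": 3},
    {"path": "a.py", "line": 10},
    {"path": "a.py", "line": 2},
]

# Tuples compare element by element: path first, line number as tie-breaker.
ordered = sorted(definitions, key=lambda entry: (entry["path"], entry["line"]))
```

Keeping the line number as an integer in the key matters: as a string, "10" would sort before "2".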

code_index.utils.test.compare_index_data(data1: IndexData, data2: IndexData) → Tuple[bool, list[str]]

Compare two IndexData objects for test equality with detailed difference reporting.

Performs a deep comparison of two IndexData objects after normalization, providing detailed information about any differences found. This is useful for debugging test failures and understanding how data structures differ.

Parameters:

data1 – First IndexData object to compare.
data2 – Second IndexData object to compare.

Returns:

A tuple containing:

bool: True if the objects are equal, False otherwise.
list[str]: List of difference descriptions (empty if equal).

Example

>>> data1 = IndexData(...)
>>> data2 = IndexData(...)
>>> is_equal, differences = compare_index_data(data1, data2)
>>> if not is_equal:
...     for diff in differences:
...         print(f"Difference: {diff}")
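A comparison that returns both a verdict and a list of differences can be sketched as a recursive walk over two normalized structures. `diff_sketch` and `compare_sketch` are hypothetical names, the input is assumed to be already normalized into dictionaries and lists, and the message format is invented for illustration.

```python
from typing import Any


def diff_sketch(left: Any, right: Any, path: str = "root") -> list[str]:
    """Collect human-readable descriptions of where two structures differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        differences = []
        for key in sorted(set(left) | set(right), key=str):
            if key not in left:
                differences.append(f"{path}.{key}: missing on left")
            elif key not in right:
                differences.append(f"{path}.{key}: missing on right")
            else:
                differences.extend(diff_sketch(left[key], right[key], f"{path}.{key}"))
        return differences
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            return [f"{path}: length {len(left)} != {len(right)}"]
        differences = []
        for index, (a, b) in enumerate(zip(left, right)):
            differences.extend(diff_sketch(a, b, f"{path}[{index}]"))
        return differences
    if left != right:
        return [f"{path}: {left!r} != {right!r}"]
    return []


def compare_sketch(left: Any, right: Any) -> tuple[bool, list[str]]:
    """Return (is_equal, differences) in the same shape as compare_index_data."""
    differences = diff_sketch(left, right)
    return (not differences, differences)


result = compare_sketch(
    {"type": "simple", "data": [1, 2]},
    {"type": "simple", "data": [1, 3]},
)
```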

code_index.utils.test.assert_index_data_equal(actual: IndexData, expected: IndexData, msg: str = 'IndexData objects are not equal') → None

Assert that two IndexData objects are equal in testing context.

This function provides a detailed assertion for IndexData equality, showing specific differences when the assertion fails. It’s designed to be used in unit tests where detailed failure information is needed.

Parameters:

actual – The actual IndexData object (from test execution).
expected – The expected IndexData object (reference/baseline).
msg – Custom message to include in assertion failure.

Raises:

AssertionError – If the objects are not equal, with detailed difference information in the error message.

Example

>>> def test_indexing():
...     actual_data = index_some_code()
...     expected_data = load_expected_data()
...     assert_index_data_equal(
...         actual_data, expected_data, "Code indexing produced unexpected results"
...     )
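An assertion helper that folds the differences into its failure message can be sketched as below. `assert_equal_sketch` is a hypothetical name that works on flat, already-normalized dictionaries; the real function takes IndexData objects, and its message layout is not documented here.

```python
def assert_equal_sketch(actual: dict, expected: dict, msg: str = "objects are not equal") -> None:
    """Raise AssertionError listing every key whose values differ."""
    differences = [
        f"{key}: {actual.get(key)!r} != {expected.get(key)!r}"
        for key in sorted(set(actual) | set(expected))
        if actual.get(key) != expected.get(key)
    ]
    if differences:
        # The custom message comes first, followed by one line per difference.
        raise AssertionError(msg + "\n" + "\n".join(differences))


try:
    assert_equal_sketch({"type": "simple", "count": 2}, {"type": "simple", "count": 3}, "Mismatch")
    failure = None
except AssertionError as exc:
    failure = str(exc)
```

Raising `AssertionError` directly, rather than using a bare `assert` statement, keeps the check active even when Python runs with the `-O` flag.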
