API Reference
PyKomodo provides several modules for document processing, chunking, and management.
Core Module
Example usage:
from pykomodo.core import TextProcessor
processor = TextProcessor()
result = processor.process_text(my_document)
Multi-Directory Chunker
- class pykomodo.multi_dirs_chunker.ParallelChunker
Bases: object
- __init__(equal_chunks=None, max_chunk_size=None, output_dir='chunks', user_ignore=None, user_unignore=None, binary_extensions=None, priority_rules=None, num_threads=4, dry_run=False, semantic_chunking=False, file_type=None, verbose=False)
- Parameters:
equal_chunks (int | None)
max_chunk_size (int | None)
output_dir (str)
user_ignore (List[str] | None)
user_unignore (List[str] | None)
binary_extensions (List[str] | None)
priority_rules (List[Tuple[str, int]] | None)
num_threads (int)
dry_run (bool)
semantic_chunking (bool)
file_type (str | None)
verbose (bool)
- Return type:
None
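priority_rules is typed as List[Tuple[str, int]], which suggests (pattern, score) pairs. The sketch below shows one way such pairs could rank files before chunking. It assumes the patterns are fnmatch-style globs and that the highest matching score wins; neither rule is stated in this reference, and file_priority and order_by_priority are hypothetical helpers, not part of PyKomodo.

```python
from fnmatch import fnmatch
from typing import List, Tuple


def file_priority(path: str, priority_rules: List[Tuple[str, int]]) -> int:
    """Return the highest score among rules whose glob matches path (0 if none).

    Hypothetical helper: assumes glob-style patterns and "max score wins".
    """
    scores = [score for pattern, score in priority_rules if fnmatch(path, pattern)]
    return max(scores, default=0)


def order_by_priority(paths: List[str], priority_rules: List[Tuple[str, int]]) -> List[str]:
    """Sort paths so higher-priority files come first; ties keep input order."""
    return sorted(paths, key=lambda p: file_priority(p, priority_rules), reverse=True)


rules = [("*.py", 10), ("*.md", 5)]
print(order_by_priority(["notes.txt", "README.md", "main.py"], rules))
# ['main.py', 'README.md', 'notes.txt']
```

Python's sorted stays stable with reverse=True, so files with equal scores keep their original order.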
- process_file(file_path, custom_chunk_size=None, force_process=False)
- Parameters:
file_path (str)
custom_chunk_size (int | None)
force_process (bool)
- Return type:
None
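process_file accepts an optional custom_chunk_size alongside the instance-level max_chunk_size. A minimal sketch of how a per-file override could interact with the default size follows. It assumes that custom_chunk_size, when given, replaces max_chunk_size for that one file and that sizes count characters; the reference does not state the unit, and split_file_text is a hypothetical helper.

```python
from typing import List, Optional


def split_file_text(text: str, max_chunk_size: int,
                    custom_chunk_size: Optional[int] = None) -> List[str]:
    """Split text into consecutive pieces of at most `size` characters.

    Hypothetical helper: assumes custom_chunk_size overrides max_chunk_size
    for a single file and that chunk sizes are measured in characters.
    """
    size = custom_chunk_size if custom_chunk_size is not None else max_chunk_size
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


print(split_file_text("abcdefgh", max_chunk_size=3))        # ['abc', 'def', 'gh']
print(split_file_text("abcdefgh", 3, custom_chunk_size=5))  # ['abcde', 'fgh']
```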
- DIR_IGNORE_NAMES = ['venv', '.venv', 'env', 'node_modules', '.git', '.svn', '.hg', '__pycache__', '.pytest_cache', '.tox', '.eggs', 'build', 'dist']
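DIR_IGNORE_NAMES lists directories that hold environments, dependencies, VCS metadata, caches, and build output. The sketch below shows how a walker could skip them by pruning os.walk in place. It assumes each entry is compared exactly against a directory's base name, which this reference does not confirm; iter_candidate_files is a hypothetical helper, and only the list values are taken from the attribute above.

```python
import os
from typing import Iterator

# Values copied from the DIR_IGNORE_NAMES attribute documented above.
DIR_IGNORE_NAMES = ['venv', '.venv', 'env', 'node_modules', '.git', '.svn', '.hg',
                    '__pycache__', '.pytest_cache', '.tox', '.eggs', 'build', 'dist']


def iter_candidate_files(root: str) -> Iterator[str]:
    """Yield file paths under root, skipping directories named in DIR_IGNORE_NAMES.

    Hypothetical helper: assumes exact matches against each directory's base name.
    """
    ignored = set(DIR_IGNORE_NAMES)
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into ignored dirs.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning during the walk (rather than filtering paths afterwards) avoids reading large trees such as node_modules at all.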
Example usage:
from pykomodo.multi_dirs_chunker import ParallelChunker
chunker = ParallelChunker(max_chunk_size=1000, output_dir='chunks')
chunker.process_directories(['path/to/dir1', 'path/to/dir2'])
Enhanced Chunker
- class pykomodo.enhanced_chunker.EnhancedParallelChunker
Bases: ParallelChunker
- __init__(equal_chunks=None, max_chunk_size=None, output_dir='chunks', user_ignore=None, user_unignore=None, binary_extensions=None, priority_rules=None, num_threads=4, extract_metadata=True, add_summaries=True, remove_redundancy=True, context_window=4096, min_relevance_score=0.3)
- Parameters:
equal_chunks (int | None)
max_chunk_size (int | None)
output_dir (str)
user_ignore (List[str] | None)
user_unignore (List[str] | None)
binary_extensions (List[str] | None)
priority_rules (List[Tuple[str, int]] | None)
num_threads (int)
extract_metadata (bool)
add_summaries (bool)
remove_redundancy (bool)
context_window (int)
min_relevance_score (float)
- Return type:
None
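EnhancedParallelChunker adds remove_redundancy (default True) and min_relevance_score (default 0.3). A minimal sketch of how those two options could post-filter chunks is shown below. It assumes relevance scores are already computed and passed in, since this reference does not say how PyKomodo scores chunks, and it treats "redundant" as identical after whitespace normalization; filter_chunks is a hypothetical helper.

```python
import hashlib
from typing import List, Sequence


def filter_chunks(chunks: Sequence[str], scores: Sequence[float],
                  min_relevance_score: float = 0.3,
                  remove_redundancy: bool = True) -> List[str]:
    """Keep chunks scoring >= min_relevance_score, optionally dropping duplicates.

    Hypothetical helper: scores are supplied by the caller, and duplicates are
    detected by hashing each chunk after collapsing whitespace.
    """
    if len(chunks) != len(scores):
        raise ValueError("chunks and scores must have the same length")
    seen = set()
    kept = []
    for chunk, score in zip(chunks, scores):
        if score < min_relevance_score:
            continue  # below the relevance threshold
        if remove_redundancy:
            normalized = " ".join(chunk.split())
            digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # same content as an earlier kept chunk
            seen.add(digest)
        kept.append(chunk)
    return kept


print(filter_chunks(["a  b", "a b", "c", "d"], [0.9, 0.8, 0.1, 0.5]))
# ['a  b', 'd']
```

The first occurrence of a duplicate is kept, so earlier (for example, higher-priority) chunks win over later copies.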
Command Line Interface
Configuration
Example configuration:
from pykomodo.pykomodo_config import PyKomodoConfig
config = PyKomodoConfig()
config.load_config('config.yaml')