API Reference

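PyKomodo provides modules for file filtering and prioritization (pykomodo.core), chunking files across one or more directories (pykomodo.multi_dirs_chunker and pykomodo.enhanced_chunker), a command-line interface (pykomodo.command_line), and configuration loading (pykomodo.pykomodo_config).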

Core Module

class pykomodo.core.PriorityRule

Bases: object

__init__(pattern, score)

class pykomodo.core.PyCConfig

Bases: object

__init__()
add_ignore_pattern(pattern)
Parameters:

pattern (str)

Return type:

None

add_priority_rule(pattern, score)
Parameters:
  • pattern (str)

  • score (int)

Return type:

None

add_unignore_pattern(pattern)
Parameters:

pattern (str)

Return type:

None

calculate_priority(path)
Parameters:

path (str)

Return type:

int
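
This reference does not say how calculate_priority combines the rules registered with add_priority_rule. The sketch below shows one plausible scheme under stated assumptions: patterns are glob-style (matched with fnmatchcase against the whole path) and the highest matching score wins, with 0 for unmatched paths. SketchPriorityRule and sketch_priority are hypothetical names, not part of PyKomodo.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass
class SketchPriorityRule:
    # Hypothetical stand-in for PriorityRule(pattern, score).
    pattern: str
    score: int


def sketch_priority(path: str, rules: List[SketchPriorityRule]) -> int:
    # Assumption: glob-style matching on the full path; the highest
    # matching score wins and paths matching no rule score 0.
    scores = [rule.score for rule in rules if fnmatchcase(path, rule.pattern)]
    return max(scores, default=0)


rules = [SketchPriorityRule("*.py", 10), SketchPriorityRule("*/tests/*", 5)]
print(sketch_priority("src/app.py", rules))             # 10
print(sketch_priority("src/tests/test_app.py", rules))  # 10 (both match; max wins)
print(sketch_priority("README.md", rules))              # 0
```

Taking the maximum (rather than the sum) keeps scores bounded by the largest rule, which is one reasonable design choice; the real method may differ.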

count_tokens(text)
Parameters:

text (str)

Return type:

int

is_binary_file(path)
Parameters:

path (str)

Return type:

bool
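
is_binary_file returns a bool, but the detection rule is not documented here. A common heuristic, shown purely as an assumption and not as PyKomodo's method, is to read the first block of the file and treat it as binary if it contains a NUL byte. looks_binary is a hypothetical helper.

```python
import os
import tempfile


def looks_binary(path: str, block_size: int = 8192) -> bool:
    # Assumed heuristic: ordinary text rarely contains NUL bytes, so one
    # in the first block suggests binary data.
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(block_size)


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")
    bin_path = os.path.join(tmp, "image.bin")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("plain text\n")
    with open(bin_path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")  # PNG-like header with NULs
    text_result = looks_binary(text_path)
    bin_result = looks_binary(bin_path)

print(text_result)  # False
print(bin_result)   # True
```

One known weakness of this heuristic is that UTF-16 encoded text contains NUL bytes and would be classified as binary.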

make_c_string(text)
Parameters:

text (str | None)

Return type:

str

read_file_contents(path)
Parameters:

path (str)

Return type:

str

should_ignore(path)
Parameters:

path (str)

Return type:

bool
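
How should_ignore weighs patterns added with add_ignore_pattern against those added with add_unignore_pattern is not spelled out above. One plausible reading, sketched here as an assumption, is that a path is ignored when it matches any ignore pattern unless it also matches an unignore pattern. SketchFilter is hypothetical and uses glob matching via fnmatchcase.

```python
from fnmatch import fnmatchcase
from typing import List


class SketchFilter:
    """Hypothetical ignore/unignore filter; not PyKomodo's implementation."""

    def __init__(self) -> None:
        self.ignore: List[str] = []
        self.unignore: List[str] = []

    def add_ignore_pattern(self, pattern: str) -> None:
        self.ignore.append(pattern)

    def add_unignore_pattern(self, pattern: str) -> None:
        self.unignore.append(pattern)

    def should_ignore(self, path: str) -> bool:
        # Assumption: an unignore match always overrides an ignore match.
        if any(fnmatchcase(path, p) for p in self.unignore):
            return False
        return any(fnmatchcase(path, p) for p in self.ignore)


filt = SketchFilter()
filt.add_ignore_pattern("*.log")
filt.add_unignore_pattern("*/keep.log")
print(filt.should_ignore("logs/debug.log"))  # True
print(filt.should_ignore("logs/keep.log"))   # False (unignored)
print(filt.should_ignore("src/main.py"))     # False
```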

Example usage:

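from pykomodo.core import PyCConfig

config = PyCConfig()
config.add_ignore_pattern("*.log")    # pattern syntax assumed to be glob-style
config.add_priority_rule("*.py", 10)
path = "src/main.py"
if not config.should_ignore(path):
    print(config.calculate_priority(path))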

Multi-Directory Chunker

class pykomodo.multi_dirs_chunker.ChunkWriterInterface

Bases: object

__init__(chunker)
write_chunk(content_bytes, chunk_num)

class pykomodo.multi_dirs_chunker.ParallelChunker

Bases: object

__init__(equal_chunks=None, max_chunk_size=None, output_dir='chunks', user_ignore=None, user_unignore=None, binary_extensions=None, priority_rules=None, num_threads=4, dry_run=False, semantic_chunking=False, file_type=None, verbose=False)
Parameters:
  • equal_chunks (int | None)

  • max_chunk_size (int | None)

  • output_dir (str)

  • user_ignore (List[str] | None)

  • user_unignore (List[str] | None)

  • binary_extensions (List[str] | None)

  • priority_rules (List[Tuple[str, int]] | None)

  • num_threads (int)

  • dry_run (bool)

  • semantic_chunking (bool)

  • file_type (str | None)

  • verbose (bool)

Return type:

None
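
ParallelChunker accepts either equal_chunks or max_chunk_size, but this reference states neither the size unit nor the splitting rule. The sketch below assumes whitespace-separated words as the unit and contrasts the two modes: a fixed number of roughly equal chunks versus consecutive chunks capped at a maximum size. split_equal and split_max are hypothetical helpers, not PyKomodo functions.

```python
from typing import List


def split_equal(text: str, n: int) -> List[str]:
    # Assumed equal_chunks mode: n chunks whose word counts differ by at most one.
    words = text.split()
    base, extra = divmod(len(words), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(" ".join(words[start:start + size]))
        start += size
    return chunks


def split_max(text: str, max_size: int) -> List[str]:
    # Assumed max_chunk_size mode: consecutive chunks of at most max_size words.
    words = text.split()
    return [" ".join(words[i:i + max_size]) for i in range(0, len(words), max_size)]


text = "a b c d e f g"
print(split_equal(text, 3))  # ['a b c', 'd e', 'f g']
print(split_max(text, 3))    # ['a b c', 'd e f', 'g']
```

The first mode fixes the chunk count and lets sizes vary; the second fixes an upper bound on size and lets the count vary.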

calculate_priority(path)
close()
is_absolute_pattern(pattern)
is_binary_file(path)
pdf_chunking(path, idx)
process_directories(dirs)
Parameters:

dirs (List[str])

Return type:

None
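
process_directories takes a list of directories, and the constructor takes num_threads (default 4), which suggests per-file work runs concurrently. The sketch below is an assumption about that pattern, not PyKomodo's code: it gathers files from every directory and maps a stand-in function over them with concurrent.futures.ThreadPoolExecutor. gather_files, read_size, and run_parallel are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def gather_files(dirs: List[str]) -> List[str]:
    # Walk every directory and collect file paths in a stable order.
    paths = []
    for root_dir in dirs:
        for root, _, files in os.walk(root_dir):
            paths.extend(os.path.join(root, name) for name in files)
    return sorted(paths)


def read_size(path: str) -> int:
    # Stand-in for real per-file work such as reading and chunking.
    with open(path, "rb") as handle:
        return len(handle.read())


def run_parallel(dirs: List[str], num_threads: int = 4) -> Dict[str, int]:
    files = gather_files(dirs)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(files, pool.map(read_size, files)))


with tempfile.TemporaryDirectory() as tmp:
    for sub, content in (("dir1", b"hello"), ("dir2", b"hi")):
        os.makedirs(os.path.join(tmp, sub))
        with open(os.path.join(tmp, sub, "a.txt"), "wb") as f:
            f.write(content)
    sizes = run_parallel([os.path.join(tmp, "dir1"), os.path.join(tmp, "dir2")])
    result = sorted(sizes.values())

print(result)  # [2, 5]
```

Threads suit I/O-bound work like file reading; CPU-heavy processing in pure Python would gain less because of the GIL.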

process_directory(directory)
process_file(file_path, custom_chunk_size=None, force_process=False)
Parameters:
  • file_path (str)

  • custom_chunk_size (int | None)

  • force_process (bool)

Return type:

None

should_ignore_file(path)
DIR_IGNORE_NAMES = ['venv', '.venv', 'env', 'node_modules', '.git', '.svn', '.hg', '__pycache__', '.pytest_cache', '.tox', '.eggs', 'build', 'dist']
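
DIR_IGNORE_NAMES lists directory names such as node_modules and .git. A common way to apply such a list, assumed here rather than taken from PyKomodo's source, is to prune matching names from os.walk in place so the walker never descends into them. walk_pruned is a hypothetical helper; the list itself is copied from the attribute above.

```python
import os
import tempfile
from typing import List

DIR_IGNORE_NAMES = ['venv', '.venv', 'env', 'node_modules', '.git', '.svn', '.hg',
                    '__pycache__', '.pytest_cache', '.tox', '.eggs', 'build', 'dist']


def walk_pruned(top: str) -> List[str]:
    found = []
    for root, dirs, files in os.walk(top):
        # Assigning to dirs[:] changes which subdirectories os.walk visits
        # next (this only works with the default topdown=True).
        dirs[:] = [d for d in dirs if d not in DIR_IGNORE_NAMES]
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), top)
            found.append(rel.replace(os.sep, "/"))  # normalize separators
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("src/app.py", "node_modules/pkg/index.js", ".git/HEAD"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass
    kept = walk_pruned(tmp)

print(kept)  # ['src/app.py']
```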

class pykomodo.multi_dirs_chunker.PriorityRule

Bases: object

__init__(pattern, score)

Example usage:

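from pykomodo.multi_dirs_chunker import ParallelChunker

chunker = ParallelChunker(max_chunk_size=1000, output_dir='chunks')
try:
    # process_directories returns None (see its Return type above)
    chunker.process_directories(['path/to/dir1', 'path/to/dir2'])
finally:
    chunker.close()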

Enhanced Chunker

class pykomodo.enhanced_chunker.EnhancedParallelChunker

Bases: ParallelChunker

__init__(equal_chunks=None, max_chunk_size=None, output_dir='chunks', user_ignore=None, user_unignore=None, binary_extensions=None, priority_rules=None, num_threads=4, extract_metadata=True, add_summaries=True, remove_redundancy=True, context_window=4096, min_relevance_score=0.3)
Parameters:
  • equal_chunks (int | None)

  • max_chunk_size (int | None)

  • output_dir (str)

  • user_ignore (List[str] | None)

  • user_unignore (List[str] | None)

  • binary_extensions (List[str] | None)

  • priority_rules (List[Tuple[str, int]] | None)

  • num_threads (int)

  • extract_metadata (bool)

  • add_summaries (bool)

  • remove_redundancy (bool)

  • context_window (int)

  • min_relevance_score (float)

Return type:

None
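
EnhancedParallelChunker adds options such as remove_redundancy and min_relevance_score (default 0.3) whose behavior is not described here. The sketch below illustrates one possible interpretation only: drop chunks whose whitespace-normalized text has already been seen, then drop chunks scoring below the threshold. The score used, the share of distinct words in a chunk, is a made-up placeholder; filter_chunks and placeholder_relevance are hypothetical names.

```python
import hashlib
from typing import List


def placeholder_relevance(chunk: str) -> float:
    # Made-up score: share of distinct words, so repetitive chunks score low.
    words = chunk.split()
    return len(set(words)) / len(words) if words else 0.0


def filter_chunks(chunks: List[str], remove_redundancy: bool = True,
                  min_relevance_score: float = 0.3) -> List[str]:
    seen = set()
    kept = []
    for chunk in chunks:
        # Normalize whitespace so chunks differing only in spacing hash equally.
        digest = hashlib.sha256(" ".join(chunk.split()).encode("utf-8")).hexdigest()
        if remove_redundancy and digest in seen:
            continue
        seen.add(digest)
        if placeholder_relevance(chunk) >= min_relevance_score:
            kept.append(chunk)
    return kept


chunks = ["def add(a, b): return a + b", "def add(a,  b): return a + b", "x x x x x"]
print(filter_chunks(chunks))  # ['def add(a, b): return a + b']
```

Hashing normalized text catches exact and whitespace-only duplicates cheaply; near-duplicate detection would need a similarity measure instead.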

Command Line Interface

pykomodo.command_line.main()
pykomodo.command_line.run_server()

Configuration

Example configuration:

from pykomodo.pykomodo_config import PyKomodoConfig

config = PyKomodoConfig()
config.load_config('config.yaml')