Welcome to pykomodo’s Documentation!
GitHub
Introduction
Welcome to pykomodo, a Python-based parallel file chunking system. Our goal is to turn massive codebases or mixed file directories into bite-sized, LLM-ready chunks. You get semantic chunking for Python, PDF text extraction, ignore/unignore patterns, and multi-threaded speed. Whether you’re prepping a dataset for machine learning or just organizing chaos, pykomodo’s got your back.
Key Features:
- Parallel processing
- File filtering with custom patterns
- Chunking styles: equal splits, size caps, semantic (AST-based), PDF-specific
- LLM tweaks like metadata and deduping
- Dry-run mode to test your setup
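To make the chunking styles a bit more concrete, here is a minimal sketch of what "equal splits" and "semantic (AST-based)" chunking can look like. It uses only Python's standard library and is not pykomodo's actual implementation or API; the function names `equal_chunks` and `semantic_chunks` are hypothetical and exist only for illustration.

```python
import ast


def equal_chunks(text: str, n: int) -> list[str]:
    """Split text into at most n chunks with a near-equal number of lines.

    Illustrative only: not pykomodo's API.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    lines = text.splitlines()
    # Ceiling division so that no more than n chunks are produced.
    size = max(1, -(-len(lines) // n))
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def semantic_chunks(source: str) -> list[str]:
    """Return one chunk per top-level function or class in Python source.

    Illustrative only: not pykomodo's API. Other top-level statements
    (imports, constants) are skipped in this sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators sit above node.lineno, so start from the earliest one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks


if __name__ == "__main__":
    sample = "import os\n\ndef a():\n    return 1\n\nclass B:\n    def m(self):\n        return 2\n"
    for chunk in semantic_chunks(sample):
        print(chunk)
        print("---")
```

The point of the semantic style is that a chunk never cuts a function or class in half, while equal splits only care about line counts.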
Contents