Docling is used by [Data Prep Kit](https://data-prep-kit.github.io/data-prep-kit/), an open-source toolkit for preparing unstructured data for LLM application development, at scales ranging from a laptop to a datacenter.
## Components
### PDF ingestion to Parquet
- 💻 [Docling2Parquet source](https://github.com/data-prep-kit/data-prep-kit/tree/dev/transforms/language/docling2parquet)
- 📖 [Docling2Parquet docs](https://data-prep-kit.github.io/data-prep-kit/transforms/language/pdf2parquet/)
### Document chunking
- 💻 [Doc Chunking source](https://github.com/data-prep-kit/data-prep-kit/tree/dev/transforms/language/doc_chunk)
- 📖 [Doc Chunking docs](https://data-prep-kit.github.io/data-prep-kit/transforms/language/doc_chunk/)