Docling is used by the [Data Prep Kit](https://ibm.github.io/data-prep-kit/), an open-source toolkit for preparing unstructured data for LLM application development at scales ranging from a laptop to a datacenter.
## Components
### PDF ingestion to Parquet
- 💻 [PDF-to-Parquet GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet)
- 📖 [PDF-to-Parquet docs](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)
### Document chunking
- 💻 [Doc Chunking GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_chunk)
- 📖 [Doc Chunking docs](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)