Unstructured Core Library
=========================

The ``unstructured`` library is designed to help preprocess and structure unstructured text documents for use in downstream machine learning tasks. Examples of documents that can be processed
using the ``unstructured`` library include PDFs, XML and HTML documents.

Library Documentation
---------------------

:doc:`installing`
  Instructions on how to install the ``unstructured`` library on your system.

:doc:`api`
  Access all the power of ``unstructured`` through the ``unstructured-api`` or learn to host it locally.

:doc:`platform`
  Explore the platform built for enterprises and high-growth companies with large data volumes that need to automatically retrieve, transform, and stage their data for LLMs.

:doc:`core`
  Learn more about the core partitioning, chunking, cleaning, and staging functionality within the
  Unstructured library.

:doc:`ingest/index`
  Connect to your favorite data storage platforms for effortless batch processing of your files.

:doc:`metadata`
  Learn more about how metadata is tracked in the ``unstructured`` library.

:doc:`examples`
  Examples of other types of workflows within the ``unstructured`` package.

:doc:`integrations`
  We make it easy for you to connect your output with other popular ML services.

:doc:`best_practices`
  Learn best practices to optimize document information extraction using the ``unstructured`` library.

.. Hidden TOCs

.. toctree::
  :caption: Documentation
  :maxdepth: 2
  :hidden:

  introduction
  installing
  api
  platform
  core
  ingest/index
  metadata
  examples
  integrations
  best_practices