Unstructured Core Library
=========================

The ``unstructured`` library is designed to help preprocess and structure unstructured text documents
for use in downstream machine learning tasks. Examples of documents that can be processed
using the ``unstructured`` library include PDFs, XML and HTML documents.

Library Documentation
---------------------

:doc:`installing`
  Instructions on how to install the ``unstructured`` library on your system.

:doc:`api`
  Access all the power of ``unstructured`` through the ``unstructured-api`` or learn to host it locally.

:doc:`platform`
  Explore the enterprise-grade platform for organizations and high-growth companies with large data volumes that want to automatically retrieve, transform, and stage their data for LLMs.

:doc:`core`
  Learn more about the core partitioning, chunking, cleaning, and staging functionality within the
  Unstructured library.

:doc:`ingest/index`
  Connect to your favorite data storage platforms for effortless batch processing of your files.

:doc:`metadata`
  Learn more about how metadata is tracked in the ``unstructured`` library.

:doc:`examples`
  Examples of other types of workflows within the ``unstructured`` package.

:doc:`integrations`
  We make it easy for you to connect your output with other popular ML services.

:doc:`best_practices`
  Learn best practices to optimize document information extraction using the ``unstructured`` library.

.. Hidden TOCs

.. toctree::
   :caption: Documentation
   :maxdepth: 2
   :hidden:

   introduction
   installing
   api
   platform
   core
   ingest/index
   metadata
   examples
   integrations
   best_practices