mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-06 08:31:46 +00:00

Fixed issue #1437 - resolved the Warning errors when building sphinx with `make html`. test: 1. `cd docs` folder and `rm -rf build` 2. `pip install -r requirements.txt` 3. run `make html`
44 lines
1.8 KiB
ReStructuredText
Overview
---------

Introduction
^^^^^^^^^^^^^

The ``unstructured`` library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. No matter where your data is stored
and no matter what format it is in, Unstructured's toolkit transforms and preprocesses it into an easily digestible, usable format.

Product Offerings
^^^^^^^^^^^^^^^^^

- **Python Library**: Unstructured's open source software `(library) <https://github.com/Unstructured-IO/unstructured>`__.

- **Hosted API**: The easiest and most scalable way to process large volumes of documents `(API) <https://github.com/Unstructured-IO/unstructured-api>`__.

- **Enterprise Product**: In development, with a launch hoped for in late 2023.

Key Features
^^^^^^^^^^^^^

- **Integration**: Seamless integration capabilities with upstream and downstream applications.

- **Extensive File Support**: From classic DOC files to modern PDFs, the library supports a myriad of formats.

- **Scalability**: Designed to handle both small and large datasets, ensuring efficient processing regardless of size.

- **Customizability**: Easily extend and customize the library to fit specific requirements or unique use cases.

Common Use Cases
^^^^^^^^^^^^^^^^

- **Pretraining Models**
- **Fine-tuning Models**
- **Retrieval Augmented Generation (RAG)**
- **Traditional ETL**

Quickstart Tutorial
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If you're eager to dive in, head over to our `Getting Started <https://unstructured-io.github.io/unstructured/introduction/getting_started.html>`__ to get a hands-on introduction to the ``unstructured`` library. In a few minutes, you'll have a basic workflow set up and running!

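As a rough idea of what such a basic workflow can look like, the sketch below partitions a document and tallies the kinds of elements found in it. It assumes the library is installed (``pip install unstructured``) and uses the ``partition`` function from ``unstructured.partition.auto``. The helper names ``count_categories`` and ``partition_and_count`` are hypothetical and exist only for this illustration; the Getting Started guide remains the authoritative walkthrough.

```python
from collections import Counter


def count_categories(elements):
    """Tally elements by their ``category`` attribute (for example Title or NarrativeText).

    Hypothetical helper for illustration. Falls back to the class name
    when an object has no ``category`` attribute.
    """
    return Counter(getattr(el, "category", type(el).__name__) for el in elements)


def partition_and_count(filename):
    """Partition a document and summarize which element types it contains.

    Hypothetical helper for illustration; ``filename`` is any document path
    in a format the library supports.
    """
    # Imported lazily so count_categories can be used without the library installed.
    from unstructured.partition.auto import partition

    elements = partition(filename=filename)
    return count_categories(elements)
```

Calling ``partition_and_count`` with the path to one of your own documents returns a ``Counter`` keyed by element category, which gives a quick sanity check that a file was parsed into the pieces you expect before it is passed to a downstream task.
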
For more detailed information about specific components or advanced features, explore the rest of the documentation.