
Overview
---------
Introduction
^^^^^^^^^^^^^
The ``unstructured`` library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. Wherever your data lives and whatever format it is in, Unstructured's toolkit transforms and preprocesses it into an easily digestible, usable format.

Product Offerings
^^^^^^^^^^^^^^^^^
- **Python Library**: Unstructured's open-source software `(library) <https://github.com/Unstructured-IO/unstructured>`__.
- **Hosted API**: The easiest and most scalable way to process large volumes of documents `(API) <https://github.com/Unstructured-IO/unstructured-api>`__.
- **Enterprise Product**: In development, with a launch targeted for late 2023.

Key Features
^^^^^^^^^^^^^
- **Integration**: Seamless integration capabilities with upstream and downstream applications.
- **Extensive File Support**: From classic DOC files to modern PDFs, the library supports a myriad of formats.
- **Scalability**: Designed to handle both small and large datasets, ensuring efficient processing regardless of size.
- **Customizability**: Easily extend and customize the library to fit specific requirements or unique use cases.

Common Use Cases
^^^^^^^^^^^^^^^^
- **Pretraining Models**
- **Fine-tuning Models**
- **Retrieval Augmented Generation (RAG)**
- **Traditional ETL**
Quickstart Tutorial
^^^^^^^^^^^^^^^^^^^^
If you're eager to dive in, head over to our `Getting Started <https://unstructured-io.github.io/unstructured/introduction/getting_started.html>`__ to get a hands-on introduction to the ``unstructured`` library. In a few minutes, you'll have a basic workflow set up and running!
For more detailed information about specific components or advanced features, explore the rest of the documentation.
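
As a rough illustration of what "transforming a document into a usable format" can look like, the sketch below partitions a file into elements and tallies them by type. It assumes the installed version of ``unstructured`` exposes ``partition`` in ``unstructured.partition.auto`` and that each returned element carries ``category`` and ``text`` attributes. The helper names ``load_elements`` and ``count_categories`` are hypothetical and exist only for this example, which is not the library's recommended workflow.

```python
from collections import Counter


def load_elements(path):
    """Partition the document at ``path`` into a list of elements."""
    # Imported lazily so count_categories() below can be used (and tested)
    # without the unstructured package installed.
    # Assumption: unstructured.partition.auto.partition accepts filename=.
    from unstructured.partition.auto import partition

    return partition(filename=path)


def count_categories(elements):
    """Count elements by their ``category`` attribute.

    Objects without a ``category`` attribute are counted as "Unknown".
    """
    return Counter(getattr(el, "category", "Unknown") for el in elements)
```

With the library installed, calling ``count_categories(load_elements("path/to/your-file.pdf"))`` (the path is a placeholder) would return a tally of the element types found in that file, which can be a quick sanity check before passing the elements to a downstream task.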