Overview
---------

Introduction
^^^^^^^^^^^^^

The ``unstructured`` library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. No matter where your data lives
or what format it is in, Unstructured's toolkit transforms and preprocesses it into an easily digestible, usable format.

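
As a rough illustration of that preprocessing step, the sketch below partitions a file into elements and then summarizes them. The ``partition`` function and its import path come from the open source library; ``partition_file``, ``count_by_type``, and ``to_text`` are hypothetical helpers named here for illustration only, and the sketch assumes that ``str(element)`` yields the element's text.

```python
from collections import Counter
from typing import Iterable, List


def partition_file(path: str) -> List[object]:
    """Partition a document into elements with the open source library.

    The import is deferred so the helpers below still run where
    `unstructured` is not installed (`pip install unstructured`).
    """
    from unstructured.partition.auto import partition

    return partition(filename=path)


def count_by_type(elements: Iterable[object]) -> Counter:
    """Count elements by class name, e.g. Title or NarrativeText."""
    return Counter(type(el).__name__ for el in elements)


def to_text(elements: Iterable[object]) -> str:
    """Join the text of each element, one blank line between elements."""
    return "\n\n".join(str(el) for el in elements)
```

With the library installed, ``to_text(partition_file("example.pdf"))`` would return the document's text in reading order, where ``example.pdf`` stands in for any supported file.
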
Product Offerings
^^^^^^^^^^^^^^^^^

- **Python Library**: Unstructured's open source software `(library) <https://github.com/Unstructured-IO/unstructured>`_.

- **Hosted API**: The easiest and most scalable way to process large documents in quantity `(API repository) <https://github.com/Unstructured-IO/unstructured-api>`_.

- **Enterprise Product**: In development, with the hope of launching in late 2023.

Key Features
^^^^^^^^^^^^^

- **Integration**: Seamless integration capabilities with upstream and downstream applications.

- **Extensive File Support**: From classic DOC files to modern PDFs, the library supports a myriad of formats.

- **Scalability**: Designed to handle both small and large datasets, ensuring efficient processing regardless of size.

- **Customizability**: Easily extend and customize the library to fit specific requirements or unique use cases.

Common Use Cases
^^^^^^^^^^^^^^^^

- **Pretraining Models**
- **Fine-tuning Models**
- **Retrieval Augmented Generation (RAG)**
- **Traditional ETL**

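
For the RAG use case, partitioned text is typically grouped into chunks before it is embedded and indexed. The following is a minimal, library-independent sketch of that grouping, not the library's own chunking implementation. ``chunk_for_retrieval`` is a hypothetical helper, and the ``(category, text)`` pairs are assumed to come from partitioned elements, for example ``(el.category, str(el))``.

```python
from typing import Iterable, List, Tuple


def chunk_for_retrieval(
    items: Iterable[Tuple[str, str]], max_chars: int = 500
) -> List[str]:
    """Group (category, text) pairs into chunks for embedding.

    A new chunk starts at every "Title" item, or when adding the next
    text would push the current chunk past `max_chars` characters.
    A single text longer than `max_chars` is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for category, text in items:
        starts_section = category == "Title"
        too_long = size + len(text) > max_chars
        # Flush the chunk in progress before starting a new one.
        if current and (starts_section or too_long):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Splitting at titles keeps each chunk within a single section of the source document, which tends to make retrieved passages easier to interpret. The size limit here counts characters only and ignores the joining newlines.
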
Quickstart Tutorial
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If you're eager to dive in, head over to :doc:`getting_started` for a hands-on introduction to the ``unstructured`` library. In a few minutes, you'll have a basic workflow set up and running!

For more detailed information about specific components or advanced features, explore the rest of the documentation.