Jack Retterer 95b6295307
Jack/update documentation (#1190)
Updated:
- Added back support document types for partitioning
- Added more tabs for python code in the API page
- Added a RAG section in Key Concepts
- Added a Common Use case section in overview
2023-09-04 16:15:50 +00:00

Overview
---------

Introduction
^^^^^^^^^^^^^

The ``unstructured`` library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. No matter where your data lives or what format it is in, Unstructured's toolkit transforms and preprocesses it into an easily digestible, usable format.
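
As a rough illustration of that workflow, the sketch below partitions a single file into elements and tallies them by type. It assumes the library is installed (``pip install unstructured``) and uses the ``partition`` function from ``unstructured.partition.auto``. The helper names ``partition_file`` and ``count_element_types`` and the path ``example.html`` are hypothetical and exist only for this example; some file types may need extra dependencies.

.. code-block:: python

    from collections import Counter


    def partition_file(path):
        """Partition one document into a list of elements."""
        # Imported lazily so this module still imports when ``unstructured``
        # is not installed.
        from unstructured.partition.auto import partition

        # partition() detects the file type and routes the file to the
        # matching partitioner.
        return partition(filename=path)


    def count_element_types(elements):
        """Count elements by class name, for example Title or NarrativeText."""
        return Counter(type(element).__name__ for element in elements)


    if __name__ == "__main__":
        # "example.html" is a placeholder path; substitute a real document.
        elements = partition_file("example.html")
        print(count_element_types(elements))
        # Show the text of the first few elements.
        for element in elements[:5]:
            print(str(element))

The tally is only a quick way to inspect what the partitioning step produced; a downstream task would typically consume the element text directly.
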

Product Offerings
^^^^^^^^^^^^^^^^^

- **Python Library**: Unstructured's open source software `(library) <https://github.com/Unstructured-IO/unstructured>`_.
- **Hosted API**: The easiest and most scalable way to process large documents in bulk `(API) <https://github.com/Unstructured-IO/unstructured-api>`_.
- **Enterprise Product**: In development, with a launch hoped for in late 2023.

Key Features
^^^^^^^^^^^^^

- **Integration**: Seamless integration capabilities with upstream and downstream applications.
- **Extensive File Support**: From classic DOC files to modern PDFs, the library supports a myriad of formats.
- **Scalability**: Designed to handle both small and large datasets, ensuring efficient processing regardless of size.
- **Customizability**: Easily extend and customize the library to fit specific requirements or unique use cases.

Common Use Cases
^^^^^^^^^^^^^^^^

- **Pretraining Models**
- **Fine-tuning Models**
- **Retrieval Augmented Generation (RAG)**
- **Traditional ETL**

Quickstart Tutorial
^^^^^^^^^^^^^^^^^^^^

If you're eager to dive in, head over to :doc:`getting_started` for a hands-on introduction to the ``unstructured`` library. In a few minutes, you'll have a basic workflow set up and running!

For more detailed information about specific components or advanced features, explore the rest of the documentation.