Overview
---------
Introduction
^^^^^^^^^^^^^
The ``unstructured`` library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. In practice, this means that no matter where your data lives or what format it is in, Unstructured's toolkit transforms and preprocesses it into an easily digestible, usable format.

Key Features
^^^^^^^^^^^^^
- **Integration**: Integrates seamlessly with both upstream and downstream applications.
- **Extensive File Support**: From classic DOC files to modern PDFs, the library supports a wide range of file formats.
- **Scalability**: Designed to handle both small and large datasets, processing them efficiently regardless of size.
- **Customizability**: Easily extend and customize the library to fit specific requirements or unique use cases.

Key Concepts
^^^^^^^^^^^^^
- **Connectors**: Interfaces that enable the library to interact with different data sources and sinks, like cloud storage or databases.
- **Bricks**: Modular units of the library that allow users to partition, clean, and stage data efficiently.
- **Metadata**: Data about data. In ``unstructured``, metadata helps keep track of the source, type, and other essential attributes of the data.
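
To make the partition, clean, and stage bricks and the role of metadata more concrete, here is a minimal, self-contained Python sketch of how such a pipeline fits together. It is a conceptual illustration only, written with the standard library: the ``Element`` class and the ``partition``, ``clean``, and ``stage`` functions are hypothetical stand-ins, not the ``unstructured`` API, and the rule that short blocks without a trailing period are titles is a deliberately crude assumption.

.. code-block:: python

    import json
    import re
    from dataclasses import asdict, dataclass, field


    @dataclass
    class Element:
        """Hypothetical stand-in for a document element: text, a category, and metadata."""

        text: str
        category: str
        metadata: dict = field(default_factory=dict)


    def partition(raw: str, filename: str) -> list[Element]:
        """Hypothetical partitioning brick: split raw text into elements on blank lines."""
        elements = []
        for index, block in enumerate(re.split(r"\n\s*\n", raw.strip())):
            # Crude assumption: a short block without a trailing period is a title.
            is_title = len(block) < 40 and not block.endswith(".")
            category = "Title" if is_title else "NarrativeText"
            # Metadata records where each element came from (hypothetical keys).
            elements.append(Element(block, category, {"filename": filename, "index": index}))
        return elements


    def clean(element: Element) -> Element:
        """Hypothetical cleaning brick: collapse runs of whitespace into single spaces."""
        return Element(" ".join(element.text.split()), element.category, dict(element.metadata))


    def stage(elements: list[Element]) -> str:
        """Hypothetical staging brick: serialize elements to JSON for a downstream task."""
        return json.dumps([asdict(element) for element in elements])


    raw = "Quarterly Report\n\nRevenue   grew  in the\nthird quarter."
    elements = [clean(element) for element in partition(raw, filename="report.txt")]
    print(stage(elements))

Because every element carries its own metadata (here, the hypothetical ``filename`` and ``index`` keys), a downstream task can trace each piece of cleaned text back to where it came from. The library's real bricks cover many more file types and edge cases, so consult the rest of the documentation for their actual names and parameters.
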
Quickstart Tutorial
^^^^^^^^^^^^^^^^^^^^
If you're eager to dive in, head over to our :doc:`getting_started` to get a hands-on introduction to the ``unstructured`` library. In a few minutes, you'll have a basic workflow set up and running!
For more detailed information about specific components or advanced features, explore the rest of the documentation.