mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-06 08:31:46 +00:00
Overview
---------

Introduction
^^^^^^^^^^^^^

The ``unstructured`` library aims to simplify and streamline the preprocessing of structured and unstructured documents for downstream tasks. No matter where your data lives or what format it is in, Unstructured's toolkit transforms and preprocesses it into an easily digestible, usable format.

Key Features
^^^^^^^^^^^^^

- **Integration**: Integrates with both upstream and downstream applications.

- **Extensive File Support**: Supports a wide range of formats, from classic DOC files to modern PDFs.

- **Scalability**: Designed to process both small and large datasets efficiently.

- **Customizability**: Can be extended and customized to fit specific requirements or unique use cases.

Key Concepts
^^^^^^^^^^^^^

- **Connectors**: Interfaces that let the library interact with different data sources and sinks, such as cloud storage or databases.

- **Bricks**: Modular units of the library that let users partition, clean, and stage data.

- **Metadata**: Data about data. In ``unstructured``, metadata tracks the source, type, and other essential attributes of the data.

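To make the brick and metadata concepts above more concrete, here is a minimal sketch in plain Python. It does not use the ``unstructured`` API: the ``Element`` class and the ``partition``, ``clean``, and ``stage`` functions are hypothetical stand-ins, written only to show how the three kinds of bricks might chain together while metadata travels with each piece of text.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a document element; not the library's class."""
    text: str
    metadata: dict = field(default_factory=dict)


def partition(raw: str, source: str) -> list[Element]:
    """Partitioning brick: split raw text into paragraph-level elements."""
    blocks = [b for b in raw.split("\n\n") if b.strip()]
    return [
        Element(text=b, metadata={"source": source, "index": i})
        for i, b in enumerate(blocks)
    ]


def clean(element: Element) -> Element:
    """Cleaning brick: collapse runs of whitespace, leaving metadata as-is."""
    return Element(text=" ".join(element.text.split()), metadata=element.metadata)


def stage(elements: list[Element]) -> list[dict]:
    """Staging brick: flatten elements into dicts for a downstream tool."""
    return [{"text": e.text, **e.metadata} for e in elements]


raw = "Quarterly   report\n\nRevenue grew\n in Q2."
records = stage([clean(e) for e in partition(raw, source="report.txt")])
print(records)
```

Each record keeps the ``source`` and ``index`` assigned during partitioning, which is the role metadata plays in the description above. The real library's function names and element types may differ; see the rest of the documentation for the actual interfaces.
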
Quickstart Tutorial
^^^^^^^^^^^^^^^^^^^^

If you're eager to dive in, head over to :doc:`getting_started` for a hands-on introduction to the ``unstructured`` library. In a few minutes, you'll have a basic workflow set up and running!

For more detailed information about specific components or advanced features, explore the rest of the documentation.