mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-11 19:15:56 +00:00

* Updated Metadata page: add common and additional metadata fields by document types and connectors * Updated specific installation extra by document types and connectors * Added embedding brick page in Sphinx TOC * Fixed Sphinx warnings in new pages
23 lines
857 B
ReStructuredText
23 lines
857 B
ReStructuredText
Bricks
|
|
======
|
|
|
|
Bricks are functions that live in ``unstructured`` and are the primary public API for the library.
|
|
There are several types of bricks in ``unstructured``, corresponding to the different stages of document pre-processing: partitioning, cleaning, chunking and staging.
|
|
After reading this section, you should understand the following:
|
|
|
|
* How to partition a document into json or csv.
|
|
* How to remove unwanted content from document elements using cleaning bricks.
|
|
* How to extract content from a document using the extraction bricks.
|
|
* How to prepare data for downstream use cases using staging bricks
|
|
* How to chunk partitioned documents for use cases such as Retrieval Augmented Generation (RAG).
|
|
|
|
.. toctree::
|
|
:maxdepth: 1
|
|
|
|
bricks/partition
|
|
bricks/cleaning
|
|
bricks/extracting
|
|
bricks/staging
|
|
bricks/chunking
|
|
bricks/embedding
|