Bricks
======
Bricks are functions that live in ``unstructured`` and are the primary public API for the library.
There are several types of bricks in ``unstructured``, corresponding to the different stages of document pre-processing: partitioning, cleaning, extracting, staging, chunking, and embedding.

After reading this section, you should understand the following:

* How to partition a document and output the results as JSON or CSV.
* How to remove unwanted content from document elements using cleaning bricks.
* How to extract content from a document using extraction bricks.
* How to prepare data for downstream use cases using staging bricks.
* How to chunk partitioned documents for use cases such as Retrieval-Augmented Generation (RAG).
* How to generate embeddings for document elements using embedding bricks.
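
Because each brick is a plain function, the stages above can be chained directly. The following is a minimal sketch of that flow, assuming the module paths used by recent ``unstructured`` releases (``unstructured.cleaners.core``, ``unstructured.cleaners.extract``, ``unstructured.staging.base`` and ``unstructured.chunking.title``); check the pages listed below for the exact signatures in your installed version. To keep the sketch self-contained, it builds two elements by hand instead of partitioning a file, and the sample text and email address are made up.

.. code-block:: python

   import json

   from unstructured.chunking.title import chunk_by_title
   from unstructured.cleaners.core import clean_extra_whitespace
   from unstructured.cleaners.extract import extract_email_address
   from unstructured.documents.elements import NarrativeText, Title
   from unstructured.staging.base import elements_to_json

   # Partitioning (assumption): the elements are built by hand so no input file
   # is needed. The sample text is made up; with a real document you would
   # normally get these from a partitioning brick.
   elements = [
       Title("ITEM 1.     BUSINESS "),
       NarrativeText("Contact jane@example.com for details about the business."),
   ]

   # Cleaning: Element.apply runs each cleaner over the element text in place.
   for element in elements:
       element.apply(clean_extra_whitespace)

   # Extracting: pull structured values (here, email addresses) out of the text.
   emails = [email for el in elements for email in extract_email_address(el.text)]

   # Staging: serialize the elements to a JSON string, then load it back as dicts.
   records = json.loads(elements_to_json(elements))

   # Chunking: group elements into sections that start at each Title.
   chunks = chunk_by_title(elements)

   print(elements[0].text)
   print(emails)
   print([record["type"] for record in records])
   print([chunk.text for chunk in chunks])

Since every stage is an ordinary call on a list of elements, stages can be skipped, reordered, or replaced with your own functions without changing the rest of the pipeline.
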

.. toctree::
   :maxdepth: 1

   bricks/partition
   bricks/cleaning
   bricks/extracting
   bricks/staging
   bricks/chunking
   bricks/embedding