Matt Robinson 4ea716837d
feat: add ability to extract extra metadata with regex (#763)
* first pass on regex metadata

* fix typing for regex metadata

* add dataclass back in

* add decorators

* fix tests

* update docs

* add tests for regex metadata

* add process metadata to tsv

* changelog and version

* docs typos

* consolidate to using a single kwarg

* fix test
2023-06-16 10:10:56 -04:00


Unstructured Core Library
=========================

The ``unstructured`` library is designed to help preprocess and structure unstructured text documents
for use in downstream machine learning tasks. Examples of documents that can be processed
using the ``unstructured`` library include PDFs, XML documents, and HTML documents.


Library Documentation
---------------------

:doc:`installing`
   Instructions on how to install the ``unstructured`` library on your system.

:doc:`getting_started`
   Check out this section to learn about basic workflows in ``unstructured``.

:doc:`bricks`
   Learn more about partitioning, cleaning, and staging bricks, including advanced usage patterns.

:doc:`metadata`
   Learn more about how metadata is tracked in the ``unstructured`` library.

:doc:`examples`
   Examples of other types of workflows within the ``unstructured`` package.

:doc:`integrations`
   We make it easy for you to connect your output with other popular ML services.


.. Hidden TOCs

.. toctree::
   :caption: Documentation
   :maxdepth: 2
   :hidden:

   installing
   getting_started
   bricks
   metadata
   examples
   integrations