mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-12-28 07:33:36 +00:00
update links in integrations.rst (#1418)
A number of the links in integrations.rst don't seem to lead to the intended section in the unstructured documentation. For example: ```See the `stage_for_weaviate <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-weaviate>`_ docs for details``` It seems this link should direct to here instead: https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-weaviate
This commit is contained in:
parent
333558494e
commit
6187dc0976
@@ -8,7 +8,7 @@ which take a list of ``Element`` objects as input and return formatted dictionar
``Integration with Argilla``
----------------------------
-You can convert a list of ``Text`` elements to an `Argilla <https://www.argilla.io/>`_ ``Dataset`` using the `stage_for_argilla <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-argilla>`_ staging brick. Specify the type of dataset to be generated using the ``argilla_task`` parameter. Valid values are ``"text_classification"``, ``"token_classification"``, and ``"text2text"``. Follow the link for more details on usage.
+You can convert a list of ``Text`` elements to an `Argilla <https://www.argilla.io/>`_ ``Dataset`` using the `stage_for_argilla <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-argilla>`_ staging brick. Specify the type of dataset to be generated using the ``argilla_task`` parameter. Valid values are ``"text_classification"``, ``"token_classification"``, and ``"text2text"``. Follow the link for more details on usage.
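As a quick illustration of the ``argilla_task`` parameter described above, a minimal sketch of the kind of argument validation involved (the helper name here is illustrative, not part of the ``unstructured`` API):

```python
# The three valid values of the ``argilla_task`` parameter, as listed above.
VALID_ARGILLA_TASKS = {"text_classification", "token_classification", "text2text"}


def validate_argilla_task(task):
    # Illustrative helper: mirrors the kind of check stage_for_argilla
    # would need before building the matching Argilla Dataset type.
    if task not in VALID_ARGILLA_TASKS:
        raise ValueError(f"unsupported argilla_task: {task!r}")
    return task


print(validate_argilla_task("text_classification"))
```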
``Integration with Baseplate``
@@ -16,31 +16,31 @@ You can convert a list of ``Text`` elements to an `Argilla <https://www.argilla.
`Baseplate <https://docs.baseplate.ai/introduction>`_ is a backend optimized for use with LLMs that has an easy-to-use spreadsheet
interface. The ``unstructured`` library offers a staging brick to convert a list of ``Element`` objects into the
`rows format <https://docs.baseplate.ai/api-reference/documents/overview>`_ required by the Baseplate API. See the
-`stage_for_baseplate <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-baseplate>`_ documentation for
+`stage_for_baseplate <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-baseplate>`_ documentation for
information on how to stage elements for ingestion into Baseplate.
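A hedged sketch of a rows-style payload like the one ``stage_for_baseplate`` produces; the exact field names below are assumptions for illustration, not the Baseplate schema (see the linked docs for the real format):

```python
# Illustrative (kind, text) pairs standing in for Element objects.
elements = [("Title", "Quarterly Report"), ("NarrativeText", "Revenue grew.")]

# Build a rows-style payload; "data"/"metadata" key names are assumed.
payload = {
    "rows": [
        {"data": {"text": text}, "metadata": {"type": kind}}
        for kind, text in elements
    ]
}
print(len(payload["rows"]))
```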
``Integration with Datasaur``
------------------------------
-You can format a list of ``Text`` elements as input to token based tasks in `Datasaur <https://datasaur.ai/>`_ using the `stage_for_datasaur <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-datasaur>`_ staging brick. You will obtain a list of dictionaries indexed by the keys ``"text"`` with the content of the element, and ``"entities"`` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.
+You can format a list of ``Text`` elements as input to token based tasks in `Datasaur <https://datasaur.ai/>`_ using the `stage_for_datasaur <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-datasaur>`_ staging brick. You will obtain a list of dictionaries indexed by the keys ``"text"`` with the content of the element, and ``"entities"`` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.
``Integration with Hugging Face``
----------------------------------
You can prepare ``Text`` elements for processing in Hugging Face `Transformers <https://huggingface.co/docs/transformers/index>`_
-pipelines by splitting the elements into chunks that fit into the model's attention window using the `stage_for_transformers <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-transformers>`_ staging brick. You can customise the transformation by defining
+pipelines by splitting the elements into chunks that fit into the model's attention window using the `stage_for_transformers <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-transformers>`_ staging brick. You can customise the transformation by defining
the ``buffer`` and ``window_size``, the ``split_function``, and the ``chunk_separator``. If you need to operate on
-text directly instead of ``unstructured`` ``Text`` objects, use the `chunk_by_attention_window <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-transformers>`_ helper function. Follow the links for more details on usage.
+text directly instead of ``unstructured`` ``Text`` objects, use the `chunk_by_attention_window <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-transformers>`_ helper function. Follow the links for more details on usage.
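To show what the ``window_size``, ``split_function``, and ``chunk_separator`` knobs do, here is a simplified, character-budget re-implementation of the chunking idea. The real ``chunk_by_attention_window`` works against a model's tokenizer and attention window, so treat this purely as a sketch of the mechanism, not the brick itself:

```python
def chunk_text(text, window_size, split_function=None, chunk_separator=" "):
    # Split the text into pieces, then greedily pack pieces into chunks
    # whose joined length stays within window_size characters.
    split_function = split_function or (lambda t: t.split(" "))
    chunks, current, length = [], [], 0
    for piece in split_function(text):
        # Account for the separator inserted between packed pieces.
        added = len(piece) + (len(chunk_separator) if current else 0)
        if current and length + added > window_size:
            chunks.append(chunk_separator.join(current))
            current, length = [], 0
            added = len(piece)
        current.append(piece)
        length += added
    if current:
        chunks.append(chunk_separator.join(current))
    return chunks


print(chunk_text("a b c d", window_size=3))
```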
``Integration with Labelbox``
------------------------------
-You can format your outputs for use with `LabelBox <https://labelbox.com/>`_ using the `stage_for_label_box <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-label-box>`_ staging brick. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the ``output_directory`` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type ``List[Dict[str, Any]]`` that can be written to a ``.json`` file and imported into LabelBox. Follow the link to see how to generate the ``config.json`` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.
+You can format your outputs for use with `LabelBox <https://labelbox.com/>`_ using the `stage_for_label_box <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-label-box>`_ staging brick. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the ``output_directory`` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type ``List[Dict[str, Any]]`` that can be written to a ``.json`` file and imported into LabelBox. Follow the link to see how to generate the ``config.json`` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.
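A hedged sketch of the staging flow described above: write each element's text into an output directory, then build a ``config.json`` pointing at where those files would live in cloud storage. The ``externalId``/``data`` key names and the bucket URL are hypothetical placeholders, not necessarily the exact LabelBox import schema:

```python
import json
import pathlib
import tempfile

# Stand-in output directory and element texts for the sketch.
output_directory = pathlib.Path(tempfile.mkdtemp())
texts = ["First element text.", "Second element text."]

config = []
for index, text in enumerate(texts):
    # Stage each element's text as a file ready for upload to cloud storage.
    staged_file = output_directory / f"element-{index}.txt"
    staged_file.write_text(text)
    config.append({
        "externalId": staged_file.name,  # hypothetical key name
        # Hypothetical bucket URL where the staged file would be uploaded.
        "data": f"https://example-bucket.s3.amazonaws.com/{staged_file.name}",
    })

# Write the List[Dict[str, Any]] config to a .json file for import.
(output_directory / "config.json").write_text(json.dumps(config, indent=2))
print(len(config))
```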
``Integration with Label Studio``
----------------------------------
-You can format your outputs for upload to `Label Studio <https://labelstud.io/>`_ using the `stage_for_label_studio <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-label-studio>`_ staging brick. After running ``stage_for_label_studio``, you can write the results
+You can format your outputs for upload to `Label Studio <https://labelstud.io/>`_ using the `stage_for_label_studio <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-label-studio>`_ staging brick. After running ``stage_for_label_studio``, you can write the results
to a JSON folder that is ready to be included in a new Label Studio project. You can also include pre-annotations and predictions
as part of your upload.
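A hedged sketch of Label Studio's task-import shape: each task nests its fields under a top-level ``"data"`` key. The ``"text"`` field name here is an assumption about the labeling config, used only to illustrate the JSON that gets written:

```python
import json

# Stand-ins for the staged element texts.
texts = ["A narrative sentence.", "A short title"]

# Each Label Studio task wraps its fields under a top-level "data" key.
tasks = [{"data": {"text": text}} for text in texts]

# Serialize to JSON, ready to drop into a new Label Studio project.
tasks_json = json.dumps(tasks, indent=2)
print(len(tasks))
```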
@@ -85,12 +85,12 @@ See `here <https://llamahub.ai/>`_ for more LlamaHub examples.
``Integration with Pandas``
----------------------------
You can convert a list of ``Element`` objects to a Pandas dataframe with columns for
-the text from each element and their types such as ``NarrativeText`` or ``Title`` using the `convert_to_dataframe <https://unstructured-io.github.io/unstructured/bricks.html#convert-to-dataframe>`_ staging brick. Follow the link for more details on usage.
+the text from each element and their types such as ``NarrativeText`` or ``Title`` using the `convert_to_dataframe <https://unstructured-io.github.io/unstructured/bricks/staging.html#convert-to-dataframe>`_ staging brick. Follow the link for more details on usage.
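A hedged sketch of the dataframe shape described above: one row per element, with a column for the element type and one for its text. The column names and sample rows are assumptions for illustration, not a guarantee of ``convert_to_dataframe``'s exact output:

```python
import pandas as pd

# Stand-in (type, text) pairs for a list of Element objects.
rows = [("Title", "Quarterly Report"), ("NarrativeText", "Revenue grew 4%.")]

# One row per element, with columns for the element type and its text.
df = pd.DataFrame(rows, columns=["type", "text"])
print(df.shape)
```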
``Integration with Prodigy``
-----------------------------
-You can format your JSON or CSV outputs for use with `Prodigy <https://prodi.gy/docs/api-loaders>`_ using the `stage_for_prodigy <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-prodigy>`_ and `stage_csv_for_prodigy <https://unstructured-io.github.io/unstructured/bricks.html#stage-csv-for-prodigy>`_ staging bricks. After running ``stage_for_prodigy`` or
+You can format your JSON or CSV outputs for use with `Prodigy <https://prodi.gy/docs/api-loaders>`_ using the `stage_for_prodigy <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-prodigy>`_ and `stage_csv_for_prodigy <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-csv-for-prodigy>`_ staging bricks. After running ``stage_for_prodigy`` or
``stage_csv_for_prodigy``, you can write the results to a ``.json``, ``.jsonl``, or ``.csv`` file that is ready to be used with Prodigy. Follow the links for more details on usage.
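A hedged sketch of the ``.jsonl`` output described above: Prodigy's JSONL loader reads one JSON object per line with a ``"text"`` field; the ``"meta"`` contents here are illustrative:

```python
import json

# Stand-in staged examples; the "meta" payload is illustrative.
examples = [
    {"text": "First sentence.", "meta": {"filename": "report.pdf"}},
    {"text": "Second sentence.", "meta": {"filename": "report.pdf"}},
]

# JSONL: one JSON object per line, ready to be written to a .jsonl file.
jsonl = "\n".join(json.dumps(example) for example in examples)
print(len(jsonl.splitlines()))
```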
@@ -99,6 +99,6 @@ You can format your JSON or CSV outputs for use with `Prodigy <https://prodi.gy/
`Weaviate <https://weaviate.io/>`_ is an open-source vector database that allows you to store data objects and vector embeddings
from a variety of ML models. Storing text and embeddings in a vector database such as Weaviate is a key component of the
`emerging LLM tech stack <https://medium.com/@unstructured-io/llms-and-the-emerging-ml-tech-stack-bdb189c8be5c>`_.
-See the `stage_for_weaviate <https://unstructured-io.github.io/unstructured/bricks.html#stage-for-weaviate>`_ docs for details
+See the `stage_for_weaviate <https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-weaviate>`_ docs for details
on how to upload ``unstructured`` outputs to Weaviate. An example notebook is also available
`here <https://github.com/Unstructured-IO/unstructured/tree/main/examples/weaviate>`_.
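A hedged sketch of a Weaviate-style data object built from one element's text and type; the property names are assumptions for illustration, not the schema that ``stage_for_weaviate`` actually defines (see the linked docs and notebook):

```python
# Stand-in for one staged element's content and type.
element = {"text": "Vector databases store embeddings.", "category": "NarrativeText"}

# A Weaviate data object is a flat dict of properties matching a class
# schema; the "text"/"category" property names here are assumed.
data_object = {"text": element["text"], "category": element["category"]}
print(sorted(data_object))
```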