From 6187dc09768df825920dca0e323005712aad05d2 Mon Sep 17 00:00:00 2001
From: John <43506685+Coniferish@users.noreply.github.com>
Date: Fri, 15 Sep 2023 18:50:55 -0500
Subject: [PATCH] update links in integrations.rst (#1418)

A number of the links in integrations.rst don't seem to lead to the intended
section in the unstructured documentation. For example:

```See the `stage_for_weaviate `_ docs for details```

It seems this link should direct to here instead:
https://unstructured-io.github.io/unstructured/bricks/staging.html#stage-for-weaviate
---
 docs/source/integrations.rst | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/source/integrations.rst b/docs/source/integrations.rst
index c0abe8d72..6fe75fa09 100644
--- a/docs/source/integrations.rst
+++ b/docs/source/integrations.rst
@@ -8,7 +8,7 @@ which take a list of ``Element`` objects as input and return formatted dictionar
 
 ``Integration with Argilla``
 ----------------------------
 
-You can convert a list of ``Text`` elements to an `Argilla `_ ``Dataset`` using the `stage_for_argilla `_ staging brick. Specify the type of dataset to be generated using the ``argilla_task`` parameter. Valid values are ``"text_classification"``, ``"token_classification"``, and ``"text2text"``. Follow the link for more details on usage.
+You can convert a list of ``Text`` elements to an `Argilla `_ ``Dataset`` using the `stage_for_argilla `_ staging brick. Specify the type of dataset to be generated using the ``argilla_task`` parameter. Valid values are ``"text_classification"``, ``"token_classification"``, and ``"text2text"``. Follow the link for more details on usage.
 
 
@@ -16,31 +16,31 @@ You can convert a list of ``Text`` elements to an `Argilla
 ``Integration with Baseplate``
 -------------------------------
 
 `Baseplate `_ is a backend optimized for use with LLMs that has an easy to use spreadsheet interface.
 The ``unstructured`` library offers a staging brick to convert a list of ``Element`` objects into the `rows format `_ required by the Baseplate API. See the
-`stage_for_baseplate `_ documentation for
+`stage_for_baseplate `_ documentation for
 information on how to stage elements for ingestion into Baseplate.
 
 
 ``Integration with Datasaur``
 ------------------------------
 
-You can format a list of ``Text`` elements as input to token based tasks in `Datasaur `_ using the `stage_for_datasaur `_ staging brick. You will obtain a list of dictionaries indexed by the keys ``"text"`` with the content of the element, and ``"entities"`` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.
+You can format a list of ``Text`` elements as input to token based tasks in `Datasaur `_ using the `stage_for_datasaur `_ staging brick. You will obtain a list of dictionaries indexed by the keys ``"text"`` with the content of the element, and ``"entities"`` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.
 
 
 ``Integration with Hugging Face``
 ----------------------------------
 
 You can prepare ``Text`` elements for processing in Hugging Face `Transformers `_
-pipelines by splitting the elements into chunks that fit into the model's attention window using the `stage_for_transformers `_ staging brick. You can customise the transformation by defining
+pipelines by splitting the elements into chunks that fit into the model's attention window using the `stage_for_transformers `_ staging brick. You can customise the transformation by defining
 the ``buffer`` and ``window_size``, the ``split_function`` and the ``chunk_separator``. if you need to operate on
-text directly instead of ``unstructured`` ``Text`` objects, use the `chunk_by_attention_window `_ helper function. Follow the links for more details on usage.
+text directly instead of ``unstructured`` ``Text`` objects, use the `chunk_by_attention_window `_ helper function. Follow the links for more details on usage.
 
 
 ``Integration with Labelbox``
 ------------------------------
 
-You can format your outputs for use with `LabelBox `_ using the `stage_for_label_box `_ staging brick. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the ``output_directory`` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type ``List[Dict[str, Any]]`` that can be written to a ``.json`` file and imported into LabelBox. Follow the link to see how to generate the ``config.json`` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.
+You can format your outputs for use with `LabelBox `_ using the `stage_for_label_box `_ staging brick. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the ``output_directory`` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type ``List[Dict[str, Any]]`` that can be written to a ``.json`` file and imported into LabelBox. Follow the link to see how to generate the ``config.json`` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.
 
 
 ``Integration with Label Studio``
 ----------------------------------
 
-You can format your outputs for upload to `Label Studio `_ using the `stage_for_label_studio `_ staging brick. After running ``stage_for_label_studio``, you can write the results
+You can format your outputs for upload to `Label Studio `_ using the `stage_for_label_studio `_ staging brick. After running ``stage_for_label_studio``, you can write the results
 to a JSON folder that is ready to be included in a new Label Studio project.
 You can also include pre-annotations and predictions as part of your upload.
@@ -85,12 +85,12 @@ See `here `_ for more LlamaHub examples.
 ``Integration with Pandas``
 ----------------------------
 
 You can convert a list of ``Element`` objects to a Pandas dataframe with columns for
-the text from each element and their types such as ``NarrativeText`` or ``Title`` using the `convert_to_dataframe `_ staging brick. Follow the link for more details on usage.
+the text from each element and their types such as ``NarrativeText`` or ``Title`` using the `convert_to_dataframe `_ staging brick. Follow the link for more details on usage.
 
 
 ``Integration with Prodigy``
 -----------------------------
 
-You can format your JSON or CSV outputs for use with `Prodigy `_ using the `stage_for_prodigy `_ and `stage_csv_for_prodigy `_ staging bricks. After running ``stage_for_prodigy`` |
+You can format your JSON or CSV outputs for use with `Prodigy `_ using the `stage_for_prodigy `_ and `stage_csv_for_prodigy `_ staging bricks. After running ``stage_for_prodigy`` |
 ``stage_csv_for_prodigy``, you can write the results to a ``.json`` | ``.jsonl`` or a ``.csv`` file that is ready to
 be used with Prodigy. Follow the links for more details on usage.
@@ -99,6 +99,6 @@ You can format your JSON or CSV outputs for use with `Prodigy
 `Weaviate `_ is an open-source vector database that allows you to store data objects and vector embeddings
 from a variety of ML models. Storing text and embeddings in a vector database such as Weaviate is a key component of the
 `emerging LLM tech stack `_.
-See the `stage_for_weaviate `_ docs for details
+See the `stage_for_weaviate `_ docs for details
 on how to upload ``unstructured`` outputs to Weaviate. An example notebook is also available `here `_.