Added AWS Marketplace docs and improved Azure Marketplace docs (#2248)
To test:
> cd docs && make html
Change logs:
- Added AWS Marketplace documentation
- Improved Azure Marketplace documentation
- Networking section
@ -1,8 +1,11 @@
|
||||
## 0.11.6-dev5
|
||||
## 0.11.6-dev6
|
||||
|
||||
### Enhancements
|
||||
|
||||
* **Update the layout analysis script.** The previous script only supported annotating `final` elements. The updated script also supports annotating `inferred` and `extracted` elements.
|
||||
* **AWS Marketplace API documentation**: Added a user guide for deploying the Unstructured API on AWS, including VPC and CloudFormation setup.
|
||||
* **Azure Marketplace API documentation**: Improved the user guide to deploy Azure Marketplace API by adding references to Azure documentation.
|
||||
* **Integration documentation**: Updated URLs for the `stage_for` bricks.
|
||||
|
||||
### Features
|
||||
|
||||
|
||||
@ -28,5 +28,6 @@ NOTE: Currently, the pipeline is capable of recognizing the file type and choosi
|
||||
apis/api_sdks
|
||||
apis/usage_methods
|
||||
apis/azure_marketplace
|
||||
apis/aws_marketplace
|
||||
apis/api_parameters
|
||||
apis/validation_errors
|
||||
|
||||
@ -1,5 +1,5 @@
|
||||
Python and JavaScript SDK for Unstructured API
|
||||
===============================================
|
||||
Python and JavaScript SDK
|
||||
=========================
|
||||
|
||||
This documentation covers the usage of the Python and JavaScript SDKs for interacting with the Unstructured API.
|
||||
|
||||
@ -70,7 +70,7 @@ Usage
|
||||
|
||||
Below is a basic example of how to use the JavaScript SDK:
|
||||
|
||||
.. code-block:: bash
|
||||
.. code-block:: javascript
|
||||
|
||||
import { UnstructuredClient } from "unstructured-client";
|
||||
import { PartitionResponse } from "unstructured-client/dist/sdk/models/operations";
|
||||
|
||||
282
docs/source/apis/aws_marketplace.rst
Normal file
@ -0,0 +1,282 @@
|
||||
|
||||
AWS Marketplace Deployment Guide
|
||||
================================
|
||||
|
||||
This guide provides step-by-step instructions for deploying Unstructured API from AWS Marketplace.
|
||||
|
||||
Pre-Requirements
|
||||
----------------
|
||||
|
||||
1. **AWS Account**: Register at `AWS Registration Page <https://aws.amazon.com/>`_, if you don't have an AWS account.
|
||||
|
||||
2. **IAM Permissions**: Ensure that your IAM user or role has permissions to use ``CloudFormation``.
|
||||
|
||||
- Refer to this `AWS blog post <https://blog.awsfundamentals.com/aws-iam-roles-with-aws-cloudformation#heading-creating-iam-roles-with-aws-cloudformation>`_ to create IAM Roles with CloudFormation.
|
||||
|
||||
3. **SSH KeyPair**: Create or use an existing KeyPair for secure access.
|
||||
|
||||
- Follow the ``Create Key Pairs`` instructions in the Amazon EC2 `User Guide <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html>`_.
|
||||
|
||||
|
||||
Part I: Setting Up a Virtual Private Cloud (VPC)
|
||||
------------------------------------------------
|
||||
|
||||
*Note: If you have already configured a Virtual Private Cloud (VPC) for your organization that meets the requirements for deploying the Unstructured API, you may skip this part and proceed to Part II. Ensure that your existing VPC setup includes the necessary subnets, Internet Gateway, and route tables as outlined in this guide.*
|
||||
|
||||
In Part I, we will construct a resilient and secure infrastructure within AWS by setting up a Virtual Private Cloud (VPC). Our VPC will encompass a dual-tiered subnet model consisting of both **public** and **private** subnets across multiple Availability Zones (AZs).
|
||||
|
||||
We will establish the foundational network structure for deploying the Unstructured API by creating two public subnets and one private subnet within our VPC. The public subnets will host resources that require direct access to the internet, such as a load balancer, enabling them to communicate with external users. The private subnet is designed for resources that should not be directly accessible from the internet, such as EC2 instances.
|
||||
|
||||
.. image:: imgs/AWS/Infrastructure_Diagram.png
|
||||
:align: center
|
||||
:alt: Infrastructure Diagram
|
||||
|
||||
**Step-by-Step Process:**
|
||||
|
||||
1. **Access VPC Dashboard**:
|
||||
|
||||
- In the AWS Management Console, navigate to the VPC service.
|
||||
- Click “Your VPCs” in the left navigation pane, then “Create VPC.”
|
||||
|
||||
2. **Create VPC**:
|
||||
|
||||
- Select ``VPC only``
|
||||
- Enter a ``Name tag`` for your VPC.
|
||||
- Specify the IPv4 CIDR block (e.g., 10.0.0.0/16).
|
||||
|
||||
- You may leave the IPv6 CIDR block, Tenancy, and Tags settings as default.
|
||||
- Click the “Create VPC” button.
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step2.png
|
||||
:align: center
|
||||
:alt: create vpc
|
||||
|
||||
3. **Create Subnets**:
|
||||
|
||||
- After creating the VPC, click “Subnets” in the left navigation pane.
|
||||
- Click “Create subnet” and select the VPC you just created from the dropdown menu.
|
||||
- For the first public subnet:
|
||||
|
||||
- Enter a ``Name tag``.
|
||||
- Select an ``Availability Zone``.
|
||||
- Select the IPv4 VPC CIDR block (e.g., 10.0.0.0/16).
|
||||
- Specify the IPv4 subnet CIDR block (e.g., 10.0.1.0/24).
|
||||
- You may leave the Tags settings as default.
|
||||
- Click ``Add new subnet``.
|
||||
- Repeat the process for the second public subnet with a different Availability Zone and CIDR block (e.g., 10.0.2.0/24).
|
||||
|
||||
- *Note: Each subnet must reside entirely within one Availability Zone and cannot span zones*.
|
||||
- Ref: AWS documentation on `Subnet basics <https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html#subnet-basics>`_.
|
||||
- For the private subnet:
|
||||
|
||||
- Follow the same steps, but choose a different Availability Zone and IPv4 subnet CIDR block (e.g., 10.0.3.0/24).
|
||||
|
||||
- Click ``Create subnet``.
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step3.png
|
||||
:align: center
|
||||
:alt: create subnet
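
Before creating the subnets, it can help to sanity-check the CIDR plan offline. The following is a minimal sketch using only Python's standard ``ipaddress`` module. The CIDR blocks are the example values from Step 2 and Step 3; the subnet labels (``public-1`` and so on) and the function name are illustrative assumptions, not names required by AWS.

```python
import ipaddress

# Example CIDR blocks taken from Step 2 and Step 3 of this guide.
# The dictionary keys are hypothetical labels, not AWS resource names.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "public-1": ipaddress.ip_network("10.0.1.0/24"),
    "public-2": ipaddress.ip_network("10.0.2.0/24"),
    "private-1": ipaddress.ip_network("10.0.3.0/24"),
}


def validate_subnet_plan(vpc, subnets):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    names = list(subnets)
    # Every subnet must fall entirely inside the VPC range.
    for name in names:
        if not subnets[name].subnet_of(vpc):
            problems.append(f"{name} ({subnets[name]}) is outside the VPC range {vpc}")
    # No two subnets may share addresses.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


problems = validate_subnet_plan(vpc, subnets)
print(problems or "subnet plan looks consistent")
```

If you choose different CIDR blocks, replace the example values and re-run the check before entering them in the console.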
|
||||
|
||||
4. **Create Internet Gateway (for Public Subnets)**:
|
||||
|
||||
- Go to “Internet Gateways” in the VPC dashboard.
|
||||
- Click “Create internet gateway,” enter a name, and create.
|
||||
|
||||
- Note: Attach the newly created internet gateway to your VPC (``Actions`` > ``Attach to VPC``) so that it can be selected as a route target in Step 6.
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step4.png
|
||||
:align: center
|
||||
:alt: create internet gateway
|
||||
|
||||
5. **Set Up Route Tables (for Public Subnets)**: *AWS automatically creates a main route table when the VPC is created in Step 2 above. To tailor our network architecture, we will create a new route table specifically for our public subnets, which will include a route to the Internet Gateway from Step 4 above.*
|
||||
|
||||
- Click "Route tables" in the left navigation pane.
|
||||
- Click “Create route table” in the dashboard.
|
||||
- Enter a ``Name``.
|
||||
- Select the ``VPC`` from Step 2 above.
|
||||
- Click ``Create route table``
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step5.png
|
||||
:align: center
|
||||
:alt: create route table
|
||||
|
||||
6. **Associate Public Subnets with the Route Table and Internet Gateway**:
|
||||
|
||||
- Click on “Your VPCs” in the left navigation pane.
|
||||
- Select the VPC that you just created in Step 2.
|
||||
- Connect the **public subnets** to the **route table** from Step 5.
|
||||
|
||||
- Click the 'Subnets' page in the left navigation pane.
|
||||
- Select the public subnet from Step 3.
|
||||
- Click the ``Actions`` button in the top right-hand corner.
|
||||
- Select ``Edit route table association`` from the Actions dropdown menu
|
||||
- On the ``Edit route table association`` page, select the route table designed for public subnets from Step 5 and save the changes.
|
||||
- Repeat the process for the second public subnet.
|
||||
|
||||
- Now, we'll ensure that the public subnets can access the internet by adding a route from the route table to the Internet Gateway.
|
||||
|
||||
- Click the 'Route tables' page in the left navigation pane.
|
||||
- Select the ``route table`` that you created in Step 5.
|
||||
- Click the ``Actions`` button in the top right-hand corner.
|
||||
- Select ``Edit routes`` from the Actions dropdown menu
|
||||
- Choose 'Add route', and in the destination box, enter **0.0.0.0/0** which represents all IP addresses.
|
||||
- In the target box, select the ``Internet Gateway`` you've configured in Step 4.
|
||||
- Click ``Save changes`` to establish the route, granting internet access to the public subnets.
|
||||
|
||||
- For the **private subnet**, use the main route table or create a new one without a route to the internet gateway.
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step6.png
|
||||
:align: center
|
||||
:alt: connect public subnet to route table
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step7.png
|
||||
:align: center
|
||||
:alt: edit routes
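
To see why the **0.0.0.0/0** route from Step 6 gives the public subnets internet access without affecting traffic inside the VPC, consider how a route table selects a target: the most specific matching route (longest prefix) wins. The sketch below models this with Python's standard ``ipaddress`` module. The route list and the ``igw-example`` target are hypothetical placeholders, and the function is an illustration of the lookup rule, not AWS code.

```python
import ipaddress

# Hypothetical route table for the public subnets: the local route for the
# VPC range plus the default route added in Step 6.
# "igw-example" is a placeholder, not a real gateway ID.
routes = [
    (ipaddress.ip_network("10.0.0.0/16"), "local"),
    (ipaddress.ip_network("0.0.0.0/0"), "igw-example"),
]


def resolve_target(destination, routes):
    """Pick the matching route with the longest prefix (most specific)."""
    address = ipaddress.ip_address(destination)
    matches = [(net, target) for net, target in routes if address in net]
    if not matches:
        return None  # no route: the traffic cannot leave the subnet
    return max(matches, key=lambda item: item[0].prefixlen)[1]


print(resolve_target("10.0.3.25", routes))      # traffic inside the VPC
print(resolve_target("93.184.216.34", routes))  # traffic to the internet
```

A route table without the **0.0.0.0/0** entry, as suggested for the private subnet, has no match for internet destinations in this model.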
|
||||
|
||||
7. **Inspect VPC Resource Map**:
|
||||
|
||||
You can check the configurations from the Resource Maps on the VPC Details dashboard.
|
||||
|
||||
.. image:: imgs/AWS/VPC_Step8.png
|
||||
:align: center
|
||||
:alt: VPC Resource Maps
|
||||
|
||||
|
||||
Part II: Deploying Unstructured API from AWS Marketplace
|
||||
--------------------------------------------------------
|
||||
|
||||
8. **Visit the Unstructured API page on AWS Marketplace**
|
||||
|
||||
- Link: `Unstructured API Marketplace <http://aws.amazon.com/marketplace/pp/prodview-fuvslrofyuato>`_.
|
||||
- Click ``Continue to subscribe``
|
||||
- Review Terms and Conditions
|
||||
- Click ``Continue to Configuration``
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step8.png
|
||||
:align: center
|
||||
:alt: Unstructured API on AWS Marketplace
|
||||
|
||||
9. **Configure the CloudFormation**
|
||||
|
||||
- Select ``CloudFormation Template`` from the Fulfillment option dropdown menu.
|
||||
- Use the default ``Unstructured API`` template and software version.
|
||||
- Select the ``Region``
|
||||
|
||||
- *Note: It is important to select the same region where you set up the VPC in Part I.*
|
||||
- Click ``Continue to Launch`` button.
|
||||
- Select ``Launch CloudFormation`` from Choose Action dropdown menu.
|
||||
- Click ``Launch`` button.
|
||||
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step9.png
|
||||
:align: center
|
||||
:alt: CloudFormation Configuration
|
||||
|
||||
|
||||
10. **Create Stack on CloudFormation**
|
||||
|
||||
The ``Launch`` button redirects you to the ``Create stack`` workflow in CloudFormation.
|
||||
|
||||
**Step 1: Create stack**
|
||||
|
||||
- Select ``Template is ready``.
|
||||
- Use the default template source from ``Amazon S3 URL``
|
||||
- Click ``Next`` button.
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step10a.png
|
||||
:align: center
|
||||
:alt: Create Stack
|
||||
|
||||
|
||||
**Step 2: Specify stack details**
|
||||
|
||||
- Provide a ``Stack name``.
|
||||
- In the **Parameters** section, provide the ``KeyName`` - see the Pre-Requirements, if you haven't created an EC2 Key Pair.
|
||||
- Set ``LoadBalancerScheme`` to **internet-facing**.
|
||||
- Set ``SSHLocation`` to **0.0.0.0/0** only if you intend to allow SSH access from any IP address on the Internet.
|
||||
|
||||
- **Note**: It is generally recommended to limit SSH access to a specific IP range for enhanced security. This can be done by setting the ``SSHLocation`` to the IP address or range associated with your organization. Please consult your IT department or VPN vendor to obtain the correct IP information for these settings.
|
||||
- AWS provides ``AWS Client VPN``, which is a managed client-based VPN service that enables secure access to AWS resources and resources in your on-premises network. For more information, please refer to `Getting started with AWS Client VPN <https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/cvpn-getting-started.html>`_.
|
||||
- Select the ``Subnets`` and ``VPC`` from Part I above.
|
||||
- You can use the default values for other Parameter fields
|
||||
- Click ``Next`` button.
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step10b.png
|
||||
:align: center
|
||||
:alt: Specify stack details
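
To make the ``SSHLocation`` trade-off concrete, the following sketch reports how many addresses a given CIDR value would admit. It uses only Python's standard ``ipaddress`` module. The function name is an illustrative assumption, and ``203.0.113.0/24`` is a reserved documentation range standing in for your organization's real range.

```python
import ipaddress


def describe_ssh_location(cidr):
    """Summarize how many addresses a CIDR value would admit."""
    # strict=True rejects values with host bits set, e.g. 203.0.113.5/24.
    network = ipaddress.ip_network(cidr, strict=True)
    return {
        "cidr": str(network),
        "addresses": network.num_addresses,
        "open_to_internet": network.prefixlen == 0,
    }


print(describe_ssh_location("0.0.0.0/0"))
print(describe_ssh_location("203.0.113.0/24"))  # documentation-only example range
```

A single workstation can be expressed as a ``/32`` range, which admits exactly one address.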
|
||||
|
||||
**Step 3: Configure stack options**
|
||||
|
||||
- Specify the stack options or use default values.
|
||||
- Click ``Next`` button.
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step10c.png
|
||||
:align: center
|
||||
:alt: Specify stack options
|
||||
|
||||
**Step 4: Review**
|
||||
|
||||
- Review the Stack settings.
|
||||
- Click ``Submit`` button.
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step10d.png
|
||||
:align: center
|
||||
:alt: Review stack
|
||||
|
||||
|
||||
11. **Get the Unstructured API Endpoint**
|
||||
|
||||
- Check the status of the CloudFormation stack.
|
||||
|
||||
- A successful deployment will show ``CREATE_COMPLETE`` status.
|
||||
- Click ``Resources`` tab and click the ``ApplicationLoadBalancer``.
|
||||
- You will be redirected to the ``EC2 Load Balancer`` page. Click the load balancer created by CloudFormation in the previous step.
|
||||
- On the load balancer detail page, copy the ``DNS Name``, which is shown as an ``A Record`` and ends with the suffix ``elb.amazonaws.com``.
|
||||
|
||||
- Note: You will use this ``DNS Name`` to replace the ``<api_url>`` for the next steps, i.e., Healthcheck and Data Processing.
|
||||
|
||||
.. image:: imgs/AWS/Marketplace_Step11.png
|
||||
:align: center
|
||||
:alt: Unstructured API Endpoint
|
||||
|
||||
Healthcheck
|
||||
-----------
|
||||
|
||||
Perform a health check using the curl command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
curl https://<api_url>/healthcheck
|
||||
|
||||
.. image:: imgs/AWS/healthcheck.png
|
||||
:align: center
|
||||
:alt: Healthcheck
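
The same check can be scripted. The sketch below is one possible approach using only the Python standard library; the function names and the sample DNS name are hypothetical, and the ``https`` default simply mirrors the curl command above.

```python
import urllib.request


def healthcheck_url(dns_name, scheme="https"):
    """Build the healthcheck URL from the load balancer DNS name."""
    return f"{scheme}://{dns_name.strip().rstrip('/')}/healthcheck"


def is_healthy(url, opener=urllib.request.urlopen, timeout=10):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return False


# "my-alb-123.us-east-1.elb.amazonaws.com" is a made-up DNS name;
# replace it with the value copied in Step 11.
url = healthcheck_url("my-alb-123.us-east-1.elb.amazonaws.com")
print(url)
# print(is_healthy(url))  # uncomment once the stack is deployed
```

The ``opener`` parameter exists only so the function can be exercised without network access.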
|
||||
|
||||
Data Processing
|
||||
---------------
|
||||
|
||||
Data processing can be performed using curl commands.
|
||||
|
||||
- Note: you will need to add the suffix to the endpoint: **/general/v0/general**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
curl -X 'POST' 'https://<api_url>/general/v0/general' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'files=@sample-docs/family-day.eml' \
| jq -C . | less -R
|
||||
|
||||
.. image:: imgs/AWS/endpoint.png
|
||||
:align: center
|
||||
:alt: Data Processing Endpoint
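
For readers who prefer a script over curl, the following is a rough standard-library sketch of the same request: a ``multipart/form-data`` POST with a single ``files`` field sent to ``/general/v0/general``. The function names are hypothetical, and guessing the part's content type from the file name is an assumption made here for illustration; the Python SDK documented elsewhere in these docs is likely the more convenient option.

```python
import json
import mimetypes
import urllib.request
import uuid


def build_partition_request(api_url, filename, content):
    """Build a multipart/form-data POST with a single 'files' field."""
    boundary = uuid.uuid4().hex
    # Assumption: guess the content type from the file name, as curl -F does
    # for some extensions; fall back to a generic binary type.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"'.encode(),
        f"Content-Type: {part_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        f"https://{api_url}/general/v0/general",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def partition(api_url, path):
    """Send one file to the API and return the parsed JSON response."""
    with open(path, "rb") as handle:
        request = build_partition_request(api_url, path.split("/")[-1], handle.read())
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


# Replace <api_url> with the DNS name from Step 11 before running:
# elements = partition("<api_url>", "sample-docs/family-day.eml")
```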
|
||||
|
||||
Getting Started with Unstructured
|
||||
---------------------------------
|
||||
|
||||
Explore examples in the Unstructured GitHub repository: `Unstructured GitHub <https://github.com/Unstructured-IO/unstructured>`_.
|
||||
|
||||
Support
|
||||
-------
|
||||
|
||||
For support inquiries, contact: `support@unstructured.io <mailto:support@unstructured.io>`_
|
||||
@ -14,7 +14,7 @@ This guide provides step-by-step instructions for deploying a service on Azure u
|
||||
- Navigate to the Azure Marketplace using `this URL <https://azuremarketplace.microsoft.com/en-us/marketplace/apps/unstructured1691024866136.customer_api_v1?tab=Overview/>`__.
|
||||
|
||||
|
||||
.. image:: imgs/Azure_Step2.png
|
||||
.. image:: imgs/Azure/Azure_Step2.png
|
||||
:align: center
|
||||
:alt: Azure Marketplace
|
||||
|
||||
@ -26,7 +26,7 @@ This guide provides step-by-step instructions for deploying a service on Azure u
|
||||
- Click **Create** button.
|
||||
|
||||
|
||||
.. image:: imgs/Azure_Step3.png
|
||||
.. image:: imgs/Azure/Azure_Step3.png
|
||||
:align: center
|
||||
:alt: Deployment Process
|
||||
|
||||
@ -40,7 +40,7 @@ On the **Create a virtual machine** page, go to **Basics** tab and follow the st
|
||||
- Select **Subscription** and **Resource group** from dropdown menu.
|
||||
- Or, you can also ``Create New`` resource group.
|
||||
|
||||
.. image:: imgs/Azure_Step4a.png
|
||||
.. image:: imgs/Azure/Azure_Step4a.png
|
||||
:align: center
|
||||
:alt: project details
|
||||
|
||||
@ -48,9 +48,9 @@ On the **Create a virtual machine** page, go to **Basics** tab and follow the st
|
||||
- Provide a name in the **Virtual machine name** field.
|
||||
- Select a **Region** from the dropdown menu.
|
||||
- **Image**: Select ``Unstructured Customer Hosted API Hourly - x64 Gen2`` (*default*)
|
||||
- **Size**: Select VM size from dropdown menu.
|
||||
- **Size**: Select VM size from dropdown menu. Refer to this page for `Azure VM comparisons <https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/>`_
|
||||
|
||||
.. image:: imgs/Azure_Step4b.png
|
||||
.. image:: imgs/Azure/Azure_Step4b.png
|
||||
:align: center
|
||||
:alt: instance details
|
||||
|
||||
@ -58,7 +58,7 @@ On the **Create a virtual machine** page, go to **Basics** tab and follow the st
|
||||
- **Authentication type**: Select ``Password`` or ``SSH public key``.
|
||||
- Enter the ``credentials``.
|
||||
|
||||
.. image:: imgs/Azure_Step4c.png
|
||||
.. image:: imgs/Azure/Azure_Step4c.png
|
||||
:align: center
|
||||
:alt: administrator account
|
||||
|
||||
@ -66,21 +66,22 @@ On the **Create a virtual machine** page, go to **Basics** tab and follow the st
|
||||
5. Set Up Load Balancer
|
||||
-----------------------
|
||||
|
||||
On the **Create a virtual machine** page, go to **Networking** tab and follow the steps below.
|
||||
Before you click the ``Review + create`` button, go to the **Networking** tab and follow the steps below.
|
||||
|
||||
- Networking interface (required fields)
|
||||
- **Virtual network**: Select from dropdown menu or create new
|
||||
- **Subnet**: Select from dropdown menu
|
||||
- **Configure network security group**: Select from dropdown menu or create new
|
||||
- **Virtual network**: Click ``Create new`` link or select a ``Virtual network`` from dropdown menu, if you have created one. Refer to `Quickstart: Use the Azure portal to create a virtual network <https://learn.microsoft.com/en-us/azure/virtual-network/quick-create-portal>`_.
|
||||
- **Subnet**: Click ``Manage subnet configuration`` link or select a subnet from dropdown menu, if you have created one. Refer to `Add, change, or delete a virtual network subnet <https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-manage-subnet?tabs=azure-portal>`_
|
||||
- **Configure network security group**: Click ``Create new`` link or select a security group from dropdown menu, if you have created one. Refer to `Create, change, or delete a network security group <https://learn.microsoft.com/en-us/azure/virtual-network/manage-network-security-group?tabs=network-security-group-portal>`_.
|
||||
|
||||
- Load balancing
|
||||
- **Load balancing option**: Select ``Azure load balancer``
|
||||
- **Select a load balance**: Select from dropdown menu or create new
|
||||
- **Select a load balancer**: If you have created a load balancer, select it from the dropdown menu, or click ``Create a load balancer`` and fill out the following fields in the pop-up window.
|
||||
- Enter **Load balancer name**
|
||||
- **Type**: Select ``Public`` or ``Internal``
|
||||
- **Protocol**: Select ``TCP`` or ``UDP``
|
||||
- **Port** and **Backend Port**: Set to ``port 80``
|
||||
|
||||
.. image:: imgs/Azure_Step5.png
|
||||
.. image:: imgs/Azure/Azure_Step5.png
|
||||
:align: center
|
||||
:alt: load balancer
|
||||
|
||||
@ -91,7 +92,7 @@ On the **Create a virtual machine** page, go to **Networking** tab and follow th
|
||||
- Wait for validation.
|
||||
- Click **Create**.
|
||||
|
||||
.. image:: imgs/Azure_Step6.png
|
||||
.. image:: imgs/Azure/Azure_Step6.png
|
||||
:align: center
|
||||
:alt: deployment
|
||||
|
||||
@ -102,7 +103,7 @@ On the **Create a virtual machine** page, go to **Networking** tab and follow th
|
||||
- Retrieve the **Load balancer public IP address**
|
||||
- The deployed endpoint is **http://<load-balancer-public-IP-address>/general/v0/general**
|
||||
|
||||
.. image:: imgs/Azure_Step7.png
|
||||
.. image:: imgs/Azure/Azure_Step7.png
|
||||
:align: center
|
||||
:alt: retrieve public ip
|
||||
|
||||
@ -114,8 +115,12 @@ On the **Create a virtual machine** page, go to **Networking** tab and follow th
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
curl -q -X POST http://<you-IP-address>/general/v0/general -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F files=@english-and-korean.png -o /tmp/english-and-korean.png.json
|
||||
curl -q -X POST http://<your-IP-address>/general/v0/general \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F files=@<<FILENAME>> \
-o <<PATH/OUTPUT>>.json
|
||||
|
||||
.. image:: imgs/Azure_Step8.png
|
||||
.. image:: imgs/Azure/Azure_Step8.png
|
||||
:align: center
|
||||
:alt: testing
|
||||
BIN
docs/source/apis/imgs/AWS/Infrastructure_Diagram.png
Normal file
|
After Width: | Height: | Size: 645 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step10a.png
Normal file
|
After Width: | Height: | Size: 278 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step10b.png
Normal file
|
After Width: | Height: | Size: 287 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step10c.png
Normal file
|
After Width: | Height: | Size: 368 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step10d.png
Normal file
|
After Width: | Height: | Size: 263 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step11.png
Normal file
|
After Width: | Height: | Size: 398 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step8.png
Normal file
|
After Width: | Height: | Size: 479 KiB |
BIN
docs/source/apis/imgs/AWS/Marketplace_Step9.png
Normal file
|
After Width: | Height: | Size: 300 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step2.png
Normal file
|
After Width: | Height: | Size: 234 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step3.png
Normal file
|
After Width: | Height: | Size: 218 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step4.png
Normal file
|
After Width: | Height: | Size: 150 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step5.png
Normal file
|
After Width: | Height: | Size: 156 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step6.png
Normal file
|
After Width: | Height: | Size: 106 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step7.png
Normal file
|
After Width: | Height: | Size: 121 KiB |
BIN
docs/source/apis/imgs/AWS/VPC_Step8.png
Normal file
|
After Width: | Height: | Size: 164 KiB |
BIN
docs/source/apis/imgs/AWS/endpoint.png
Normal file
|
After Width: | Height: | Size: 49 KiB |
BIN
docs/source/apis/imgs/AWS/healthcheck.png
Normal file
|
After Width: | Height: | Size: 39 KiB |
@ -64,8 +64,7 @@ Method 2: Local Deployment Using ``unstructured-api`` Library
|
||||
- Parallel processing for PDFs with environment variables.
|
||||
- Server load management with UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB.
|
||||
|
||||
- **Using Docker Image**:
|
||||
Docker commands for pulling and running the container.
|
||||
- **Using Docker Image**: Docker commands for pulling and running the container.
|
||||
|
||||
- **More Details**: Check out the `unstructured-api GitHub Repository <https://github.com/Unstructured-IO/unstructured-api>`_ for further information.
|
||||
|
||||
|
||||
@ -22,7 +22,8 @@ Install the `unstructured` package with S3 support.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
!pip install "unstructured[s3]"
|
||||
pip install "unstructured[s3]"
|
||||
|
||||
|
||||
Step 2: Import Libraries
|
||||
========================
|
||||
|
||||
@ -2,13 +2,13 @@ Integrations
|
||||
=============
|
||||
Integrate your model development pipeline with your favorite machine learning frameworks and libraries,
|
||||
and prepare your data for ingestion into downstream systems. Most of our integrations come in the form of
|
||||
`staging functions <https://unstructured-io.github.io/unstructured/functions.html#staging>`_,
|
||||
`staging functions <https://unstructured-io.github.io/unstructured/core/staging.html>`_,
|
||||
which take a list of ``Element`` objects as input and return formatted dictionaries as output.
|
||||
|
||||
|
||||
``Integration with Argilla``
|
||||
----------------------------
|
||||
You can convert a list of ``Text`` elements to an `Argilla <https://www.argilla.io/>`_ ``Dataset`` using the `stage_for_argilla <https://unstructured-io.github.io/unstructured/functions/staging.html#stage-for-argilla>`_ staging function. Specify the type of dataset to be generated using the ``argilla_task`` parameter. Valid values are ``"text_classification"``, ``"token_classification"``, and ``"text2text"``. Follow the link for more details on usage.
|
||||
You can convert a list of ``Text`` elements to an `Argilla <https://www.argilla.io/>`_ ``Dataset`` using the `stage_for_argilla <https://unstructured-io.github.io/unstructured/core/staging.html#stage-for-argilla>`_ staging function. Specify the type of dataset to be generated using the ``argilla_task`` parameter. Valid values are ``"text_classification"``, ``"token_classification"``, and ``"text2text"``. Follow the link for more details on usage.
|
||||
|
||||
|
||||
``Integration with Baseplate``
|
||||
@ -16,26 +16,26 @@ You can convert a list of ``Text`` elements to an `Argilla <https://www.argilla.
|
||||
`Baseplate <https://docs.baseplate.ai/introduction>`_ is a backend optimized for use with LLMs that has an easy-to-use spreadsheet
|
||||
interface. The ``unstructured`` library offers a staging function to convert a list of ``Element`` objects into the
|
||||
`rows format <https://docs.baseplate.ai/api-reference/documents/overview>`_ required by the Baseplate API. See the
|
||||
`stage_for_baseplate <https://unstructured-io.github.io/unstructured/functions/staging.html#stage-for-baseplate>`_ documentation for
|
||||
`stage_for_baseplate <https://unstructured-io.github.io/unstructured/core/staging.html#stage-for-baseplate>`_ documentation for
|
||||
information on how to stage elements for ingestion into Baseplate.
|
||||
|
||||
|
||||
``Integration with Datasaur``
|
||||
------------------------------
|
||||
You can format a list of ``Text`` elements as input to token based tasks in `Datasaur <https://datasaur.ai/>`_ using the `stage_for_datasaur <https://unstructured-io.github.io/unstructured/functions/staging.html#stage-for-datasaur>`_ staging function. You will obtain a list of dictionaries indexed by the keys ``"text"`` with the content of the element, and ``"entities"`` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.
|
||||
You can format a list of ``Text`` elements as input to token based tasks in `Datasaur <https://datasaur.ai/>`_ using the `stage_for_datasaur <https://unstructured-io.github.io/unstructured/core/staging.html#stage-for-datasaur>`_ staging function. You will obtain a list of dictionaries indexed by the keys ``"text"`` with the content of the element, and ``"entities"`` with an empty list. Follow the link to learn how to customise your entities and for more details on usage.
|
||||
|
||||
|
||||
``Integration with Hugging Face``
|
||||
----------------------------------
|
||||
You can prepare ``Text`` elements for processing in Hugging Face `Transformers <https://huggingface.co/docs/transformers/index>`_
|
||||
pipelines by splitting the elements into chunks that fit into the model's attention window using the `stage_for_transformers <https://unstructured-io.github.io/unstructured/functions/staging.html#stage-for-transformers>`_ staging function. You can customise the transformation by defining
|
||||
pipelines by splitting the elements into chunks that fit into the model's attention window using the `stage_for_transformers <https://unstructured-io.github.io/unstructured/core/staging.html#stage-for-transformers>`_ staging function. You can customise the transformation by defining
|
||||
the ``buffer`` and ``window_size``, the ``split_function`` and the ``chunk_separator``. If you need to operate on
|
||||
text directly instead of ``unstructured`` ``Text`` objects, use the `chunk_by_attention_window <https://unstructured-io.github.io/unstructured/functions/staging.html#stage-for-transformers>`_ helper function. Follow the links for more details on usage.
|
||||
|
||||
|
||||
``Integration with Labelbox``
|
||||
------------------------------
|
||||
You can format your outputs for use with `LabelBox <https://labelbox.com/>`_ using the `stage_for_label_box <https://unstructured-io.github.io/unstructured/functions/staging.html#stage-for-label-box>`_ staging function. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the ``output_directory`` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type ``List[Dict[str, Any]]`` that can be written to a ``.json`` file and imported into LabelBox. Follow the link to see how to generate the ``config.json`` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.
|
||||
You can format your outputs for use with `LabelBox <https://labelbox.com/>`_ using the `stage_for_label_box <https://unstructured-io.github.io/unstructured/core/staging.html#stage-for-label-box>`_ staging function. LabelBox accepts cloud-hosted data and does not support importing text directly. With this integration you can stage the data files in the ``output_directory`` to be uploaded to a cloud storage service (such as S3 buckets) and get a config of type ``List[Dict[str, Any]]`` that can be written to a ``.json`` file and imported into LabelBox. Follow the link to see how to generate the ``config.json`` file that can be used with LabelBox, how to upload the staged data files to an S3 bucket, and for more details on usage.
|
||||
|
||||
|
||||
``Integration with Label Studio``
|
||||
|
||||
@ -1 +1 @@
|
||||
__version__ = "0.11.6-dev5" # pragma: no cover
|
||||
__version__ = "0.11.6-dev6" # pragma: no cover
|
||||
|
||||