mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-05 08:02:48 +00:00

This updates the docker image download url to pass through the scarf gateway, this allows anonymous tracking of downloads Related to: https://github.com/Unstructured-IO/unstructured#chart_with_upwards_trend-analytics Testing: docker pull downloads.unstructured.io/unstructured-io/unstructured:latest Result: Image should download
62 lines
2.4 KiB
ReStructuredText
62 lines
2.4 KiB
ReStructuredText
Docker Installation
|
|
=======================================
|
|
|
|
The instructions below guide you on how to use the `unstructured` library inside a Docker container.
|
|
|
|
Prerequisites
|
|
-------------
|
|
|
|
If you haven't installed Docker on your machine, you can find the installation guide `here <link_to_docker_installation>`_.
|
|
|
|
.. note::
|
|
We build multi-platform images to support both x86_64 and Apple silicon hardware. Using `docker pull` should download the appropriate image for your architecture. However, if needed, you can specify the platform with the `--platform` flag, e.g., `--platform linux/amd64`.
|
|
|
|
Pulling the Docker Image
|
|
-------------------------
|
|
|
|
We create Docker images for every push to the main branch. These images are tagged with the respective short commit hash (like `fbc7a69`) and the application version (e.g., `0.5.5-dev1`). The most recent image also receives the `latest` tag. To use these images, pull them from our repository:
|
|
|
|
.. code-block:: bash
|
|
|
|
docker pull downloads.unstructured.io/unstructured-io/unstructured:latest
|
|
|
|
Using the Docker Image
|
|
----------------------
|
|
|
|
After pulling the image, you can create and start a container from it:
|
|
|
|
.. code-block:: bash
|
|
|
|
# create the container
|
|
docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest
|
|
|
|
# start a bash shell inside the running Docker container
|
|
docker exec -it unstructured bash
|
|
|
|
Building Your Own Docker Image
|
|
------------------------------
|
|
|
|
You can also build your own Docker image. If you only plan to parse a single type of data, you can accelerate the build process by excluding certain packages or requirements needed for other data types. Refer to the `Dockerfile` to determine which lines are necessary for your requirements.
|
|
|
|
.. code-block:: bash
|
|
|
|
make docker-build
|
|
|
|
# start a bash shell inside the running Docker container
|
|
make docker-start-bash
|
|
|
|
Interacting with Python Inside the Container
|
|
--------------------------------------------
|
|
|
|
Once inside the running Docker container, you can directly test the library using Python's interactive mode:
|
|
|
|
.. code-block:: python
|
|
|
|
python3
|
|
|
|
>>> from unstructured.partition.pdf import partition_pdf
|
|
>>> elements = partition_pdf(filename="example-docs/layout-parser-paper-fast.pdf")
|
|
|
|
>>> from unstructured.partition.text import partition_text
|
|
>>> elements = partition_text(filename="example-docs/fake-text.txt")
|