mirror of https://github.com/Unstructured-IO/unstructured.git
synced 2025-06-27 02:30:08 +00:00

Removed the dependencies listed in `test.txt`, `dev.txt`, and `constraints.txt` from the set of requirements installed in the docker image. To keep testing the image (running the tests), I added a step to the `docker-test` make target that installs `test.txt` and `dev.txt`. We therefore presumably get a smaller image (probably not much smaller), reduce the dependency chain of our images, and have less exposure to vulnerabilities, while still testing as robustly as before.

Incidentally, I removed the `Dockerfile` for our Ubuntu image: it referenced non-existent make targets, which tells me it was stale and not being used.

### Review:
- Reviewer should ensure the dev and test dependencies are not being installed in the docker image. One way to check is to inspect the CI logs and note, e.g., that [this](https://github.com/Unstructured-IO/unstructured/actions/runs/14112971425/job/39536304012#step:3:1700) is the first reference to `pytest` in the docker build and test logs, after the image build has completed.
- Reviewer should ensure the docker image is still being tested in CI and is passing.
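The file selection performed by the `find requirements/ -type f -name "*.txt" ! -name "test.txt" ! -name "dev.txt" ! -name "constraints.txt"` step can be sketched in Python to make the exclusion rule explicit. This is a hypothetical helper, not part of the repository: `image_requirement_files` and `EXCLUDED` are illustrative names. Like `find -name`, it matches on each file's base name anywhere under `requirements/`, so a nested `test.txt` would be skipped too; unlike `find`, it returns the paths in sorted order.

```python
from pathlib import Path

# Hypothetical constant mirroring the `! -name ...` clauses of the Dockerfile's
# `find` command: these base names are never installed into the image.
EXCLUDED = {"test.txt", "dev.txt", "constraints.txt"}


def image_requirement_files(requirements_dir: str) -> list[Path]:
    """Return the *.txt requirement files that the image build would install.

    Searches recursively (as `find` does) and filters on the base name only,
    so an excluded name is skipped at any depth. Results are sorted for
    deterministic output.
    """
    root = Path(requirements_dir)
    return sorted(
        path
        for path in root.rglob("*.txt")
        if path.is_file() and path.name not in EXCLUDED
    )


if __name__ == "__main__":
    # Usage: list what the image would install from ./requirements.
    for path in image_requirement_files("requirements"):
        print(path)
```

Run from the repository root, it prints the requirement files that end up in the image; `test.txt` and `dev.txt` are instead installed by the `docker-test` make target before the tests run.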
41 lines
1.3 KiB
Docker
# syntax=docker/dockerfile:experimental
FROM quay.io/unstructured-io/base-images:rocky9.2-9@sha256:73d8492452f086144d4b92b7931aa04719f085c74d16cae81e8826ef873729c9 as base

# NOTE(crag): NB_USER ARG for mybinder.org compat:
# https://mybinder.readthedocs.io/en/latest/tutorials/dockerfile.html
ARG NB_USER=notebook-user
ARG NB_UID=1000
ARG PIP_VERSION

# Set up environment
ENV HOME /home/${NB_USER}
ENV PYTHONPATH="${PYTHONPATH}:${HOME}"
ENV PATH="/home/usr/.local/bin:${PATH}"

RUN groupadd --gid ${NB_UID} ${NB_USER}
RUN useradd --uid ${NB_UID} --gid ${NB_UID} ${NB_USER}
WORKDIR ${HOME}

FROM base as deps
# Copy and install Unstructured
COPY requirements requirements

RUN python3.10 -m pip install pip==${PIP_VERSION} && \
  dnf -y groupinstall "Development Tools" && \
  find requirements/ -type f -name "*.txt" ! -name "test.txt" ! -name "dev.txt" ! -name "constraints.txt" -exec python3 -m pip install --no-cache -r '{}' ';' && \
  dnf -y groupremove "Development Tools" && \
  dnf clean all

RUN python3.10 -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()"

FROM deps as code

USER ${NB_USER}

COPY example-docs example-docs
COPY unstructured unstructured

RUN python3.10 -c "from unstructured.partition.model_init import initialize; initialize()"

CMD ["/bin/bash"]