mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-11-25 06:39:58 +00:00
Removed the dependencies contained in `test.txt`, `dev.txt`, and `constraints.txt` from what gets installed in the docker image. To keep testing the image (running the tests), I added a step to the `docker-test` make target that installs `test.txt` and `dev.txt`. Thus we presumably get a smaller image (probably not much smaller), reduce the dependency chain of our images, and have less exposure to vulnerabilities while still testing as robustly as before.

Incidentally, I removed the `Dockerfile` for our ubuntu image, since it referenced non-existent make targets, which tells me it's stale and wasn't being used.

### Review:

- Reviewer should ensure the dev and test dependencies are not being installed in the docker image. One way to check is to inspect the CI logs and note, e.g., that [this](https://github.com/Unstructured-IO/unstructured/actions/runs/14112971425/job/39536304012#step:3:1700) is the first reference to `pytest` in the docker build and test logs, after the image build is completed.
- Reviewer should ensure the docker image is still being tested in CI and is passing.
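The test-time install described in this change might look roughly like the following. This is a dry-run sketch, not the actual `docker-test` target: the image tag `unstructured:dev`, the `pytest` invocation, and the helper name `docker_test_cmd` are assumptions made for illustration; only the two requirements files come from the description. The command is printed rather than executed, so the sketch runs without Docker.

```shell
#!/bin/sh
# Hypothetical image tag; the real one is defined by the project's Makefile.
IMAGE="unstructured:dev"

# Command to run inside the container: install the test and dev
# requirements that the image no longer ships, then run the tests.
# (Assumed invocation; the actual make target may differ.)
inner_cmd='python3 -m pip install --user -r requirements/test.txt -r requirements/dev.txt && python3 -m pytest test_unstructured'

docker_test_cmd() {
  # $1 = image tag, $2 = command to run in the container.
  # Print the docker invocation instead of executing it.
  printf 'docker run --rm %s /bin/bash -c "%s"\n' "$1" "$2"
}

docker_test_cmd "$IMAGE" "$inner_cmd"
```

Installing the test dependencies at test time rather than build time keeps them out of the published image while still exercising the same image in CI.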
35 lines
1.4 KiB
Docker
FROM quay.io/unstructured-io/base-images:wolfi-base-latest AS base

ARG PYTHON=python3.11
ARG PIP="${PYTHON} -m pip"

USER root

WORKDIR /app

COPY ./requirements requirements/
COPY unstructured unstructured
COPY test_unstructured test_unstructured
COPY example-docs example-docs

RUN chown -R notebook-user:notebook-user /app && \
  apk add font-ubuntu git && \
  fc-cache -fv && \
  [ -e /usr/bin/python3 ] || ln -s /usr/bin/$PYTHON /usr/bin/python3

USER notebook-user

# append PATH before pip install to avoid warning logs; it also avoids issues with packages that need compilation during installation
ENV PATH="${PATH}:/home/notebook-user/.local/bin"
ENV TESSDATA_PREFIX=/usr/local/share/tessdata
ENV NLTK_DATA=/home/notebook-user/nltk_data

# Install Python dependencies and download required NLTK packages
RUN find requirements/ -type f -name "*.txt" ! -name "test.txt" ! -name "dev.txt" ! -name "constraints.txt" -exec $PIP install --no-cache-dir --user -r '{}' ';' && \
  mkdir -p ${NLTK_DATA} && \
  $PYTHON -m nltk.downloader -d ${NLTK_DATA} punkt_tab averaged_perceptron_tagger_eng && \
  $PYTHON -c "from unstructured.partition.model_init import initialize; initialize()" && \
  $PYTHON -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"

CMD ["/bin/bash"]
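To see which files the `find` filter in the install step picks up, the same predicates can be tried against a scratch directory. This is a minimal sketch: the file names `base.txt` and `extra/pdf.txt` are hypothetical stand-ins for real requirements files, and `basename` replaces the `pip install` call so nothing is actually installed.

```shell
#!/bin/sh
# Scratch directory standing in for the repository root (hypothetical layout).
tmp=$(mktemp -d)
mkdir -p "$tmp/requirements/extra"
for f in base.txt test.txt dev.txt constraints.txt extra/pdf.txt; do
  : > "$tmp/requirements/$f"
done

# Same predicates as the Dockerfile's install step; each matching file is
# passed to `basename` instead of `pip install -r`.
selected=$(cd "$tmp" && find requirements/ -type f -name "*.txt" \
  ! -name "test.txt" ! -name "dev.txt" ! -name "constraints.txt" \
  -exec basename '{}' ';' | sort)

# Prints base.txt and pdf.txt; test.txt, dev.txt, and constraints.txt are skipped.
echo "$selected"

rm -rf "$tmp"
```

Because `! -name` matches on the file's base name, a `test.txt` in any subdirectory of `requirements/` would be excluded as well. Note also that with the `';'` terminator, `find` runs the command once per file and does not reflect that command's exit status in its own, so a failed install in this step would only surface through the later commands in the chain.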