# haystack/docker/Dockerfile.base
#
# Commit 64b0c43885 by Massimiliano Pippi, 2022-09-12 16:33:56 +02:00
# refactoring: reimplement Docker strategy (#3162)
#   * setup base images
#   * add cpu flavor
#   * use the same Dockerfile for cpu and gpu
#   * better naming, add docs
#   * add docker workflow
#   * add missing image input
#   * change cwd for bake
#   * also push api images
#   * try conditional tagging for releases
#   * revert testing code
#   * update docker readme
#   * document variable override
#   * use Python 3.10
#   * allow empty HAYSTACK_EXTRAS
#   * Apply suggestions from code review
#   * remove repo description step, can't make it work so far
#   * add docs to the last step as it's tricky
#   * manage tags for the newest images
#   * tests are passing, checking in the last bit
#
# Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

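# Images for the build stage and the final stage; declared before the
# first FROM so they can be used in FROM instructions
ARG build_image
ARG base_immage

FROM $build_image AS build-image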
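
# Stage-scoped build args: Haystack branch or tag, optional pip extras such
# as "[ocr]" (may be empty), and the `pip -f` URL for torch-scatter wheels
ARG haystack_version
ARG haystack_extras
ARG torch_scatter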

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential gcc git curl \
    tesseract-ocr libtesseract-dev poppler-utils
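
# Install the pdftotext PDF converter from xpdf tools; only the binary is
# kept, in /opt, so the final stage can copy it
RUN curl -fsSLO https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && \
    tar -xvf xpdf-tools-linux-4.04.tar.gz && \
    cp xpdf-tools-linux-4.04/bin64/pdftotext /opt && \
    rm -rf xpdf-tools-linux-4.04 xpdf-tools-linux-4.04.tar.gz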

# Shallow-clone the Haystack repo at the requested branch or tag;
# Haystack is then installed from these local sources
RUN git clone --depth=1 --branch=${haystack_version} https://github.com/deepset-ai/haystack.git /opt/haystack
WORKDIR /opt/haystack

# Use a virtualenv that can be copied into the next build stage;
# --system-site-packages also exposes packages installed in the system Python
RUN python -m venv --system-site-packages /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Quote the package spec so the shell does not treat "[...]" as a glob
RUN pip install --upgrade pip && \
    pip install --no-cache-dir ".${haystack_extras}" && \
    pip install --no-cache-dir ./rest_api && \
    pip install --no-cache-dir torch-scatter -f "$torch_scatter"
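
FROM $base_immage AS final

# The venv's python is a symlink to the interpreter that created it, so
# base_immage must provide the same Python version at the same path
COPY --from=build-image /opt/venv /opt/venv
COPY --from=build-image /opt/pdftotext /usr/local/bin/

ENV PATH="/opt/venv/bin:$PATH"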