unstructured/scripts/docker-build.sh
Matt Robinson db8617872b
build: image and dependency updates; fix tesseract files locations (#3310)
### Summary

Updates to the latest version of the `wolfi-base` image. Changes
include:
- Version bumps to address CVEs
- `libreoffice` is now included in the `arm64`. `.doc` files are now
supported for `arm64`. `.ppt` do not work with the `libreoffice` package
currently available on `wolfi-os`. We have follow on work to look into
that.
- Updates the location of the `tesseract` `tessdata` files on the
`arm64` build. Closes #3290.
- Closes #3319 and addes `psutil` to the base dependencies.

### Testing

- `test_dockerfile` should continue to pass with the updates.
2024-07-01 19:39:32 +00:00

21 lines
647 B
Bash
Executable File

#!/usr/bin/env bash
set -euo pipefail
DOCKER_REPOSITORY="${DOCKER_REPOSITORY:-quay.io/unstructured-io/unstructured}"
PIP_VERSION="${PIP_VERSION:-23.1.2}"
DOCKER_IMAGE="${DOCKER_IMAGE:-unstructured:dev}"
DOCKER_BUILD_CMD=(docker buildx build --load -f Dockerfile
--build-arg PIP_VERSION="$PIP_VERSION"
--build-arg BUILDKIT_INLINE_CACHE=1
--progress plain
--cache-from "$DOCKER_REPOSITORY":latest
-t "$DOCKER_IMAGE" .)
# only build for specific platform if DOCKER_BUILD_PLATFORM is set
if [ -n "${DOCKER_BUILD_PLATFORM:-}" ]; then
DOCKER_BUILD_CMD+=("--platform=$DOCKER_BUILD_PLATFORM")
fi
DOCKER_BUILDKIT=1 "${DOCKER_BUILD_CMD[@]}"