build: switch arm64 image to wolfi-base (#3268)

### Summary

Updates the `arm64` build to use the same `Dockerfile` as `amd64`, since
upstream `wolfi-base` base images are now available for both
architectures. The legacy `rockylinux-9.4` is now stashed in a
subdirectory of the `docker` directory and is no longer built in CI, but
is available if users would like to build it themselves.
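As a rough sketch of what sharing one `Dockerfile` means in practice, the snippet below assembles the `buildx` invocation for a given platform and prints it instead of running it. `build_image_cmd` is a hypothetical helper, not something in the repository; the flags mirror the ones visible in this commit's diff, and any further arguments (tags, build context) fall outside the hunks shown here and are left out.

```shell
# Hypothetical helper: print the buildx command for one platform.
# Both architectures now point at the same Dockerfile.
build_image_cmd() {
  platform="$1"                 # e.g. linux/amd64 or linux/arm64
  pip_version="${2:-23.1.2}"    # default taken from the build script in this commit
  printf 'docker buildx build --platform=%s --load -f Dockerfile --build-arg PIP_VERSION=%s --build-arg BUILDKIT_INLINE_CACHE=1 --progress plain\n' \
    "$platform" "$pip_version"
}

# Show the (partial) command for each architecture without executing it.
for platform in linux/amd64 linux/arm64; do
  build_image_cmd "$platform"
done
```

The only thing that differs between the two printed commands is the `--platform` value, which is the point of the change.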

Additionally, this PR includes a fix to symlink `python3` to
`python3.11`, which had caused a CI failure
[here](https://github.com/Unstructured-IO/unstructured/actions/runs/9619486931/job/26535697755).
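For illustration only, the symlink step could be wrapped in a small guard like the one below. `ensure_python3_link` is a hypothetical helper and not part of this PR; the PR itself runs a plain `ln -s` inside the `Dockerfile`. The sketch uses `ln -sf` instead so that it can be re-run safely, which is a deliberate difference from the commit.

```shell
# Hypothetical helper: point a generic interpreter name at a versioned one.
# Refuses to create a dangling link if the target is not executable.
ensure_python3_link() {
  target="$1"   # e.g. /usr/bin/python3.11
  link="$2"     # e.g. /usr/bin/python3
  if [ ! -x "$target" ]; then
    echo "missing interpreter: $target" >&2
    return 1
  fi
  # -f replaces an existing link, so the call is safe to repeat.
  ln -sf "$target" "$link"
}

# Example (needs write access to /usr/bin, so it is left commented out):
# ensure_python3_link /usr/bin/python3.11 /usr/bin/python3
```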

BREAKING CHANGE: the `arm64` image no longer supports `.doc`, `.ppt`,
or `.xls` because we do not yet have a `libreoffice` `apk` built for
`wolfi-base` on `arm64`. We intend to address that in a follow-up. All
other filetypes work.
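Callers who route files to different images may want a quick pre-check. The sketch below is one possible way to do that; `needs_libreoffice` is a hypothetical helper, and its extension list follows the changelog entry in this commit (`.doc`, `.ppt`, `.xls`) rather than any API of the library.

```shell
# Hypothetical helper: succeed if the file type depends on libreoffice,
# which the arm64 wolfi-base image does not yet ship.
needs_libreoffice() {
  # Lowercase the name so that REPORT.DOC is treated like report.doc.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.doc|*.ppt|*.xls) return 0 ;;
    *) return 1 ;;
  esac
}

for f in report.doc slides.pptx notes.pdf; do
  if needs_libreoffice "$f"; then
    echo "$f: use the legacy rockylinux image on arm64"
  else
    echo "$f: ok on wolfi-base"
  fi
done
```

The patterns match on the full suffix, so `.docx`, `.pptx`, and `.xlsx` are not caught by the legacy-format check.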

### Testing

Docker builds, tests, and smoke tests succeeded for
[amd64](https://github.com/Unstructured-IO/unstructured/actions/runs/9619458140/job/26535610735?pr=3268)
and
[arm64](https://github.com/Unstructured-IO/unstructured/actions/runs/9619458140/job/26535610341?pr=3268)
on the feature branch (with publish disabled).
Matt Robinson 2024-06-22 01:10:29 -04:00 committed by GitHub
parent edddf9f6ee
commit 2d965fd65e
6 changed files with 10 additions and 15 deletions

@@ -47,9 +47,8 @@ jobs:
 password: ${{ secrets.QUAY_IO_ROBOT_TOKEN }}
 - name: Build images
 run: |
-ARCH=$(cut -d "/" -f2 <<< ${{ matrix.docker-platform }})
-DOCKER_BUILDKIT=1 docker buildx build --platform=$ARCH --load \
--f Dockerfile-$ARCH \
+DOCKER_BUILDKIT=1 docker buildx build --platform=${{ matrix.docker-platform }} --load \
+-f Dockerfile \
 --build-arg PIP_VERSION=$PIP_VERSION \
 --build-arg BUILDKIT_INLINE_CACHE=1 \
 --progress plain \

@@ -2,6 +2,8 @@
 ### Enhancements
+* **Move arm64 image to wolfi-base** The `arm64` image now runs on `wolfi-base`. The `arm64` build for `wolfi-base` does not yet include `libreoffice`, and so `arm64` does not currently support processing `.doc`, `.ppt`, or `.xls` files. If you need to process those files on `arm64`, use the legacy `rockylinux` image.
 ### Features
 ### Fixes

@@ -9,7 +9,7 @@ COPY unstructured unstructured
 COPY test_unstructured test_unstructured
 COPY example-docs example-docs
-RUN chown -R notebook-user:notebook-user /app
+RUN chown -R notebook-user:notebook-user /app && ln -s /usr/bin/python3.11 /usr/bin/python3
 USER notebook-user

@@ -5,7 +5,7 @@ DOCKER_REPOSITORY="${DOCKER_REPOSITORY:-quay.io/unstructured-io/unstructured}"
 PIP_VERSION="${PIP_VERSION:-23.1.2}"
 DOCKER_IMAGE="${DOCKER_IMAGE:-unstructured:dev}"
-DOCKER_BUILD_CMD=(docker buildx build --load -f Dockerfile-amd64
+DOCKER_BUILD_CMD=(docker buildx build --load -f Dockerfile
 --build-arg PIP_VERSION="$PIP_VERSION"
 --build-arg BUILDKIT_INLINE_CACHE=1
 --progress plain

@@ -38,16 +38,10 @@ trap stop_container EXIT
 await_container
 # Run the tests
-if [[ "$DOCKER_IMAGE" == *"arm64"* ]]; then
-docker cp test_unstructured_ingest $CONTAINER_NAME:/home/notebook-user
-docker exec -u root "$CONTAINER_NAME" /bin/bash -c "chown -R 1000:1000 /home/notebook-user/test_unstructured_ingest"
-docker exec "$CONTAINER_NAME" /bin/bash -c "/home/notebook-user/test_unstructured_ingest/src/wikipedia.sh"
-else
-docker cp test_unstructured_ingest $CONTAINER_NAME:/app
-docker cp requirements/ingest $CONTAINER_NAME:/app/requirements/ingest
-docker exec -u root "$CONTAINER_NAME" /bin/bash -c "chown -R notebook-user:notebook-user /app/test_unstructured_ingest"
-docker exec "$CONTAINER_NAME" /bin/bash -c "/app/test_unstructured_ingest/src/wikipedia.sh"
-fi
+docker cp test_unstructured_ingest $CONTAINER_NAME:/app
+docker cp requirements/ingest $CONTAINER_NAME:/app/requirements/ingest
+docker exec -u root "$CONTAINER_NAME" /bin/bash -c "chown -R notebook-user:notebook-user /app/test_unstructured_ingest"
+docker exec "$CONTAINER_NAME" /bin/bash -c "/app/test_unstructured_ingest/src/wikipedia.sh"
 result=$?
 exit $result