mirror of https://github.com/Unstructured-IO/unstructured.git (synced 2025-11-03 19:43:24 +00:00)

	build: switch arm64 image to wolfi-base (#3268)
### Summary

Updates the `arm64` build to use the same `Dockerfile` as `amd64`, since upstream `wolfi-base` base images now exist for both architectures. The legacy `rockylinux-9.4` image is now stashed in a subdirectory of the `docker` subdirectory and is no longer built in CI, but it is available if users would like to build it themselves.

Additionally, this PR includes a fix to symlink `python3` to `python3.11`, which had caused a CI failure [here](https://github.com/Unstructured-IO/unstructured/actions/runs/9619486931/job/26535697755).

BREAKING CHANGE: the `arm64` image no longer supports `.doc`, `.pptx`, or `.xls` because we do not yet have a `libreoffice` `apk` built for `wolfi-base`. We intend to address that as a follow-on. All other filetypes work.

### Testing

Docker builds, tests, and smoke tests succeed for [amd64](https://github.com/Unstructured-IO/unstructured/actions/runs/9619458140/job/26535610735?pr=3268) and [arm64](https://github.com/Unstructured-IO/unstructured/actions/runs/9619458140/job/26535610341?pr=3268) on the feature branch (with publish disabled).

parent edddf9f6ee
commit 2d965fd65e

.github/workflows/docker-publish.yml | 5
@@ -47,9 +47,8 @@ jobs:
         password: ${{ secrets.QUAY_IO_ROBOT_TOKEN }}
     - name: Build images
       run: |
-        ARCH=$(cut -d "/" -f2 <<< ${{ matrix.docker-platform }})
-        DOCKER_BUILDKIT=1 docker buildx build --platform=$ARCH --load \
-          -f Dockerfile-$ARCH \
+        DOCKER_BUILDKIT=1 docker buildx build --platform=${{ matrix.docker-platform }} --load \
+          -f Dockerfile \
           --build-arg PIP_VERSION=$PIP_VERSION \
           --build-arg BUILDKIT_INLINE_CACHE=1 \
           --progress plain \

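For context on the removed `ARCH` line: it took the second `/`-separated field of the platform string in order to select `Dockerfile-$ARCH`. Below is a minimal sketch of that extraction, assuming platform values of the form `linux/arm64` (the matrix values themselves are not shown in this hunk). The helper name `arch_from_platform` is hypothetical, and the sketch pipes into `cut` instead of using the bash here-string so that it also runs under plain `sh`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): return the architecture
# part of a platform string such as "linux/arm64".
arch_from_platform() {
  # Field 2 of a "/"-delimited string, as in the removed workflow line.
  printf '%s\n' "$1" | cut -d "/" -f2
}

arch_from_platform "linux/amd64"   # prints: amd64
arch_from_platform "linux/arm64"   # prints: arm64
```

With a single `Dockerfile` for both architectures, the workflow no longer needs this step and passes the full platform string straight to `--platform`.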
@@ -2,6 +2,8 @@

 ### Enhancements

+* **Move arm64 image to wolfi-base** The `arm64` image now runs on `wolfi-base`. The `arm64` build for `wolfi-base` does not yet include `libreoffice`, and so `arm64` does not currently support processing `.doc`, `.ppt`, or `.xls` files. If you need to process those files on `arm64`, use the legacy `rockylinux` image.
+
 ### Features

 ### Fixes

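The changelog entry above implies that callers on `arm64` may want to screen inputs before handing them to the `wolfi-base` image. Here is a small sketch of such a check. The function name `needs_libreoffice` is hypothetical and not part of the project, and the extension list is taken only from the changelog entry (`.doc`, `.ppt`, `.xls`).

```shell
#!/bin/sh
# Hypothetical pre-flight check (an assumption, not project code): succeed
# when a filename has one of the extensions the changelog lists as
# unsupported on the arm64 wolfi-base image.
needs_libreoffice() {
  case "$1" in
    *.doc|*.ppt|*.xls) return 0 ;;
    *) return 1 ;;
  esac
}

for f in report.doc slides.ppt sheet.xls notes.docx; do
  if needs_libreoffice "$f"; then
    echo "$f: not supported on the arm64 wolfi-base image"
  else
    echo "$f: ok"
  fi
done
```

The match is case-sensitive and anchored to the whole name, so `notes.docx` does not match `*.doc`. A real check would probably normalize case first.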
@@ -9,7 +9,7 @@ COPY unstructured unstructured
 COPY test_unstructured test_unstructured
 COPY example-docs example-docs

-RUN chown -R notebook-user:notebook-user /app
+RUN chown -R notebook-user:notebook-user /app && ln -s /usr/bin/python3.11 /usr/bin/python3

 USER notebook-user

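The added `ln -s` makes `python3` resolve to the `python3.11` binary. The sketch below reproduces the same link in a scratch directory so it can be tried without touching `/usr/bin`. The empty file is only a stand-in for the interpreter, and the directory layout is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch: recreate the python3 -> python3.11 symlink in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

touch "$workdir/python3.11"                      # stand-in for the real binary
ln -s "$workdir/python3.11" "$workdir/python3"   # same shape as the Dockerfile line

# Show where the link points.
readlink "$workdir/python3"
```

Note that plain `ln -s` fails if the link name already exists, so this form relies on the base image not shipping its own `/usr/bin/python3`.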
@@ -5,7 +5,7 @@ DOCKER_REPOSITORY="${DOCKER_REPOSITORY:-quay.io/unstructured-io/unstructured}"
 PIP_VERSION="${PIP_VERSION:-23.1.2}"
 DOCKER_IMAGE="${DOCKER_IMAGE:-unstructured:dev}"

-DOCKER_BUILD_CMD=(docker buildx build --load -f Dockerfile-amd64
+DOCKER_BUILD_CMD=(docker buildx build --load -f Dockerfile
   --build-arg PIP_VERSION="$PIP_VERSION"
   --build-arg BUILDKIT_INLINE_CACHE=1
   --progress plain

@@ -38,16 +38,10 @@ trap stop_container EXIT
 await_container

 # Run the tests
-if [[ "$DOCKER_IMAGE" == *"arm64"* ]]; then
-  docker cp test_unstructured_ingest $CONTAINER_NAME:/home/notebook-user
-  docker exec -u root "$CONTAINER_NAME" /bin/bash -c "chown -R 1000:1000 /home/notebook-user/test_unstructured_ingest"
-  docker exec "$CONTAINER_NAME" /bin/bash -c "/home/notebook-user/test_unstructured_ingest/src/wikipedia.sh"
-else
-  docker cp test_unstructured_ingest $CONTAINER_NAME:/app
-  docker cp requirements/ingest $CONTAINER_NAME:/app/requirements/ingest
-  docker exec -u root "$CONTAINER_NAME" /bin/bash -c "chown -R notebook-user:notebook-user /app/test_unstructured_ingest"
-  docker exec "$CONTAINER_NAME" /bin/bash -c "/app/test_unstructured_ingest/src/wikipedia.sh"
-fi
+docker cp test_unstructured_ingest $CONTAINER_NAME:/app
+docker cp requirements/ingest $CONTAINER_NAME:/app/requirements/ingest
+docker exec -u root "$CONTAINER_NAME" /bin/bash -c "chown -R notebook-user:notebook-user /app/test_unstructured_ingest"
+docker exec "$CONTAINER_NAME" /bin/bash -c "/app/test_unstructured_ingest/src/wikipedia.sh"

 result=$?
 exit $result

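The branch removed above chose the in-container paths by testing whether the image name contained `arm64`. Below is a rough, portable sketch of that substring test, written with `case` instead of the bash-only `[[ ... == *"arm64"* ]]` form. The function name `is_arm64_image` and the sample image names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the image name contains "arm64",
# mirroring the glob match the smoke test used before this commit.
is_arm64_image() {
  case "$1" in
    *arm64*) return 0 ;;
    *) return 1 ;;
  esac
}

for image in "unstructured:dev" "unstructured-arm64:dev"; do
  if is_arm64_image "$image"; then
    echo "$image: previously used /home/notebook-user"
  else
    echo "$image: uses /app"
  fi
done
```

Since both architectures now build from the same `Dockerfile`, the test is unnecessary and every image takes the `/app` path.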