12 Commits

Author SHA1 Message Date
Matt Robinson
2d965fd65e
build: switch arm64 image to wolfi-base (#3268)
### Summary

Updates the `arm64` build to use the same `Dockerfile` as `amd64`, since
there are now upstream base images for `wolfi-base` for both
architectures. The legacy `rockylinux-9.4` is now stashed in a
subdirectory the `docker` subdirectory and is no longer built in CI, but
is available is users would like to build it themselves.

Additionally, this PR includes a fix to symlink `python3` to
`python3.11`, which had caused a CI failure
[here](https://github.com/Unstructured-IO/unstructured/actions/runs/9619486931/job/26535697755).

BREAKING CHANGE: the `arm64` image no longer supports `.doc`, `.pptx`,
or `.xls` because we do not yet have a `libreoffice` `apk` built for
`wolfi-base`. We intend to address that as a follow on. All other
filetypes work.

### Testing

Successfully docker builds, tests, and smoke tests for
[amd64](https://github.com/Unstructured-IO/unstructured/actions/runs/9619458140/job/26535610735?pr=3268)
and
[arm64](https://github.com/Unstructured-IO/unstructured/actions/runs/9619458140/job/26535610341?pr=3268)
on the feature branch (with publish disabled).
2024-06-22 05:10:29 +00:00
Christine Straub
f23d180d34
fix: docker image publishing error (#3238)
This PR aims to fix a docker image publishing error caused by user
changes when pulling the `amd64` image from the `unstructured`
`wolfi-base` image.
(https://github.com/Unstructured-IO/unstructured/pull/3213).

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
2024-06-18 21:01:42 +00:00
Matt Robinson
9cd0e706ab
fix: reenable arm64 builds for docker (#3045)
### Summary

Closes #3034 and reenables ARM64 in the docker build and publish job.
This was taken out in #3039 because we've only build `libreoffice` for
AMD64 and not ARM64. If Chainguard publishes an `apk` for `libreoffice`,
we can support a Chainguard image for both architectures. The smoke test
now differs for both architectures, to reflect differences in the
directory structure.

### Testing

Build and publish ran successfully for ARM64 (job
[here](https://github.com/Unstructured-IO/unstructured/actions/runs/9129712470/job/25104907497))
and AMD64 (job
[here](https://github.com/Unstructured-IO/unstructured/actions/runs/9129712470/job/25104907826)).
2024-05-17 19:27:20 +00:00
Matt Robinson
934f1a464a
fix: disable arm build for chainguard (#3039)
### Summary

Temporarily disables the ARM build due to the error in [this CI
job](https://github.com/Unstructured-IO/unstructured/actions/runs/9114507405/job/25058629166).
Will add back support for ARM using the rockylinux container once we
show this works.
2024-05-17 00:22:10 +00:00
cragwolfe
bd8a74d686
chore: shell scripts default indent of 2 instead of 4 (#2287)
Given the tendency for shell scripts to easily enter into a few levels
of indentation and long line lengths, update the default to 2 spaces.
2023-12-19 07:48:21 +00:00
Roman Isecke
76efcf4dd7
chore: add shfmt (#2246)
### Description
Given all the shell files that now exist in the repo, would be nice to
have linting/formatting around them (in addition to the existing
shellcheck which doesn't do anything to format the shell code). This PR
introduces `shfmt` to both check for changes and apply formatting when
the associated make targets are called.
2023-12-12 01:04:15 +00:00
cragwolfe
69952f66ed
fix(build): update ingest script loc in Dockerfile (#2052)
Fixes docker-smoke-test.sh to reference the new location for the
wikipedia ingest script, which was moved in
https://github.com/Unstructured-IO/unstructured/pull/1951 . This fix
should allow the docker image build to complete on merges to main.

Reference to recent failed job:

https://github.com/Unstructured-IO/unstructured/actions/runs/6819416096/job/18546724401
2023-11-09 21:55:07 -08:00
cragwolfe
6ad497136d
build: docker image fix (#1245)
Moving to a non-root user in the docker image caused a failure in the
publication workflow.

This fix was used to publish the 0.10.9 unstructured image in this
workflow:

https://github.com/Unstructured-IO/unstructured/actions/runs/6020624226/job/16332230987
2023-08-29 23:27:52 -07:00
Trevor Bossert
e4535d29ca
Set user for container to same as api image. (#1239)
This is security best practice, a user can override this with their own
Dockerfile if required.
2023-08-30 01:01:44 +00:00
ryannikolaidis
e08936b6fb
chore: update all bash scripts to use shebang: /usr/bin/env bash (#779) 2023-06-20 16:00:55 -07:00
ryannikolaidis
ee52a749c3
fix: docker smoke test on build (#457) 2023-04-06 10:03:42 -07:00
ryannikolaidis
59785e4332
chore: install all extras in Dockerfile (#419)
* Adds step to install all extras
* Adds smoke test of wikipedia ingest to validate in CI
2023-03-30 13:23:30 -07:00