unstructured/scripts/docker-build-ubuntu.sh
cragwolfe 7b44bcd6e0
build: script to update all ingest fixtures, add azure ingest fixtures (#367)
- Updates CI to install tesseract version 5.3.0 (better than 4.x in various ways incl. perf.).
- Adds azure expected output fixtures for more useful reference points and as a repro for Some PDF's with scanned images return empty elements #346 .
- Adds a script to regenerate ingest test fixtures that is run in an ubuntu docker container (like CI), with the same version of tesseract. See the comments in scripts/ingest-test-fixtures-update.sh for details.
- Updates expected outputs with above script.
- Updates individual test-ingest scripts to update expected .json output if OVERWRITE_FIXTURES=true.
2023-04-11 00:11:50 -07:00

12 lines
367 B
Bash
Executable File

#!/usr/bin/env bash
# Mainly useful for building an image from which to update test-ingest fixtures
set -eu -o pipefail
# Change to the root of the repository
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR"/.. || exit 1
docker build -t unstructured-ubuntu:latest --progress plain -f docker/ubuntu-22/Dockerfile .