unstructured/test_unstructured_ingest/test-ingest-against-api.sh
Matt Robinson 8683e2695c
fix: enable partition_pdf to recursively grab text with fast strategy (#796)
* initial pass on text in figures

* refactor text extraction

* update tests

* fix title test

* add test for docs that require recursive text grab

* version and changelog

* ingest-test-fixtures-update

* there are 8 pdf files now
2023-06-22 11:19:54 -04:00


#!/usr/bin/env bash
set -e
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
cd "$SCRIPT_DIR"/.. || exit 1
PYTHONPATH=. ./unstructured/ingest/main.py \
  --local-input-path example-docs \
  --local-file-glob "*.pdf" \
  --structured-output-dir api-ingest-output \
  --partition-by-api \
  --partition-strategy hi_res \
  --verbose \
  --reprocess
set +e
if [ "$(find 'api-ingest-output' -type f -printf '.' | wc -c)" -ne 8 ]; then
  echo
  echo "8 files should have been created."
  exit 1
fi
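The output-count check above relies on GNU find's `-printf`, which the BSD/macOS `find` does not provide. What follows is a minimal sketch of a portable alternative. It assumes Bash and a POSIX `printf` utility. `count_output_files` and `check_output_count` are hypothetical helper names and are not part of the unstructured repository. Each path is passed to `printf '%.0s.'`, which prints zero characters of its argument and then a single dot. File names containing newlines are therefore still counted once. The dots are then measured with `${#var}`, which avoids the leading-space padding that BSD `wc -c` adds.

```shell
#!/usr/bin/env bash
# Sketch of a portable replacement for the GNU-only `find -printf '.' | wc -c` check.
# count_output_files and check_output_count are hypothetical helper names.

# Print the number of regular files under directory $1 (searched recursively).
count_output_files() {
  local dir=$1
  local dots
  # `%.0s` consumes one path per format reuse but prints none of it, then emits a dot.
  dots=$(find "$dir" -type f -exec printf '%.0s.' {} +)
  echo "${#dots}"
}

# Return 0 if directory $1 holds exactly $2 files; otherwise report to stderr and return 1.
check_output_count() {
  local dir=$1 expected=$2 actual
  if [ ! -d "$dir" ]; then
    echo "Output directory $dir does not exist." >&2
    return 1
  fi
  actual=$(count_output_files "$dir")
  if [ "$actual" -ne "$expected" ]; then
    echo "$expected files should have been created, but found $actual." >&2
    return 1
  fi
}
```

With these assumptions, the final block of the script could be written as `check_output_count api-ingest-output 8 || exit 1`. This version keeps the expected count in one place and reports the number of files that were actually produced.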