Benjamin Torres 5052e6cb3b
Added plain-text comparison for tests (#1180)
This PR adds a plain-text comparison to the ingest tests: the content of the
output files is compared as plain text (i.e., without the JSON formatting).
2023-08-29 23:23:14 +00:00

#!/usr/bin/env bash
# Clean the content of a JSON file generated by the unstructured library,
# keeping only the text of each element. The result is written to the folder
# given as $2, using the original file name with .txt appended
# (e.g. doc.json -> $2/doc.json.txt).
# Arguments:
# - $1: path to the JSON file to clean
# - $2: path to the folder in which to store the result
#
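# Fail on errors, on unset arguments, and on a failing jq inside the pipeline.
set -euo pipefail

BASE=$(basename "$1")
DEST="$2/$BASE.txt"
# Print the "text" field of every element (as a JSON-quoted string, since -r
# is not used), then wrap lines at 80 columns, breaking at blanks.
jq '.[].text' < "$1" | fold -w 80 -s > "$DEST"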
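To illustrate what the cleaning step produces, here is a self-contained sketch that applies the same `jq '.[].text' | fold -w 80 -s` pipeline to two small, made-up JSON files and then compares the results. The sample elements (`type`, `element_id`, `text`) only imitate unstructured-style output, and the `clean` function is a hypothetical wrapper written for this example; it assumes `jq` is installed. Because `jq` is called without `-r`, each element's text is written as a single JSON-quoted line before `fold` wraps it.

```bash
#!/usr/bin/env bash
# Sketch: clean two made-up JSON files with the same jq | fold pipeline,
# then compare the raw JSON and the cleaned plain text.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical helper mirroring the script: $1 = JSON file, $2 = output folder.
clean() {
  local dest
  dest="$2/$(basename "$1").txt"
  # Without -r, each text is printed as a JSON-quoted string on one line;
  # fold then wraps lines longer than 80 columns at the last blank.
  jq '.[].text' < "$1" | fold -w 80 -s > "$dest"
}

body='The unstructured library partitions documents into elements such as titles, narrative text and list items.'

# Two runs whose JSON differs only in the (illustrative) element_id metadata.
for run in expected actual; do
  mkdir -p "$workdir/$run"
  jq -n --arg id "$run" --arg body "$body" '[
    {"type": "Title", "element_id": ($id + "-1"), "text": "Quarterly report"},
    {"type": "NarrativeText", "element_id": ($id + "-2"), "text": "Revenue grew in every region."},
    {"type": "NarrativeText", "element_id": ($id + "-3"), "text": $body}
  ]' > "$workdir/$run/sample.json"
  clean "$workdir/$run/sample.json" "$workdir/$run"
done

# Compare the raw JSON files and the cleaned .txt files byte by byte.
json_same=no
cmp -s "$workdir/expected/sample.json" "$workdir/actual/sample.json" && json_same=yes
text_same=no
cmp -s "$workdir/expected/sample.json.txt" "$workdir/actual/sample.json.txt" && text_same=yes

cleaned=$(cat "$workdir/actual/sample.json.txt")
echo "JSON identical: $json_same"
echo "plain text identical: $text_same"
printf '%s\n' "$cleaned"
```

The two JSON files differ only in their made-up `element_id` values, so `cmp` reports the JSON as different while the cleaned `.txt` files are identical. That is one consequence of comparing the plain text rather than the raw JSON: metadata that does not affect the extracted text does not affect the comparison. The long third element is wrapped after the last blank within 80 columns.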