E.g., now can run:
```bash
# extracts base64 encoded image data for `Table` and `Image` elements
$ unstructured-get-json.sh --trace --verbose --images /t/docs/Captur-1317-5_ENG-p5.pdf
# also extracts `Title` elements (see screenshot)
$ IMAGE_BLOCK_TYPES='"title","table","image"' unstructured-get-json.sh --trace --verbose --images /t/docs/Captur-1317-5_ENG-p5.pdf
```
It was discovered during testing that "narrativetext" does not work,
probably due to camel casing of NarrativeText 😬

Adds the bash script `process-pdf-parallel-through-api.sh` that allows
splitting up a PDF into smaller parts (splits) to be processed through
the API concurrently, and is re-entrant. If any of the parts splits fail
to process, one can attempt reprocessing those split(s) by rerunning the
script.
Note: requires the `qpdf` command line utility.
The below command line output shows the scenario where just one split
had to be reprocessed through the API to create the final
`layout-parser-paper_combined.json` output.
```
$ BATCH_SIZE=20 PDF_SPLIT_PAGE_SIZE=6 STRATEGY=hi_res \
./scripts/user/process-pdf-parallel-through-api.sh example-docs/pdf/layout-parser-paper.pdf
> % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Skipping processing for /Users/cragwolfe/tmp/pdf-splits/layout-parser-paper-output-8a76cb6228e109450992bc097dbd1a51_split-6_strat-hi_res/layout-pars\
er-paper_pages_1_to_6.json as it already exists.
Skipping processing for /Users/cragwolfe/tmp/pdf-splits/layout-parser-paper-output-8a76cb6228e109450992bc097dbd1a51_split-6_strat-hi_res/layout-parser-paper_pages_7_to_12.json as it already exists.
Valid JSON output created: /Users/cragwolfe/tmp/pdf-splits/layout-parser-paper-output-8a76cb6228e109450992bc097dbd1a51_split-6_strat-hi_res/layout-parser-paper_pages_13_to_16.json
Processing complete. Combined JSON saved to /Users/cragwolfe/tmp/pdf-splits/layout-parser-paper-output-8a76cb6228e109450992bc097dbd1a51_split-6_strat-hi_res/layout-parser-paper_combined.json
```
Bonus change to `unstructured-get-json.sh` to point to the standard
hosted Serverless API, but allow using the Free API with --freemium.
### Description
Given all the shell files that now exist in the repo, would be nice to
have linting/formatting around them (in addition to the existing
shellcheck which doesn't do anything to format the shell code). This PR
introduces `shfmt` to both check for changes and apply formatting when
the associated make targets are called.
**Executive Summary**
Eyeballing or saving html in a Table element (in the
`metadata.text_as_html` field) takes some manual effort. This script
provides a quick way to do so given an unstructured .json file that
adheres to the usual schema (i.e., that's returned by the Unstructured
API).
**Testing Instructions**
Get some unstructured output that includes a table. E.g.
[124_PDFsam_Basel III - Finalising post-crisis
reforms.pdf](https://github.com/Unstructured-IO/unstructured/files/13407404/124_PDFsam_Basel.III.-.Finalising.post-crisis.reforms.pdf)
```
./unstructured-get-json.sh --tables --hi-res \
124_PDFsam_Basel\ III\ -\ Finalising\ post-crisis\ reforms.pdf
````
Then use this the following script to view the structure and content of
the tables: (note that output file was copied to the clipboard from
prior command):
```
./u-tables-inspect.sh \
"<snip>/tmp/unst-outputs/124_PDFsam_Basel III - Finalising post-crisis reforms.pdf-hi-res.json"
```
Usage: ./unstructured-get-json.sh [options] <file>"
Options:
--api-key KEY Specify the API key for authentication. Set the env var $UNST_API_KEY to skip providing this option.
--hi-res hi_res strategy: Enable high-resolution processing, with layout segmentation and OCR
--fast fast strategy: No OCR, just extract embedded text
--ocr-only ocr_only strategy: Perform OCR (Optical Character Recognition) only. No layout segmentation.
--tables Enable table extraction: tables are represented as html in metadata
--coordinates Include coordinates in the output
--trace Enable trace logging for debugging, useful to cut and paste the executed curl call
--verbose Enable verbose logging including printing first 8 elements to stdout
--s3 Write the resulting output to s3 (like a pastebin)
--help Display this help and exit.
Arguments:
<file> File to send to the API.
The script requires a <file>, the document to post to the Unstructured API.
The .json result is written to ~/tmp/unst-outputs/ -- this path is echoed and copied to your clipboard.