3 Commits

Author SHA1 Message Date
cragwolfe
bd8a74d686
chore: shell scripts default indent of 2 instead of 4 (#2287)
Given the tendency for shell scripts to easily enter into a few levels
of indentation and long line lengths, update the default to 2 spaces.
2023-12-19 07:48:21 +00:00
Roman Isecke
76efcf4dd7
chore: add shfmt (#2246)
### Description
Given all the shell files that now exist in the repo, would be nice to
have linting/formatting around them (in addition to the existing
shellcheck which doesn't do anything to format the shell code). This PR
introduces `shfmt` to both check for changes and apply formatting when
the associated make targets are called.
2023-12-12 01:04:15 +00:00
cragwolfe
d7456ab6d2
feat: convenience script to view tables (#2124)
**Executive Summary**

Eyeballing or saving html in a Table element (in the
`metadata.text_as_html` field) takes some manual effort. This script
provides a quick way to do so given an unstructured .json file that
adheres to the usual schema (i.e., that's returned by the Unstructured
API).

**Testing Instructions**

Get some unstructured output that includes a table. E.g. 

[124_PDFsam_Basel III - Finalising post-crisis
reforms.pdf](https://github.com/Unstructured-IO/unstructured/files/13407404/124_PDFsam_Basel.III.-.Finalising.post-crisis.reforms.pdf)

```
./unstructured-get-json.sh --tables --hi-res \
  124_PDFsam_Basel\ III\ -\ Finalising\ post-crisis\ reforms.pdf
````

Then use this the following script to view the structure and content of
the tables: (note that output file was copied to the clipboard from
prior command):

```
./u-tables-inspect.sh \
"<snip>/tmp/unst-outputs/124_PDFsam_Basel III - Finalising post-crisis reforms.pdf-hi-res.json"
```
2023-11-21 22:18:39 -08:00