14 Commits

Author SHA1 Message Date
Roman Isecke
b37b4689bc
drop python3.8 (#2372)
### Description
Remove all uses of python3.8

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>
2024-01-09 23:37:30 +00:00
cragwolfe
bd8a74d686
chore: shell scripts default indent of 2 instead of 4 (#2287)
Given the tendency for shell scripts to easily enter into a few levels
of indentation and long line lengths, update the default to 2 spaces.
2023-12-19 07:48:21 +00:00
Roman Isecke
76efcf4dd7
chore: add shfmt (#2246)
### Description
Given all the shell files that now exist in the repo, would be nice to
have linting/formatting around them (in addition to the existing
shellcheck which doesn't do anything to format the shell code). This PR
introduces `shfmt` to both check for changes and apply formatting when
the associated make targets are called.
2023-12-12 01:04:15 +00:00
ryannikolaidis
e08936b6fb
chore: update all bash scripts to use shebang: /usr/bin/env bash (#779) 2023-06-20 16:00:55 -07:00
cragwolfe
2989f53358
chore: bump to python 3.8.17 (#766)
The images pushed to quay.io will now have python 3.8.17 rather than python 3.8.15.
2023-06-16 11:17:03 -07:00
cragwolfe
7b44bcd6e0
build: script to update all ingest fixtures, add azure ingest fixtures (#367)
- Updates CI to install tesseract version 5.3.0 (better than 4.x in various ways incl. perf.).
- Adds azure expected output fixtures for more useful reference points and as a repro for "Some PDF's with scanned images return empty elements" (#346).
- Adds a script to regenerate ingest test fixtures that is run in an ubuntu docker container (like CI), with the same version of tesseract. See the comments in scripts/ingest-test-fixtures-update.sh for details.
- Updates expected outputs with above script.
- Updates individual test-ingest scripts to update expected .json output if OVERWRITE_FIXTURES=true.
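
As an illustration of the `OVERWRITE_FIXTURES=true` behavior described in the last bullet, here is a minimal sketch. It is not the code from the repository's test-ingest scripts: the function name `check_or_update_fixtures` and the two directory arguments are hypothetical, chosen only to show the pattern of comparing output against expected fixtures by default and overwriting them when the flag is set.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not taken from the repo): compare freshly generated
# output against the expected fixtures, or overwrite the fixtures when
# OVERWRITE_FIXTURES=true is set in the environment.
check_or_update_fixtures() {
  local output_dir="$1"
  local expected_dir="$2"

  if [ "${OVERWRITE_FIXTURES:-false}" = "true" ]; then
    # Update mode: copy the new output over the expected fixtures.
    mkdir -p "$expected_dir"
    cp -R "$output_dir"/. "$expected_dir"/
    echo "fixtures updated"
  elif diff -r "$output_dir" "$expected_dir" >/dev/null; then
    # Check mode, no differences found.
    echo "fixtures match"
  else
    # Check mode, output drifted from the expected fixtures.
    echo "fixtures differ"
    return 1
  fi
}
```

Under these assumptions, a normal test run would call `check_or_update_fixtures <output-dir> <expected-dir>` and fail on any difference, while a fixture refresh would run the same call with `OVERWRITE_FIXTURES=true` set.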
2023-04-11 00:11:50 -07:00
Matt Robinson
e43cb0e6e0
feat: add partition_epub function (#364)
* add pypandoc dependency

* added epub partitioner and file conversion

* test for partition_epub

* tests for file conversion

* add epub to filetype detection

* added epub to auto partition

* update bricks docs

* updated installing docs

* changelog and version

* add pandoc to dependencies

* add pandoc to debian dependencies

* linting, linting, linting

* typo fix

* typo fix

* file conversion type hints

* more type hints

---------

Co-authored-by: qued <64741807+qued@users.noreply.github.com>
2023-03-14 15:52:21 +00:00
Matt Robinson
d17a94f395
chore: add libreoffice to ubuntu install script (#363) 2023-03-13 10:46:23 -04:00
qued
e43e9178ae
feat: amazon linux 2 setup script (#350)
Added Amazon Linux 2 setup script. Also updated Ubuntu setup script to keep the scripts as aligned as possible.

Co-authored-by: cragwolfe <crag@unstructured.io>
2023-03-09 14:52:24 +00:00
qued
ed074b5828
fix: set through env to avoid interpretation as command (#329)
When I took the changes to the Ubuntu setup script and propagated them to other scripts that run in slightly different contexts, the script failed at line 45 as DEBIAN_FRONTEND=noninteractive was interpreted as a command rather than a variable assignment.

Added the env command so there's no misinterpretation. Tested in docker as both root and user.
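
The failure mode described above can be reproduced with a small sketch. It assumes, as an illustration only, that the script prefixed its commands with a variable that is empty when running as root (the `prefix` variable here is hypothetical, and `printenv` stands in for the real `apt-get` call). The shell decides which words are assignments before it expands variables, so once an expanded prefix comes first, `DEBIAN_FRONTEND=noninteractive` is treated as the command name. Passing it through `env` avoids that, because `env` parses `NAME=value` arguments itself.

```shell
#!/usr/bin/env bash

# Hypothetical prefix: empty when already root, "sudo" otherwise.
prefix=""

# Without env: after $prefix expands to nothing, the first word left is
# DEBIAN_FRONTEND=noninteractive, which the shell tries to run as a command.
if $prefix DEBIAN_FRONTEND=noninteractive printenv DEBIAN_FRONTEND 2>/dev/null; then
  without_env="ok"
else
  without_env="failed"
fi

# With env: env sets the variable for the child process, whatever the prefix is.
with_env="$($prefix env DEBIAN_FRONTEND=noninteractive printenv DEBIAN_FRONTEND)"

echo "without env: $without_env"
echo "with env: $with_env"
```

The same `env NAME=value command` form also works when the prefix is `sudo`, which is why it suits a script that has to run both as root and as a regular user.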
2023-03-01 12:56:37 -06:00
qued
d566f9b56a
Inject DEBIAN_FRONTEND into sudo env (#290)
Gets rid of the interactive prompt when tzdata gets installed.
2023-02-28 02:27:58 +00:00
qued
30ac3e6daa
Changes so script runs as root in docker (#287) 2023-02-25 13:48:48 -08:00
cragwolfe
0e3440ac08
fix: add libmagic dep to ubuntu script (#281) 2023-02-25 19:53:38 +00:00
qued
a79b365ab4
feat: add ubuntu setup script (#279) 2023-02-24 20:05:26 -06:00