mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-12 19:45:56 +00:00

### Description
Convert the s3 CLI code to also support writing to s3. Writers are added as optional subcommands of the parent command, each with its own arguments. A custom `click.Group` is introduced to add custom formatting and text to help messages. To limit the scope of this PR, most existing files were left untouched and new files were added for the new flow instead, which allowed _only_ the s3 connector to be updated without breaking any of the others.
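The command-line shape this enables can be sketched as follows: source options come first, then an optional writer subcommand with its own arguments. This is a hypothetical stand-in written with the standard library's `argparse`. It is not the PR's actual `click.Group`-based code. The names `build_parser` and `plan` are illustrative, and only the flags used in the script below are modelled.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mimicking `main.py s3 [source opts] [s3 writer opts]`."""
    parser = argparse.ArgumentParser(prog="ingest")
    sources = parser.add_subparsers(dest="source", required=True)

    # Source (reader) subcommand with the flags the script passes to it.
    s3_src = sources.add_parser("s3")
    s3_src.add_argument("--remote-url", required=True)
    s3_src.add_argument("--anonymous", action="store_true")
    s3_src.add_argument("--output-dir")
    s3_src.add_argument("--num-processes", type=int, default=1)
    s3_src.add_argument("--verbose", action="store_true")

    # Writers: an optional nested subcommand with its own arguments.
    # Distinct dest names keep them from overwriting the source's values.
    writers = s3_src.add_subparsers(dest="writer")  # not required
    s3_dst = writers.add_parser("s3")
    s3_dst.add_argument("--remote-url", dest="writer_remote_url", required=True)
    s3_dst.add_argument("--anonymous", dest="writer_anonymous", action="store_true")
    return parser


def plan(argv: List[str]) -> Dict[str, Optional[object]]:
    """Summarize what a run would read from and write to (illustrative only)."""
    args = build_parser().parse_args(argv)
    destination = args.writer_remote_url if args.writer is not None else None
    return {
        "source": args.remote_url,
        "local_output": args.output_dir,
        "processes": args.num_processes,
        "destination": destination,
    }
```

Separate `dest` names are used for the writer's `--remote-url` and `--anonymous` because argparse merges a subparser's parsed values into the parent namespace. Reusing `remote_url` would therefore overwrite the source URL.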
21 lines
667 B
Bash
Executable File
#!/usr/bin/env bash
# Processes 3 PDFs from s3://utic-dev-tech-fixtures/small-pdf-set/
# through Unstructured's library in 2 processes.
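# Structured outputs are stored locally in s3-small-batch-output/ and written by
# the s3 writer subcommand to s3://utic-dev-tech-fixtures/small-pdf-set-output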
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR"/../../.. || exit 1
PYTHONPATH=. ./unstructured/ingest/main.py \
s3 \
--remote-url s3://utic-dev-tech-fixtures/small-pdf-set/ \
--anonymous \
--output-dir s3-small-batch-output \
--num-processes 2 \
--verbose \
s3 \
--anonymous \
--remote-url s3://utic-dev-tech-fixtures/small-pdf-set-output