mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-07 17:12:48 +00:00

Add a GitLab data connector for ingest. This factors out more general Git functionality shared between the GitHub and GitLab data connectors, which prevents code duplication between the two ingest connectors. Renamed github-access-token, github-branch and github-file-glob to git-access-token, git-branch and git-file-glob, respectively; these work for both GitHub and GitLab.
20 lines
588 B
Bash
Executable File
#!/usr/bin/env bash

# Processes the Unstructured-IO/unstructured repository
# through Unstructured's library in 2 processes.

# Structured outputs are stored in github-ingest-output/

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
cd "$SCRIPT_DIR"/../../.. || exit 1

PYTHONPATH=. ./unstructured/ingest/main.py \
    --github-url Unstructured-IO/unstructured \
    --git-branch main \
    --structured-output-dir github-ingest-output \
    --num-processes 2 \
    --verbose

# Alternatively, you can call it using:
# unstructured-ingest --github-url ...
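The script hard-codes the repository, branch, and process count. Since the commit renamed `--github-branch` to the generic `--git-branch`, a caller may want to vary these values without editing the command. Below is a minimal sketch of one way to do that: a hypothetical helper, `build_ingest_args` (the function name, its positional parameters, and the `INGEST_ARGS` array are assumptions for illustration, not part of the repository), that assembles only the flags the script already uses into a Bash array, defaulting to the script's values.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): collects the ingest
# flags used by this script into the global array INGEST_ARGS.
# Positional parameters (all optional; defaults match the script):
#   $1  repository passed to --github-url   (default: Unstructured-IO/unstructured)
#   $2  branch passed to --git-branch       (default: main)
#   $3  value passed to --num-processes     (default: 2)
build_ingest_args() {
  local repo="${1:-Unstructured-IO/unstructured}"
  local branch="${2:-main}"
  local procs="${3:-2}"

  # An array keeps each flag and value as a separate word, so values
  # containing spaces survive expansion via "${INGEST_ARGS[@]}".
  INGEST_ARGS=(
    --github-url "$repo"
    --git-branch "$branch"
    --structured-output-dir github-ingest-output
    --num-processes "$procs"
    --verbose
  )
}

# Example usage from the repository root ("some-branch" is a placeholder):
# build_ingest_args Unstructured-IO/unstructured some-branch 4
# PYTHONPATH=. ./unstructured/ingest/main.py "${INGEST_ARGS[@]}"
```

Building the command line as an array rather than a single string is the usual Bash idiom for preserving quoting when arguments are passed through.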