mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-09 18:15:55 +00:00

Adds OpenSearch as a source and destination. Since OpenSearch is a fork of Elasticsearch, these connectors inherit from the Elasticsearch connectors wherever possible.

- Adds an OpenSearch source connector for ingesting documents from OpenSearch.
- Adds an OpenSearch destination connector for ingesting documents from any supported source, embedding them, and writing the embeddings and documents into OpenSearch.
- Defines an example unstructured elements schema so users can easily set up their unstructured OpenSearch indexes.

---------

Co-authored-by: potter-potter <david.potter@gmail.com>
20 lines
532 B
Bash
#!/usr/bin/env bash

EMBEDDING_PROVIDER=${EMBEDDING_PROVIDER:-"langchain-huggingface"}

unstructured-ingest \
  local \
  --input-path example-docs/book-war-and-peace-1225p.txt \
  --output-dir local-output-to-opensearch \
  --strategy fast \
  --chunk-elements \
  --embedding-provider "$EMBEDDING_PROVIDER" \
  --num-processes 4 \
  --verbose \
  opensearch \
  --hosts "$OPENSEARCH_HOSTS" \
  --username "$OPENSEARCH_USERNAME" \
  --password "$OPENSEARCH_PASSWORD" \
  --index-name "$OPENSEARCH_INDEX_NAME" \
  --num-processes 2
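
The script reads its OpenSearch connection settings from four environment variables and does not check that they are set. An unset variable expands to an empty string, so the failure only shows up later inside the ingest run. A pre-flight check is one possible way to fail early. The sketch below is an assumption, not part of the original script: `require_env` is a hypothetical helper name, and it relies on bash indirect expansion, so it will not work in plain `sh`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the original script): print every variable
# named in the argument list that is unset or empty, and return non-zero
# if at least one is missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name. The :- default keeps this safe
    # under `set -u`.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Placed before the `unstructured-ingest` call, it could be used as `require_env OPENSEARCH_HOSTS OPENSEARCH_USERNAME OPENSEARCH_PASSWORD OPENSEARCH_INDEX_NAME || exit 1`. `EMBEDDING_PROVIDER` needs no such check because the script already gives it a default.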