mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-24 09:26:08 +00:00

Closes https://github.com/Unstructured-IO/unstructured/issues/1414
Closes #2039
This PR:
- Uses Pinecone python cli to implement a destination connector for
Pinecone and provides the ingest readme requirements
[(here)](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest#the-checklist)
for the connector
- Updates documentation for the s3 destination connector
- Alphabetically sorts setup.py contents
- Updates logs for the chunking node in ingest pipeline
- Adds a baseline session handle implementation for destination
connectors, to be able to parallelize their operations
- For the
[bug](https://github.com/Unstructured-IO/unstructured/issues/1892)
related to persisting element data to ingest embedding nodes; this PR
tests the
[solution](https://github.com/Unstructured-IO/unstructured/pull/1893)
with its ingest test
- Solves a bug on ingest chunking params with [bugfix on chunking params
and implementing related
test](69e1949a6f
)
---------
Co-authored-by: Roman Isecke <136338424+rbiseck3@users.noreply.github.com>
16 lines
1017 B
ReStructuredText
Embedding Configuration
=========================

A common embedding configuration lets users select an embedder and its parameters dynamically when turning data
into vectors. It provides the flexibility to choose among embedding models and to adjust their parameters, which
controls the quality and characteristics of the resulting vectors. Users can therefore tailor the embedding process
to their data and downstream applications so that the generated vectors capture the semantic relationships and
contextual information within the dataset.
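
The dynamic selection described above can be sketched as a registry that maps provider names to embedder
factories. This is a minimal illustration only, not the Unstructured implementation: the names ``register``,
``get_embedder``, ``toy-length``, and ``toy-vowels`` are hypothetical, and the two toy embedders compute simple
text statistics instead of calling a real embedding model.

.. code-block:: python

   from typing import Callable, Dict, List

   # Hypothetical embedder type: maps a list of texts to one vector per text.
   Embedder = Callable[[List[str]], List[List[float]]]

   # Hypothetical registry from provider name to embedder factory.
   _REGISTRY: Dict[str, Callable[..., Embedder]] = {}


   def register(name: str):
       """Decorator that records an embedder factory under a provider name."""
       def wrap(factory: Callable[..., Embedder]) -> Callable[..., Embedder]:
           _REGISTRY[name] = factory
           return factory
       return wrap


   @register("toy-length")
   def length_embedder(dim: int = 2) -> Embedder:
       # Toy features: character count and word count, zero-padded or truncated to dim.
       def embed(texts: List[str]) -> List[List[float]]:
           vectors = []
           for text in texts:
               features = [float(len(text)), float(len(text.split()))]
               vectors.append((features + [0.0] * dim)[:dim])
           return vectors
       return embed


   @register("toy-vowels")
   def vowel_embedder() -> Embedder:
       # Toy features: how often each vowel a, e, i, o, u occurs.
       def embed(texts: List[str]) -> List[List[float]]:
           return [[float(t.lower().count(v)) for v in "aeiou"] for t in texts]
       return embed


   def get_embedder(provider: str, **params) -> Embedder:
       """Look up a provider by name and build it with the given parameters."""
       try:
           factory = _REGISTRY[provider]
       except KeyError:
           raise ValueError(f"unknown embedding provider: {provider!r}") from None
       return factory(**params)


   embed = get_embedder("toy-length", dim=3)
   print(embed(["hello world"]))  # [[11.0, 2.0, 0.0]]

With this shape, switching embedders means changing only the provider name and its parameters, for example
``get_embedder("toy-vowels")``, while the calling code stays the same.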

Configs
---------------------

* ``embedding_provider``: The Unstructured embedding provider used to generate embeddings, for example ``langchain-openai``, ``langchain-huggingface``, or ``langchain-aws-bedrock``.
* ``embedding_api_key``: The API key, if the provider generates embeddings through an API that requires one (e.g. OpenAI).
* ``embedding_model_name``: The model the embedder should use, if necessary.
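
The three options above can be modeled as a small configuration object that is validated when it is created. The
sketch below is hypothetical: the class name ``EmbeddingConfig``, the ``embedder_kwargs`` helper, the keyword names
``api_key`` and ``model_name``, and the assumption that only ``langchain-openai`` needs a key are illustrative
choices, not the library's actual API.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Optional

   # Assumption: providers whose API requires a key. Only OpenAI is named
   # in the option descriptions above; this set is illustrative.
   PROVIDERS_REQUIRING_KEY = {"langchain-openai"}


   @dataclass
   class EmbeddingConfig:
       """Hypothetical container mirroring the three documented options."""

       embedding_provider: str
       embedding_api_key: Optional[str] = None
       embedding_model_name: Optional[str] = None

       def __post_init__(self) -> None:
           # Fail when the config is built rather than partway through embedding.
           if not self.embedding_provider:
               raise ValueError("embedding_provider must be set")
           if (
               self.embedding_provider in PROVIDERS_REQUIRING_KEY
               and not self.embedding_api_key
           ):
               raise ValueError(
                   f"{self.embedding_provider} requires embedding_api_key"
               )

       def embedder_kwargs(self) -> Dict[str, str]:
           """Return only the optional settings that were actually provided."""
           kwargs: Dict[str, str] = {}
           if self.embedding_api_key:
               kwargs["api_key"] = self.embedding_api_key
           if self.embedding_model_name:
               kwargs["model_name"] = self.embedding_model_name
           return kwargs


   # Example values only.
   config = EmbeddingConfig(
       embedding_provider="langchain-huggingface",
       embedding_model_name="sentence-transformers/all-MiniLM-L6-v2",
   )
   print(config.embedder_kwargs())
   # {'model_name': 'sentence-transformers/all-MiniLM-L6-v2'}

Omitting unset options from ``embedder_kwargs`` is meant to leave each provider free to apply its own defaults,
while the check in ``__post_init__`` reports a missing key before any data is processed.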