unstructured/docs/source/ingest/configs/embedding_config.rst
Ahmet Melek ed08773de7
feat: add pinecone destination connector (#1774)
Closes https://github.com/Unstructured-IO/unstructured/issues/1414
Closes #2039 

This PR:
- Uses Pinecone python cli to implement a destination connector for
Pinecone and provides the ingest readme requirements
[(here)](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest#the-checklist)
for the connector
- Updates documentation for the s3 destination connector
- Alphabetically sorts setup.py contents
- Updates logs for the chunking node  in ingest pipeline
- Adds a baseline session handle implementation for destination
connectors, to be able to parallelize their operations
- For the
[bug](https://github.com/Unstructured-IO/unstructured/issues/1892)
related to persisting element data to ingest embedding nodes; this PR
tests the
[solution](https://github.com/Unstructured-IO/unstructured/pull/1893)
with its ingest test
- Solves a bug on ingest chunking params with [bugfix on chunking params
and implementing related
test](69e1949a6f)

---------

Co-authored-by: Roman Isecke <136338424+rbiseck3@users.noreply.github.com>
2023-11-29 22:37:32 +00:00


Embedding Configuration
=========================
The embedding configuration is shared across ingest pipelines and lets you select an embedder, along with its
parameters, for turning data into vectors. You can choose among several embedding providers and models and tune
their parameters to control the quality and characteristics of the resulting vectors. This lets you tailor the
embedding process to your data and downstream applications, so that the generated vectors capture the semantic
relationships and context within the dataset.

Configs
---------------------
* ``embedding_provider``: The Unstructured embedding provider to use when generating embeddings, for example ``langchain-openai``, ``langchain-huggingface``, or ``langchain-aws-bedrock``.
* ``embedding_api_key``: The API key to use when the provider generates embeddings through an API that requires one (e.g. OpenAI).
* ``embedding_model_name``: The name of the model the embedder should use, if the provider requires one.
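
The following is a minimal sketch of how the three options above can drive dynamic embedder selection. It assumes a hypothetical ``EmbeddingSettings`` class, ``PROVIDERS`` registry, and ``get_embedder`` factory; none of these are part of the Unstructured API, and which providers need an API key is also an assumption. The provider name picks a factory, the API key is checked only for providers marked as needing one, and the model name is passed through to the factory. The embedder is a placeholder that returns small deterministic vectors instead of calling a real model.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, Dict, List, Optional, Tuple

   # Hypothetical embedder signature: a list of texts in, one vector per text out.
   Embedder = Callable[[List[str]], List[List[float]]]


   @dataclass
   class EmbeddingSettings:
       """Hypothetical container mirroring the three documented options."""
       embedding_provider: str
       embedding_api_key: Optional[str] = None
       embedding_model_name: Optional[str] = None


   def _placeholder_embedder(model_name: Optional[str]) -> Embedder:
       # Stand-in for a real provider client (model_name is ignored here):
       # maps each text to a tiny deterministic 2-d vector so no network is needed.
       def embed(texts: List[str]) -> List[List[float]]:
           return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]
       return embed


   # Hypothetical registry: provider name -> (needs_api_key, embedder factory).
   PROVIDERS: Dict[str, Tuple[bool, Callable[[Optional[str]], Embedder]]] = {
       "langchain-openai": (True, _placeholder_embedder),
       "langchain-huggingface": (False, _placeholder_embedder),
       "langchain-aws-bedrock": (False, _placeholder_embedder),
   }


   def get_embedder(settings: EmbeddingSettings) -> Embedder:
       """Select and build an embedder from the configured provider."""
       try:
           needs_key, factory = PROVIDERS[settings.embedding_provider]
       except KeyError:
           raise ValueError(
               f"unknown embedding_provider: {settings.embedding_provider!r}"
           ) from None
       # Only providers flagged as API-based must supply embedding_api_key.
       if needs_key and not settings.embedding_api_key:
           raise ValueError(f"{settings.embedding_provider} requires embedding_api_key")
       return factory(settings.embedding_model_name)


   # Usage: build an embedder from the settings and embed two texts.
   settings = EmbeddingSettings(embedding_provider="langchain-huggingface")
   vectors = get_embedder(settings)(["hello", "world"])

A real pipeline would replace ``_placeholder_embedder`` with a client for the chosen provider. Keeping the provider-to-factory mapping in a single table means that supporting a new provider only requires adding a registry entry.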