Adds Chroma (also known as ChromaDB) as a vector destination. Chroma is currently an in-memory, single-process-oriented library, with plans for a hosted and/or more production-ready solution: https://docs.trychroma.com/deployment. Although they now claim to support multiple clients hitting the database at once, I found this to be inconsistent. Sometimes multiprocessing worked (maybe 1 out of 3 times), but the other times I would get different errors. So I kept it single-process.

Co-authored-by: potter-potter <david.potter@gmail.com>

Chroma
======================

Batch process all your records using ``unstructured-ingest`` to store structured outputs locally on your filesystem and upload those to a Chroma database.

First, install the Chroma dependencies as shown here.

.. code:: shell

  pip install "unstructured[chroma]"

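A quick way to confirm the install worked is to check that the relevant modules can be imported. The sketch below is only an illustration: it assumes the ``chroma`` extra installs the ``chromadb`` package alongside ``unstructured``, and ``missing_modules`` is a hypothetical helper name, not part of either library.

```python
from importlib.util import find_spec


def missing_modules(names=("unstructured", "chromadb")):
    """Return the module names from `names` that cannot be found for import."""
    # find_spec returns None for a top-level module that is not installed.
    # The default names are an assumption about what the extra provides.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("The Chroma destination dependencies appear to be importable.")
```

An empty result only shows that the modules are discoverable; it does not verify versions or that a Chroma database is reachable.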
Run Locally
-----------
The upstream connector can be any of the supported connectors; for convenience, the sample command here uses the
upstream local connector.

.. tabs::

   .. tab:: Shell

      .. literalinclude:: ./code/bash/pinecone.sh
         :language: bash

   .. tab:: Python

      .. literalinclude:: ./code/python/pinecone.py
         :language: python

For a full list of the options the CLI accepts, check ``unstructured-ingest <upstream connector> chroma --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
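
The ``--help`` invocation mentioned above can also be run from a script, for example to capture the option list in a log. This is a minimal sketch, assuming ``unstructured-ingest`` is on your ``PATH`` and using ``local`` as the upstream connector, as in the sample command; ``chroma_help_command`` and ``show_chroma_help`` are hypothetical helper names.

```python
import shutil
import subprocess


def chroma_help_command(upstream="local"):
    """Build the argument list for `unstructured-ingest <upstream> chroma --help`."""
    return ["unstructured-ingest", upstream, "chroma", "--help"]


def show_chroma_help(upstream="local"):
    """Return the CLI help text, or None if the CLI is not installed."""
    cmd = chroma_help_command(upstream)
    # Avoid a FileNotFoundError when the executable is not on PATH.
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    text = show_chroma_help()
    print(text if text is not None else "unstructured-ingest was not found on PATH.")
```

Passing the command as a list avoids shell quoting issues if the upstream connector name comes from user input.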