5 Commits

Author SHA1 Message Date
Matt Robinson
9acf26ec2e
docs: explicitly replace all old pages with link to new docs (#3118)
### Summary

Explicitly replaces all old docs pages with a link to the new docs. This
was required because 404 redirects didn't work for pages that previously
existed, though they worked non-existing paths that never existed.
2024-05-30 13:01:33 +00:00
Matt Robinson
73739b38cc
docs: redirect to docs.unstructured.io on github pages (#3054)
### Summary

Updates GitHub pages to redirect to the new https://docs.unstructured.io
page. This will appear on GitHub pages after the next tag.

### Testing

1. From the docs direction, run `make html`. You should not see any
errors or warnings
2. Open `unstructured/docs/build/html/index.html`. It should look like
the following:
<img width="1512" alt="image"
src="https://github.com/Unstructured-IO/unstructured/assets/1635179/077626a5-d88a-467e-9e37-273a92e75d30">
3. Open `unstructured/docs/build/html/404.html`. It should redirect back
to `index.html`. Per the [GitHub pages
docs](https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-custom-404-page-for-your-github-pages-site),
that page will get served for 404 errors, meaning any links to old docs
pages will redirect to `index.html`, which points users to the new docs
page.
2024-05-21 09:38:32 -04:00
Roman Isecke
a8de52e94f
feat: databricks volumes dest added (#2391)
### Description
This adds in a destination connector to write content to the Databricks
Unity Catalog Volumes service. Currently there is an internal account
that can be used for testing manually but there is not dedicated account
to use for testing so this is not being added to the automated ingest
tests that get run in the CI.

To test locally: 
```shell
#!/usr/bin/env bash

path="testpath/$(uuidgen)"

PYTHONPATH=. python ./unstructured/ingest/main.py local \
    --num-processes 4 \
    --output-dir azure-test \
    --strategy fast \
    --verbose \
    --input-path example-docs/fake-memo.pdf \
    --recursive \
    databricks-volumes \
    --catalog "utic-dev-tech-fixtures" \
    --volume "small-pdf-set" \
    --volume-path "$path" \
    --username "$DATABRICKS_USERNAME" \
    --password "$DATABRICKS_PASSWORD" \
    --host "$DATABRICKS_HOST"
```
2024-01-23 01:25:51 +00:00
David Potter
d7f4c24e21
fix documentation for chroma (#2403)
To test:

cd docs && make HTML

changelogs:

point main readme to the correct connector html page
point chroma docs to correct sample code

---------

Co-authored-by: potter-potter <david.potter@gmail.com>
2024-01-17 01:53:52 +00:00
David Potter
4b8352e0f5
feat: add chroma destination connector (#2240)
Adds Chroma (also known as ChromaDB) as a vector destination.

Currently Chroma is an in-memory single-process oriented library with
plans of a hosted and/or more production ready solution
-https://docs.trychroma.com/deployment

Though they now claim to support multiple Clients hitting the database
at once, I found that it was inconsistent. Sometimes multiprocessing
worked (maybe 1 out of 3 times) But the other times I would get
different errors. So I kept it single process.

---------

Co-authored-by: potter-potter <david.potter@gmail.com>
2023-12-19 16:58:23 +00:00