unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-07-24 17:41:15 +00:00

Author	SHA1	Message	Date
Matt Robinson	9acf26ec2e	docs: explicitly replace all old pages with link to new docs (#3118) ### Summary Explicitly replaces all old docs pages with a link to the new docs. This was required because 404 redirects didn't work for pages that previously existed, though they did work for paths that never existed.	2024-05-30 13:01:33 +00:00
Matt Robinson	73739b38cc	docs: redirect to docs.unstructured.io on github pages (#3054) ### Summary Updates GitHub Pages to redirect to the new https://docs.unstructured.io page. This will appear on GitHub Pages after the next tag. ### Testing 1. From the docs directory, run `make html`. You should not see any errors or warnings. 2. Open `unstructured/docs/build/html/index.html`. It should show a page that points users to the new docs. 3. Open `unstructured/docs/build/html/404.html`. It should redirect back to `index.html`. Per the [GitHub Pages docs](https://docs.github.com/en/pages/getting-started-with-github-pages/creating-a-custom-404-page-for-your-github-pages-site), that page is served for 404 errors, so any links to old docs pages will redirect to `index.html`, which points users to the new docs page.	2024-05-21 09:38:32 -04:00
Roman Isecke	a8de52e94f	feat: databricks volumes dest added (#2391) ### Description This adds a destination connector that writes content to the Databricks Unity Catalog Volumes service. There is currently an internal account that can be used for manual testing, but there is no dedicated testing account, so this connector is not being added to the automated ingest tests that run in CI. To test locally: ```shell #!/usr/bin/env bash path="testpath/$(uuidgen)" PYTHONPATH=. python ./unstructured/ingest/main.py local \ --num-processes 4 \ --output-dir azure-test \ --strategy fast \ --verbose \ --input-path example-docs/fake-memo.pdf \ --recursive \ databricks-volumes \ --catalog "utic-dev-tech-fixtures" \ --volume "small-pdf-set" \ --volume-path "$path" \ --username "$DATABRICKS_USERNAME" \ --password "$DATABRICKS_PASSWORD" \ --host "$DATABRICKS_HOST" ```	2024-01-23 01:25:51 +00:00
David Potter	d7f4c24e21	fix documentation for chroma (#2403) To test: `cd docs && make html`. Changes: point the main README to the correct connector HTML page; point the Chroma docs to the correct sample code. Co-authored-by: potter-potter <david.potter@gmail.com>	2024-01-17 01:53:52 +00:00
David Potter	4b8352e0f5	feat: add chroma destination connector (#2240) Adds Chroma (also known as ChromaDB) as a vector destination. Chroma is currently an in-memory, single-process-oriented library, with plans for a hosted and/or more production-ready solution (https://docs.trychroma.com/deployment). Although Chroma now claims to support multiple clients accessing the database at once, I found this inconsistent: multiprocessing worked only sometimes (roughly 1 run in 3), and the other runs failed with various errors. So I kept the connector single-process. Co-authored-by: potter-potter <david.potter@gmail.com>	2023-12-19 16:58:23 +00:00

5 Commits
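Commit 9acf26ec2e replaces every old docs page with a link to the new docs. It does this because GitHub Pages serves `404.html` only for paths that never existed. The script below is a hypothetical sketch of that idea, not the change made in the PR. It overwrites every `.html` file under a built docs directory with a small stub that links to a new URL. The function names `write_stub` and `replace_all_pages`, the script name, and the example `docs/build/html` path are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's actual change): overwrite every existing
# .html page under a built docs directory with a stub linking to new docs.
set -euo pipefail

# Write a minimal HTML page at $1 that links to the URL in $2.
write_stub() {
  local page="$1" new_url="$2"
  cat > "$page" <<EOF
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Documentation has moved</title>
  </head>
  <body>
    <p>This documentation has moved to <a href="${new_url}">${new_url}</a>.</p>
  </body>
</html>
EOF
}

# Replace every *.html file under $1 with a stub pointing at $2,
# then print how many pages were replaced.
replace_all_pages() {
  local build_dir="$1" new_url="$2" count=0 page
  # find -print0 with read -d '' handles paths that contain spaces.
  while IFS= read -r -d '' page; do
    write_stub "$page" "$new_url"
    count=$((count + 1))
  done < <(find "$build_dir" -type f -name '*.html' -print0)
  echo "$count"
}

# Example (hypothetical script name and path):
#   ./replace-old-pages.sh docs/build/html https://docs.unstructured.io
if [[ $# -ge 1 ]]; then
  replace_all_pages "$1" "${2:-https://docs.unstructured.io}"
fi
```

Overwriting each file in place keeps every previously published path valid. A visitor to an old URL therefore lands on a link to the new docs instead of a stale page.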