10 Commits

Author SHA1 Message Date
Emily Voss
b6ab471f00
Drop Python 3.9 support due to dependency conflicts (#4017) 2025-06-10 23:32:11 -07:00
Matt Robinson
6b400b46fe
feat: add VoyageAI embeddings (#3069) (#3099)
Original PR was #3069. Merged into a feature branch to fix dependency
and linting issues. Application code changes from the original PR were
already reviewed and approved.

------------
Original PR description:
Adding VoyageAI embeddings 
Voyage AI’s embedding models and rerankers are state-of-the-art in
retrieval accuracy.

---------

Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>
Co-authored-by: Liuhong99 <39693953+Liuhong99@users.noreply.github.com>
2024-05-24 21:48:35 +00:00
Roman Isecke
b37b4689bc
drop python3.8 (#2372)
### Description
Remove all uses of python3.8

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>
2024-01-09 23:37:30 +00:00
cragwolfe
bd8a74d686
chore: shell scripts default indent of 2 instead of 4 (#2287)
Shell scripts tend to accumulate several levels of indentation and long
lines, so update the default indent to 2 spaces.
2023-12-19 07:48:21 +00:00
Roman Isecke
76efcf4dd7
chore: add shfmt (#2246)
### Description
Given all the shell files that now exist in the repo, it would be nice to
have linting/formatting for them (in addition to the existing
shellcheck, which doesn't format the shell code). This PR
introduces `shfmt` both to check for changes and to apply formatting when
the associated make targets are called.
2023-12-12 01:04:15 +00:00
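The shfmt commit above separates a "check" action from an "apply formatting" action, and the later #2287 change sets a 2-space default indent. A minimal Python sketch of how such targets might assemble their `shfmt` invocation, assuming an explicit list of files is passed in; the helper `shfmt_command` and the example paths are hypothetical, while `-d` (print a diff and fail), `-w` (write in place) and `-i N` (indent width) are real `shfmt` flags.

```python
import shlex


def shfmt_command(files, check=True, indent=2):
    """Build an shfmt argument list for the given shell files (hypothetical helper).

    check=True  -> "-d": print a diff and exit non-zero if any file needs formatting
    check=False -> "-w": rewrite the files in place
    indent      -> "-i N": indent with N spaces (0 would mean tabs)
    """
    mode = "-d" if check else "-w"
    # Sort the paths so repeated runs produce the same command line.
    return ["shfmt", "-i", str(indent), mode, *sorted(str(f) for f in files)]


if __name__ == "__main__":
    # Hypothetical "check" and "format" targets for two example scripts.
    scripts = ["scripts/b.sh", "scripts/a.sh"]
    print(shlex.join(shfmt_command(scripts)))
    print(shlex.join(shfmt_command(scripts, check=False)))
```

Run as a script, the first line printed is `shfmt -i 2 -d scripts/a.sh scripts/b.sh`; CI would use the `-d` form so a formatting drift fails the job, while developers would run the `-w` form locally.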
Yao You
69265685ea
build(deps): add makefile to requirements (#1295)
This PR resolves #1294 by adding a Makefile to compile requirements.
The Makefile respects the dependencies between files and compiles
them in order; e.g., extra-*.txt will be compiled __after__ base.txt is
updated.

Test locally by simply running `make pip-compile` or `cd requirements &&
make clean && make all`

---------

Co-authored-by: qued <64741807+qued@users.noreply.github.com>
2023-11-02 10:17:35 -05:00
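The ordering the Makefile above enforces (base first, extras after) is a topological sort over requirement files. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), assuming the hypothetical rule that every `extra-*.in` depends only on `base.in`; the real Makefile expresses this through make prerequisites instead, and the helper names here are illustrative.

```python
from graphlib import TopologicalSorter


def compile_order(in_files):
    """Return the .in files in an order where base.in precedes every extra-*.in.

    Hypothetical dependency rule: each extra-*.in is constrained by base.in,
    and every other file has no declared prerequisites.
    """
    graph = {}
    for name in sorted(in_files):
        # Map each file to the set of files that must be compiled before it.
        graph[name] = {"base.in"} if name.startswith("extra-") else set()
    return list(TopologicalSorter(graph).static_order())


def compile_commands(in_files):
    """Build one pip-compile invocation per file, in dependency order."""
    return [["pip-compile", name] for name in compile_order(in_files)]
```

A topological sort is used rather than a hard-coded list so that adding a new `extra-*.in` needs no change to the ordering logic, which mirrors what make's prerequisite graph gives the Makefile for free.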
qued
808b4ced7a
build(deps): remove ebooklib (#1878)
* **Removed `ebooklib` as a dependency** `ebooklib` is licensed under
AGPL3, which is incompatible with the Apache 2.0 license. Thus it is
being removed.
2023-10-26 12:22:40 -05:00
Roman Isecke
4802332de0
Roman/optimize ingest ci (#1799)
### Description
Currently the CI caches its dependencies keyed on the hash of all
files in `requirements/`. This isn't completely accurate, since the
ingest dependencies are installed in a later step and don't affect the
cached environment. As part of this PR:
* Ingest dependencies were isolated into their own folder,
`requirements/ingest/`.
* A new cache setup was introduced in the CI: restore the base cache
-> install ingest dependencies -> cache the result under a new id.
* A new make target was created to install all ingest dependencies via
`pip install -r ...`.
* The Dockerfile was updated to use `find ...` to install all dependencies,
avoiding the need to update it when new deps are added.
* The pip-compile script was updated to run over all `*.in` files in
`requirements/`.
2023-10-24 14:54:00 +00:00
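The core of the caching fix above is that the base cache key should hash only the files that shape the cached environment. A minimal Python sketch of that idea, assuming the layout described in the commit (ingest requirements under `requirements/ingest/`); the function name is hypothetical, and the actual workflow would compute its key with the CI system's own tooling rather than this code.

```python
import hashlib
from pathlib import Path


def base_cache_key(requirements_dir, exclude_subdir="ingest"):
    """Hash every file under requirements_dir except those inside exclude_subdir.

    Files are visited in sorted order so the key is stable across machines.
    """
    root = Path(requirements_dir)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root)
        if rel.parts and rel.parts[0] == exclude_subdir:
            # Ingest deps are installed in a later step, so they must not
            # invalidate the base cache.
            continue
        # Include the relative path so renaming a file also changes the key.
        digest.update(rel.as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

With this split, editing an ingest requirements file leaves the base key unchanged (only the second, ingest-layer cache is rebuilt), while editing a base file produces a new key.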
Jack Retterer
b8f24ba67e
Added AWS Bedrock embeddings (#1738)
Summary: Added support for AWS Bedrock embeddings. Leverages
"amazon.titan-tg1-large" for the embedding model.

Test

- find your AWS secret access key and access key ID; make sure the account has
access to Bedrock's Titan embedding model
- follow the instructions in
d5e797cd44/docs/source/bricks/embedding.rst (bedrockembeddingencoder)

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
Co-authored-by: Yao You <yao@unstructured.io>
Co-authored-by: Yao You <theyaoyou@gmail.com>
Co-authored-by: Ahmet Melek <ahmetmeleq@gmail.com>
2023-10-18 19:36:51 -05:00
Roman Isecke
bd49cfbab7
feat: adds Azure Cognitive Search (full text) destination connector (#1459)
### Description
New [Azure Cognitive
Search](https://azure.microsoft.com/en-us/products/ai-services/cognitive-search)
destination connector added. Each JSON element in the JSON files
created by partition is written as a document to an index.
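The connector described above turns partition's JSON output into one index document per element. A minimal Python sketch of that flattening step, assuming each output file holds a JSON array of element dicts carrying an `element_id`, and assuming the index's key field is named `id` (both hypothetical here; the real key name depends on the index schema). The upload itself is left out; the resulting dicts could be handed to `SearchClient.upload_documents` from the `azure-search-documents` package.

```python
import json
from pathlib import Path


def iter_index_documents(output_dir):
    """Yield one search document per element found in partition's JSON output.

    Assumptions: every *.json file in output_dir is a JSON array of element
    dicts, and each element has an "element_id" usable as the document key.
    """
    for path in sorted(Path(output_dir).glob("*.json")):
        elements = json.loads(path.read_text(encoding="utf-8"))
        for element in elements:
            doc = dict(element)  # copy so the parsed element is left untouched
            doc["id"] = element["element_id"]  # hypothetical key field name
            yield doc
```

Yielding documents lazily keeps memory flat for large outputs and lets the caller choose its own upload batch size.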

**Bonus bug fix:** A recent change bumped the repo's default version of
Python from `3.8` to `3.10`, so `pip-compile` now resolves dependencies
against `3.10` rather than the lowest version we support, which is still
`3.8`. This breaks setup on those lower versions, because some of the
package versions pulled in by `pip-compile` exist for `3.10` but not for
`3.8`. `pip-compile` was updated to run as a script that checks the
version of Python being used first, which helps guarantee that all
dependencies meet the minimum Python version requirement.

Closes out https://github.com/Unstructured-IO/unstructured/issues/1466
2023-09-25 10:27:42 -04:00
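The version guard described in the bonus bug fix above can be sketched in a few lines of Python. This is an illustration, not the repo's script: the function names are hypothetical, and requiring an exact match on the minimum `major.minor` (rather than "at least") is an assumption based on the goal of resolving pins that install on the lowest supported version. `MIN_PYTHON` is a parameter because later commits in this log raised the floor.

```python
import subprocess
import sys

# Lowest supported version named in this commit; later commits raised the floor.
MIN_PYTHON = (3, 8)


def running_minimum_python(version_info=None, minimum=MIN_PYTHON):
    """True only when the interpreter's major.minor equals the supported minimum."""
    version = tuple(version_info or sys.version_info)[:2]
    return version == tuple(minimum)


def compile_requirements(in_file, minimum=MIN_PYTHON):
    """Refuse to run pip-compile unless it will resolve for the minimum version."""
    if not running_minimum_python(minimum=minimum):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in minimum)
        sys.exit(f"pip-compile must run under Python {wanted}; found {found}")
    # Only reached on the minimum version, so the pins it writes install there too.
    subprocess.run(["pip-compile", in_file], check=True)
```

Failing fast before invoking `pip-compile` avoids silently writing lock files that only resolve on newer interpreters, which is the breakage the commit describes.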