* Use the `truncate` option of the Cohere request instead of the GPT2 tokenizer to truncate texts
* Update the max batch size for Cohere, which is 96
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
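A minimal sketch of the request-side truncation and batching, assuming the v4-era `cohere` Python SDK; the model name and API-key placeholder are illustrative:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

COHERE_MAX_BATCH_SIZE = 96  # the embed endpoint accepts at most 96 texts per call

def embed_texts(texts: list[str]) -> list[list[float]]:
    embeddings: list[list[float]] = []
    for i in range(0, len(texts), COHERE_MAX_BATCH_SIZE):
        batch = texts[i : i + COHERE_MAX_BATCH_SIZE]
        # truncate="END" lets the API cut off over-long inputs server-side,
        # so no local GPT2 tokenizer is needed
        response = co.embed(texts=batch, model="embed-english-v2.0", truncate="END")
        embeddings.extend(response.embeddings)
    return embeddings
```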
* Adding condition to `pinecone` object.
While you can currently assign any value to `PineconeDocumentStore`'s `pinecone_index` parameter, an additional condition is needed to prevent invalid objects from being passed.
* Added a test and changed the code to make sure the `pinecone_index` variable is an instance of the correct class
* fixed Black formatting error
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
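A sketch of the kind of guard this adds, assuming `pinecone-client` v2, where `pinecone.Index` is the index class; the error message is illustrative:

```python
import pinecone

def validate_pinecone_index(pinecone_index):
    # Reject anything that is not a pinecone.Index instance (e.g. a plain
    # string passed by mistake) before the document store starts using it.
    if not isinstance(pinecone_index, pinecone.Index):
        raise ValueError(
            f"`pinecone_index` must be a `pinecone.Index` object, "
            f"got {type(pinecone_index).__name__} instead."
        )
    return pinecone_index
```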
* add ignore statements to each failing line in haystack/
* simplify workflow
* few typos
* mypy cache directory missing
* mypy cache directory missing
* install types from Haystack only
* install types from rest_api too
* mypy vs literal
* install types at check time
* add mypy cache to python cache
* fix version condition
* fix version condition
* try running mypy only on affected files
* try using explicit hashes
* try another approach
* filter python files
* typo
* quotes
* use action
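A rough Python sketch of the "filter Python files, run mypy only on affected files" idea behind the workflow commits above; the base branch `origin/main` is an assumption:

```python
import subprocess

# Files changed relative to the base branch (origin/main is assumed here)
diff = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# Filter down to Python files so mypy is not invoked on docs, YAML, etc.
py_files = [path for path in diff if path.endswith(".py")]

if py_files:
    subprocess.run(["mypy", *py_files], check=True)
```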
* feat: add HA support for Weaviate
Adding the `replicationConfig => factor` parameter to the Weaviate class at class-creation time, allowing the user to have Haystack create a Weaviate "Class" with a replication factor above 1.
This enables using Weaviate in an HA (High Availability) fashion, where the created class is stored on multiple Weaviate nodes, increasing Weaviate's throughput and ensuring high availability.
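A minimal sketch of such a class definition, assuming the `weaviate-client` v3 API; the class name, property, and factor value are illustrative:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

class_obj = {
    "class": "Document",
    "properties": [{"name": "content", "dataType": ["text"]}],
    # A factor above 1 stores the class on multiple Weaviate nodes
    "replicationConfig": {"factor": 3},
}
client.schema.create_class(class_obj)
```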
* Trying out a recommendation from @masci to fix the CI issue
* enable logging-fstring-interpolation
* remove logging-fstring-interpolation from exclusion list
* remove implicit string interpolations added by black
* remove from rest_api too
* fix % sign
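For context, the pattern that pylint's `logging-fstring-interpolation` check flags, and the %-style replacement used instead:

```python
import logging

logger = logging.getLogger(__name__)
n_docs = 42

# Flagged: the f-string is formatted eagerly, even if INFO records are filtered out
logger.info(f"Indexed {n_docs} documents")

# Preferred: lazy %-style formatting, evaluated only when the record is emitted
logger.info("Indexed %s documents", n_docs)
```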
* fix crawler and try to run CI
* more compact expression
* try to fix
* improve naming regex
* revert regex
* make test_url compatible with Windows
* better conditional expression
* Adding `model.eval()` calls to prediction functions in table reader
* Add unit test to check that inference-time prediction still works when the model is set in train mode.
* Add `table = table.astype(str)` to make sure cells are converted to strings to be compatible with the TableReader
* Turn more ints into strings
* Make sure answer text is always a string.
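A small sketch of the cell conversion, assuming Haystack v1's `Document` wrapper for tables; the sample data is illustrative:

```python
import pandas as pd
from haystack import Document

table = pd.DataFrame({"city": ["Paris", "Berlin"], "population": [2161000, 3645000]})
# The TableReader expects every cell to be a string, so cast the whole frame first
table = table.astype(str)
document = Document(content=table, content_type="table")
```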
* Started making changes to use native PyTorch AMP
* Updated `compute_loss` functions to use `torch.cuda.amp.autocast`
* Updating docstrings
* Add `use_amp` to `trainer_checkpoint`
* Removed mentions of apex and started to add the necessary warnings
* Removing unused instances of the `use_amp` variable
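A minimal, self-contained sketch of the native AMP pattern these commits move to, i.e. `torch.cuda.amp.autocast` for the forward pass plus a `GradScaler`; the toy model and data are illustrative, and a CUDA device is assumed:

```python
import torch
from torch import nn

model = nn.Linear(8, 2).cuda()  # assumes a CUDA device is available
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
use_amp = True  # mirrors the trainer's `use_amp` flag
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for _ in range(10):
    inputs = torch.randn(4, 8, device="cuda")
    labels = torch.randint(0, 2, (4,), device="cuda")
    optimizer.zero_grad()
    # Forward pass and loss computation run in mixed precision under autocast
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = loss_fn(model(inputs), labels)
    # GradScaler guards against fp16 gradient underflow during backward
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```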
* Added fast training test for FARMReader. Needed to add `max_query_length` as a parameter in `FARMReader.__init__` and `FARMReader.train`
* Make `max_query_length` optional in `FARMReader.train`
* Update lg
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
Co-authored-by: agnieszka-m <amarzec13@gmail.com>
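A sketch of how the new parameter might be used; the model name, paths, and filenames are placeholders:

```python
from haystack.nodes import FARMReader

# max_query_length can now be set on the reader and overridden at training time
reader = FARMReader(model_name_or_path="deepset/tinyroberta-squad2", max_query_length=32)
reader.train(
    data_dir="data/squad",        # placeholder path
    train_filename="train.json",  # placeholder filename
    n_epochs=1,
    max_query_length=32,          # optional override in train()
)
```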
* Adjust max token size for OpenAI Ada-v2 embeddings
* Added requested changes and corrected old seq len
Apparently the limit for the older models is 2046 and not 2048; I included this change directly.
See https://beta.openai.com/docs/guides/embeddings/what-are-embeddings to check.
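A sketch of the corresponding truncation logic, assuming the `tiktoken` tokenizer; the helper name is hypothetical, and 8191 is OpenAI's documented input-token limit for Ada-v2:

```python
import tiktoken

def truncate_for_openai(text: str, model: str = "text-embedding-ada-002") -> list[int]:
    # Ada-v2 accepts up to 8191 input tokens; the older models max out at 2046
    max_seq_len = 8191 if model.endswith("-002") else 2046
    encoding = tiktoken.encoding_for_model(model)
    return encoding.encode(text)[:max_seq_len]
```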
Previously, if you set the `IMAGE_NAME` variable, the base image would use that name, but the API image still used a hardcoded `deepset/haystack` image name.
* feat: Change `docker-compose.yml` file
* Add `volumes` to read from the local `/pipelines` folder
* Change the `PIPELINE_YAML_PATH` value and refer to the local `pipelines.haystack-pipeline.yml`
* Change the Elasticsearch image
* Fix volume
* Update readme to direct users to the new demos repository