haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-29 11:50:34 +00:00

Author	SHA1	Message	Date
Silvano Cerza	181e5474e8	ci: Automate OpenAPI specs upload to Readme.io (#4228 ) * Remove OpenAPI specs file * OpenAPI specs are now automatically uploaded when necessary * Rename openapi workflow	2023-02-22 18:01:18 +01:00
github-actions[bot]	aaa1522c45	Update unstable version and openapi schema (#4205 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2023-02-20 14:57:45 +01:00
Agnieszka Marzec	e16f1c8935	Docs: Add filter to hide entity post processor (#4160 ) * Add filter to hide entity post processor * Add missing space	2023-02-16 16:40:42 +01:00
bogdankostic	27aaa92800	docs: Remove some classes regarding PromptNode from API reference docs (#4132 )	2023-02-10 15:56:38 +01:00
Agnieszka Marzec	8135e75139	Add shaper to api docs (#4083 )	2023-02-08 12:15:08 +01:00
tstadel	92c58cfda1	feat: Support multiple document_ids in Answer object (for generative QA) (#4062 ) * initial version without shapers * set document_ids for BaseGenerator * introduce question-answering-with-references template * better prompt * make PromptTemplate control output_variable * update schema * fix add_doc_meta_data_to_answer * Revert "fix add_doc_meta_data_to_answer" This reverts commit b994db423ad8272c140ce2b785cf359d55383ff9. * fix add_doc_meta_data_to_answer * fix eval * fix pylint * fix pinecone * fix other tests * fix test * fix flaky test * Revert "fix flaky test" This reverts commit 7ab04275ffaaaca96b4477325ba05d5f34d38775. * adjust docstrings * make Label loading backward-compatible * fix Label backward compatibility for pinecone * fix Label backward compatibility for search engines * fix Label backward compatibility for deepset Cloud * fix tests * fix None issue * fix test_write_feedback * add tests for legacy label support * add document_id test for pinecone * reduce unnecessary contents * add comment to pinecone test	2023-02-08 08:37:22 +01:00
Massimiliano Pippi	8824f3a10a	re-organize pydoc config files (#4042 )	2023-02-03 12:51:10 +01:00
Massimiliano Pippi	76bb105388	chore: remove unneeded files (#4036 ) * remove unneeded files * readme file should stay	2023-02-02 15:38:56 +01:00
tstadel	8002cf92d6	fix: extend schema for prompt node results (#3891 ) * extend schema for prompt node results * extend schema * update openapi * fix mypy for test module * added 1.14 specs * reverted schema for 1.13 --------- Co-authored-by: bogdankostic <bogdankostic@web.de> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com> Co-authored-by: Sebastian <sjrl@users.noreply.github.com> Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>	2023-01-31 16:31:33 +01:00
Agnieszka Marzec	f6a99b6ebc	Fix: Fix quotation marks (#3973 ) * Fix quotation marks * Fix the order	2023-01-27 13:32:52 +01:00
Agnieszka Marzec	7937ef8995	Add csvconverter to API docs (#3968 )	2023-01-27 11:42:22 +01:00
Agnieszka Marzec	88650c9b0a	Add imgtotext api doc (#3966 )	2023-01-27 09:07:53 +01:00
Massimiliano Pippi	7f6ed941d4	chore: bump pydoc-markdown version used in the CI (#3955 ) * use latest pydoc-markdown * make the workflow manually actionable * Apply suggestions from code review Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com> Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>	2023-01-26 16:58:43 +01:00
github-actions[bot]	d962bc0bc9	Update unstable version and openapi schema (#3924 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>	2023-01-26 01:02:49 +05:30
ZanSara	94f660c56f	feat: store `id_hash_keys` in `Document` objects to make documents clonable (#3697 ) * store id_hash_keys in Document objects * fix id_hash_keys calls throughout codebase * generate schema * fix es * fix weaviate * backward compatible * openapi schema * remove unused deprecation warning * remove unused imports * openapi * unused var * Apply suggestions from code review Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/schema.py * Apply suggestions from code review Co-authored-by: bogdankostic <bogdankostic@web.de> * Update haystack/schema.py * review feedback * trailing spaces * pylint * add deprecation test Co-authored-by: bogdankostic <bogdankostic@web.de>	2023-01-23 15:00:52 +01:00
ZanSara	3ffdb0a9a3	chore: fix all EOF (#3852 ) * fix all eof * fix test * fix test * fix test * typo * fix sample * fix sample * add logs * fix page_dynamic_result.txt	2023-01-16 12:34:50 +01:00
Sebastian	e84fae2894	Migrating to use native Pytorch AMP (#2827 ) * Started making changes to use native Pytorch AMP * Updated compute_loss functions to use torch.cuda.amp.autocast * Updating docstrings * Add use_amp to trainer_checkpoint * Removed mentions of apex and started to add the necessary warnings * Removing unused instances of use_amp variable * Added fast training test for FARMReader. Needed to add max_query_length as a parameter in FARMReader.__init__ and FARMReader.train * Make max_query_length optional in FARMReader.train * Update lg Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: agnieszka-m <amarzec13@gmail.com>	2023-01-05 09:14:28 +01:00
Bilge Yücel	ddba75021a	fix: add additional settings to OpenAPI schema (#3788 ) * "proxy-enabled": disable CORS proxy * "samples-languages": display two languages initially	2022-12-30 16:10:37 +03:00
bogdankostic	36cfd41713	Add newline when generating OpenAPI specs (#3782 )	2022-12-29 17:55:43 +01:00
Agnieszka Marzec	b8fff837b4	docs: Add info where the feedback is stored (#3772 ) * Add info where the feedback is stored * Fix misplaced line breaks * Generate OpenAPI Specs * Generate OpenAPI Specs * Apply black * Generate OpenAPI specs * Add missing whitespace Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-12-28 14:46:26 +01:00
Bilge Yücel	86ade4817e	bug: fix the docs rest api reference url (#3775 ) * bug: fix the docs rest api reference url * revert openapi json changes * remove last line on json files * Add explanation about `servers` and remove `servers` parameter from FastAPI * generate openapi schema without empty end line	2022-12-28 12:30:58 +03:00
Agnieszka Marzec	367c63ef1d	Update readme (#3744 )	2022-12-22 15:53:48 +01:00
Tuana Celik	fe5e0164e8	chore: adding template for prompt node (#3738 )	2022-12-21 20:13:57 +01:00
Stefano Fiorucci	e1401f79b6	refactor: improve Multilabel design (#3658 ) * first try and new test * fix test * fix unused import * remove comments * no more dataclass * add __eq__ and extend test * better design from review * Update schema.py * fix black * fix openapi * fix openapi 2 * new try to fix openapi * remove newline from openapi json	2022-12-13 10:45:56 +01:00
github-actions[bot]	5405d9d7f8	Update unstable version and openapi schema (#3700 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2022-12-13 09:59:52 +01:00
Sara Zan	eba518a589	add trailing newlines to make `end-of-file-fixer` happy (#3699 )	2022-12-12 14:42:25 +01:00
github-actions[bot]	af78f8b431	Update unstable version and openapi schema (#3584 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2022-11-16 10:09:40 +01:00
Massimiliano Pippi	0c1de3745d	fix milvus imports (#3576 )	2022-11-15 10:58:51 +01:00
Massimiliano Pippi	da6b0dc66f	feat: introduce proposal design process (#3333 ) * add RFC process * migrate old ADR to the new process * typo * review comments * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * [skip ci] review feedback * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * [skip ci] leftover * rename to proposals * Adjust naming * Update 2170-pydantic-dataclasses.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2022-11-11 12:49:23 +01:00
Stefano Fiorucci	1a60e21137	refactor: simplify Summarizer, add Document Merger (#3452 ) * remove generate_single_summary * update schemas * remove unused import * fix mypy * fix mypy * test: summarizer doesnt change content * other test correction * move test_summarizer_translation to test_extractor_translation * fix test * first try for doc merger * reintroduce and deprecate generate_single_summary * progress in document merger * document merger! * mypy, pylint fixes * use generator * added test that will fail in 1.12 * adapt to review * extended deprecation docstring * Update test/nodes/test_extractor_translation.py * Update test/nodes/test_summarizer.py * Update test/nodes/test_summarizer.py * black * documents fixture Co-authored-by: Sara Zan <sarazanzo94@gmail.com>	2022-11-03 16:04:53 +01:00
Sara Zan	8ddeda811a	generate docs for search.engine.py (#3507 )	2022-10-31 16:57:39 +01:00
bogdankostic	4fbe80c098	feat: Extraction of headlines in markdown files (#3445 ) * Extract headings from markdown files + adapt PreProcessor * Add tests * Fix mypy * Generate JSON schema * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/file_converter/markdown.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply black * Add PR feedback Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>	2022-10-26 11:57:55 +02:00
Branden Chan	7b15799853	Change slug and title (#3474 )	2022-10-25 16:41:27 +01:00
Stefano Fiorucci	54ec13eaf7	refactor: Change `no_answer` attribute (#3411 ) * always run validation * update schemas * no_answer as a property. break things! * forgotten schema * fix * update openapi * removed my unnecessary test * fix sql document store Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>	2022-10-25 13:07:00 +02:00
Branden Chan	03ba07dcb5	docs: Extend utils API docs coverage (#3402 ) * Add more utils modules * Format docstrings * Incorporate reviewer feedback	2022-10-21 12:51:11 +01:00
Branden Chan	3f956c75f4	Add multimodal retrieval to API docs (#3430 )	2022-10-20 15:07:48 +02:00
Massimiliano Pippi	5335e9e4d9	Add new schema for latest unstable (#3415 ) * add new schema for latest unstable * openapi	2022-10-19 13:21:05 +02:00
Sebastian	15a59fd040	feat: Updated EntityExtractor to handle long texts and added better postprocessing (#3154 ) * Remove dependence on HuggingFace TokenClassificationPipeline and group all postprocessing functions under one class * Added copyright notice for HF and deepset to entity file to acknowledge that a lot of the postprocessing parts came from the transformers library. * Fixed text squishing problem. Added additional unit test for it. Co-authored-by: ju-gu <julian.gutsch@deepset.ai>	2022-10-17 21:26:44 +02:00
Sara Zan	101d2bc86c	feat: `MultiModalRetriever` (#2891 ) * Adding Data2VecVision and Data2VecText to the supported models and adapt Tokenizers accordingly * content_types * Splitting classes into respective folders * small changes * Fix EOF * eof * black * API * EOF * whitespace * api * improve multimodal similarity processor * tokenizer -> feature extractor * Making feature vectors come out of the feature extractor in the similarity head * embed_queries is now self-sufficient * couple trivial errors * Implemented separate language model classes for multimodal inference * Document embedding seems to work * removing batch_encode_plus, is deprecated anyway * Realized the base Data2Vec models are not trained on retrieval tasks * Issue with the generated embeddings * Add batching * Try to fit CLIP in * Stub of CLIP integration * Retrieval goes through but returns noise only * Still working on the scores * Introduce temporary adapter for CLIP models * Image retrieval now works with sentence-transformers * Tidying up the code * Refactoring is now functional * Add MPNet to the supported sentence transformers models * Remove unused classes * pylint * docs * docs * Remove the method renaming * mpyp first pass * docs * tutorial * schema * mypy * Move devices setup into get_model * more mypy * mypy * pylint * Move a few params in HaystackModel's init * make feature extractor work with squadprocessor * fix feature_extractor_kwargs forwarding * Forgotten part of the fix * Revert unrelated ES change * Revert unrelated memdocstore changes * comment * Small corrections * mypy and pylint * mypy * typo * mypy * Refactor the call * mypy * Do not make FARMReader use the new FeatureExtractor * mypy * Detach DPR tests from FeatureExtractor too * Detach processor tests too * Add end2end marker * extract end2end feature extractor tests * temporary disable feature extraction tests * Introduce end2end tests for tokenizer tests * pylint * Fix model loading from folder in FeatureExtractor * working o n end2end * end2end keeps failing * Restructuring retriever tests * Restructuring retriever tests * remove covert_dataset_to_dataloader * remove comment * Better check sentence-transformers models * Use embed_meta_fields properly * rename passage into document * Embedding dims can't be found * Add check for models that support it * pylint * Split all retriever tests into suites, running mostly on InMemory only * fix mypy * fix tfidf test * fix weaviate tests * Parallelize on every docstore * Fix schema and specify modality in base retriever suite * tests * Add first image tests * remove comment * Revert to simpler tests * Update docs/_src/api/api/primitives.md Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/__init__.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * get_args * mypy * Update haystack/modeling/model/multimodal/__init__.py * Update haystack/modeling/model/multimodal/base.py * Update haystack/modeling/model/multimodal/base.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/sentence_transformers.py * Update haystack/modeling/model/multimodal/sentence_transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/modeling/model/multimodal/transformers.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * Update haystack/nodes/retriever/multimodal/retriever.py Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> * mypy * mypy * removing more ContentTypes * more contentypes * pylint * add to __init__ * revert end2end workflow for now * missing integration markers * Update haystack/nodes/retriever/multimodal/embedder.py Co-authored-by: bogdankostic <bogdankostic@web.de> * review feedback, removing HaystackImageTransformerModel * review feedback part 2 * mypy & pylint * mypy * mypy * fix multimodal docs also for Pinecone * add note on internal constants * Fix pinecone write_documents * schemas * keep support for sentence-transformers only * fix pinecone test * schemas * fix pinecone again * temporarily disable some tests, need to understand if they're still relevant Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com> Co-authored-by: bogdankostic <bogdankostic@web.de>	2022-10-17 18:58:35 +02:00
Stefano Fiorucci	b579b9d54a	bug: make `ElasticSearchDocumentStore` use `batch_size` in `get_documents_by_id` (#3166 ) * use batch_size * try to fix git mess * improve docstrings * fix	2022-09-26 13:21:59 +02:00
tstadel	05a86b9d3d	feat: FAISS in OpenSearch: Support HNSW for cosine (#3217 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * split PR * update docs * manually revert tutorial doc change * Fix embedding type * set integration marker correctly * make BaseDocumentStore.normalize_embedding static * format * fix handling of opensearch_faiss param * fix merge * add DenseRetriever typing * organize imports in conftest.py * organize imports in conftest.py (2) * fix DenseRetriever import * add opensearch-tests-linux	2022-09-23 13:26:49 +02:00
tstadel	b10e2c392e	chore: add `DenseRetriever` abstraction (#3252 ) * support cosine similiarity with faiss * update docs * update api docs * fix tests * Revert "update api docs" This reverts commit 6138fdfefb3beaee2d55c5729cd4a2745ea6b143. * fix api docs * collapse test * rename similairity to space_type mappings * only normalize for faiss * fix merge * fix docs normalization * get rid of List[np.array] * update docs * fix tests and tutorials * fix mypy * fix mypy * fix mypy again * again mypy * blacken * update tutorial 4 docs * fix embeddingretriever * fix faiss * move dense specific logic to DenseRetriever * fix mypy * cosine tests for all documents stores * fix pinecone * add docstring * docstring corrections * update docs * add integration test marker * docstrings update * update docs * fix typo * update docs * fix MockDenseRetriever * run integration tests for all documentstores * fix test_update_embeddings_cosine_similarity * fix faiss tests not running * blacken * make test_cosine_sanity_check integration test * update docs * fix imports * import DenseRetriever normally * update docs * fix deepcopy of documents * update schema * Revert "update schema" This reverts commit 83cf8f323648468e1c322d54852bec084d637e3f. * fix schema for ci manually	2022-09-21 19:08:54 +02:00
Branden Chan	492a8046d8	docs: sync Haystack API with Readme (#3223 ) * First pass at syncing Haystack API with Readme * Reapply changes * Regularize slugs * Regularize slugs * Regularize slugs * Set category id and regen * Trigger workflow * Delete old md files * Test sync * Undo test string * Incorporate reviewer feedback * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Test on the fly API generation and sync * Change name of pydoc-markdown scripts * Test on the fly API generation and sync * Remove version tag * Test version tag * Test version tag * Test version tag * Revert test docstring * Revert md file changes * Revert md file changes * Revert script naming * Test on the fly generation and sync * Adjust for on the fly generation and sync * Revert test string * Remove old documentation workflow * Set workflow to work on main * Change readme version name	2022-09-21 17:18:34 +02:00
Massimiliano Pippi	8f76d64f6f	chore: bump release number for unstable version (#3251 ) * bump version for unstable * allow generation of rc schemas * update schemas	2022-09-21 16:58:06 +02:00
Vladimir Blagojevic	938e6fda5b	Classify pipeline's type based on its components (#3132 ) * Add pipeline get_type mehod * Add pipeline uptime * Add pipeline telemetry event sending * Send pipeline telemetry once a day (at most) * Add pipeline invocation counter, change invocation counter logic * Update allowed telemetry parameters - allow pipeline parameters * PR review: add unit test	2022-09-21 14:53:42 +02:00
Stefano Fiorucci	89247b804c	refactor: make `TransformersDocumentClassifier` output consistent between different types of classification (#3224 ) * make output consistent * make output consistent * added tests for details * better tests * Update test_document_classifier.py * make black happy * Update test_document_classifier.py * Update test_document_classifier.py	2022-09-21 13:16:03 +02:00
Tuana Celik	336c144e72	chore: updating colab links in older docs versions (#3250 ) * updating colab links to tutorial 1 * remaining tutorials Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2022-09-20 18:15:29 +02:00
Malte Pietsch	7e79a48540	bug: reactivate benchmarks with quick fixes (#2766 ) * quick fix benchmark runs to make them work with current haystack version * fix minor typo * update readme. fix minor things to make benchmarks run again * Update Documentation & Code Style * fix typo in readme * update result files for reader and retriever querying * reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs) * change default memory allocation back to normal. add note to readme * add first indexing results * add memory to docker cmd * full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-09-20 10:22:08 +02:00
Massimiliano Pippi	9399ddf949	fix pydoc-markdown hook (#3238 )	2022-09-19 18:20:35 +02:00
Massimiliano Pippi	8fbccbda82	fix: handle Documents containing dataframes in Multilabel constructor (#3237 ) * format * fix docs	2022-09-19 14:59:20 +02:00

537 Commits