haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-09 05:37:25 +00:00

Adding yaml functionality to standard pipelines (save/load...) (#1735 )

* adding yaml functionality to BaseStandardPipeline

fixes #1681

* Add latest docstring and tutorial changes

* Update API Reference Pages for v1.0 (#1729)

* Create new API pages and update existing ones

* Create query classifier page

* Remove Objects suffix

* Change answer aggregation key to doc_id, query instead of label_id, query (#1726)

* Add debugging example to tutorial (#1731)

* Add debugging example to tutorial

* Add latest docstring and tutorial changes

* Remove Objects suffix

* Add latest docstring and tutorial changes

* Revert "Remove Objects suffix"

This reverts commit 6681cb06510b080775994effe6a50bae42254be4.

* Revert unintentional commit

* Add third debugging option

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix another self.device/s typo (#1734)

* Fix yet another self.device(s) typo

* Add typing to 'initialize_device_settings' to try prevent future issues

* Fix bug in Tutorial5

* Fix the same bug in the notebook

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* added test for saving and loading prebuilt pipelines

* fixed typo, changed variable name and added comments

* Add latest docstring and tutorial changes

* Fix a few details of some tutorials (#1733)

* Make Tutorial10 use print instead of logs and fix a typo in Tutoria15

* Add a type check in 'print_answers'

* Add same checks to print_documents and print_questions

* Make RAGenerator return Answers instead of dictionaries

* Fix RAGenerator tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix `print_answers` (#1743)

* Fix a specific path of print_answers that was assuming answers are dictionaries

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Split pipeline tests into three suites (#1755)

* Split pipeline tests into three suites

* Will this trigger the CI?

* Rename duplicate test into test_most_similar_documents_pipeline

* Fixing a bug that was probably never noticed

* Capitalize starting letter in params (#1750)

* Capitalize starting letter in params

Capitalized the starting letter in code examples for params in keeping with the latest names for nodes where first letter is capitalized. 
Refer: https://github.com/deepset-ai/haystack/issues/1748

* Update standard_pipelines.py

Capitalized some starting letters in the docstrings in keeping with the updated node names for standard pipelines

* Multi query eval (#1746)

* add eval() to pipeline

* Add latest docstring and tutorial changes

* support multiple queries in eval()

* Add latest docstring and tutorial changes

* keep single query test

* fix EvaluationResult node_results default

* adjust docstrings

* Add latest docstring and tutorial changes

* minor improvements from comments

* Add latest docstring and tutorial changes

* move EvaluationResult and calculate_metrics to schema

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Split summarizer tests in order to make windows CI work again (#1757)

* separate testfile for summarizer with translation

* Add latest docstring and tutorial changes

* import SPLIT_DOCS from test_summarizer

* add workflow_dispatch to windows_ci

* add worflow_dispatch to linux_ci

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix import of EvaluationResult in test case

* exclude test_summarizer_translation.py for windows_ci (#1759)

* Pipelines now tolerate custom _debug content (#1756)

* Pipelines now tolerate custom _debug content

* Support Tables in all DocumentStores (#1744)

* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvuDocumentStore

* Add support for WeaviateDocumentStore

* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config

* Add latest docstring and tutorial changes

* Represent tables in WeaviateDocumentStore as nested lists

* Fix mypy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow TableReader models without aggregation classifier (#1772)

* Fix usage of filters in `/query` endpoint in REST API (#1774)

* WIP filter refactoring

* fix filter formatting

* remove inplace modification of filters

* Public demo (#1747)

* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.

* Add latest docstring and tutorial changes

* Tidy up: remove needless state, add comments, fix minor bugs

* Had to add results to the status to avoid some bugs in eval mode

* Added 'credits'

* Add footers, update requirements, some random questions for the evaluation

* Add requested changes

* Temporary rollback the UI to the old GoT dataset

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Facilitate concurrent query / indexing in Elasticsearch with dense retrievers (new `skip_missing_embeddings` param) (#1762)

* Filtering records not having embeddings

* Added support for skip_missing_embeddings Flag. Default behavior is throw error when embeddings are missing. If skip_missing_embeddings=True then documents without embeddings are ignored for vector similarity

* Fix for below error:
haystack/document_stores/elasticsearch.py:852: error: Need type annotation for "script_score_query"

* docstring for skip_missing_embeddings parameter

* Raise exception where no documents with embeddings is found for Embedding retriever.

* Default skip_missing_embeddings to True

* Explicitly check if embeddings are present if no results are returned by EmbeddingRetriever for Elasticsearch

* Added test case for based on Julian's input

* Added test case for based on Julian's input. Fix pytest error on the testcase

* Added test case for based on Julian's input. Fix pytest error on the testcase

* Added test case for based on Julian's input. Fix pytest error on the testcase

* Simplify code by using get_embed_count

* Adjust docstring & error msg slightly

* Revert error msg

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>

* Huggingface private model support via API tokens (FARMReader) (#1775)

* passed kwargs to model loading

* Pass Auth token explicitly

* add use_auth_token to get_language_model_class

* added use_auth_token parameter at FARMReader

* Add latest docstring and tutorial changes

* added docs for parameter `use_auth_token`

* Add latest docstring and tutorial changes

* adding docs link

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* private hugging face models for retrievers (#1785)

* private dpr

* Add latest docstring and tutorial changes

* added parameters to child functions

* Add latest docstring and tutorial changes

* added tableextractor

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* ignore empty filters parameter (#1783)

* ignore empty filters parameter

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* initialize doc store with doc and label index in tutorial 5 (#1730)

* initialize doc store with doc and label index

* change ipynb according to py for tutorial 5

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Small fixes to the public demo (#1781)

* Make strealit tolerant to haystack not knowing its version, and adding special error for docstore issues

* Add workaround for a Streamlit bug

* Make default filters value an empty dict

* Return more context for each answer in the rest api

* Make the hs_version call not-blocking by adding a very quick timeout

* Add disclaimer on low confidence answer

* Use the no-answer feature of the reader to highlight questions with no good answer

* Upgrade torch to v1.10.0 (#1789)

* Upgrade torch to v1.10.0

* Adapt torch version for torch-scatter in TableQA tutorial

* Add latest docstring and tutorial changes

* Make torch version more flexible

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* adding yaml functionality to BaseStandardPipeline

fixes #1681

* Add latest docstring and tutorial changes

* added test for saving and loading prebuilt pipelines

* fixed typo, changed variable name and added comments

* Add latest docstring and tutorial changes

* fix code rendering for example

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Branden Chan <33759007+brandenchan@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: nishanthcgit <5066268+nishanthcgit@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: C V Goudar <cvgoudar@users.noreply.github.com>
Co-authored-by: Kristof Herrmann <37148029+ArzelaAscoIi@users.noreply.github.com>

2021-11-23 17:01:39 +01:00

.github

Upgrade torch to v1.10.0 (#1789 )

2021-11-23 11:49:46 +01:00

annotation_tool

Add faq annotation (#1333 )

2021-08-10 14:55:31 +02:00

docs

Adding yaml functionality to standard pipelines (save/load...) (#1735 )

2021-11-23 17:01:39 +01:00

haystack

Adding yaml functionality to standard pipelines (save/load...) (#1735 )

2021-11-23 17:01:39 +01:00

rest_api

Small fixes to the public demo (#1781 )

2021-11-22 19:06:08 +01:00

test

Adding yaml functionality to standard pipelines (save/load...) (#1735 )

2021-11-23 17:01:39 +01:00

tutorials

Fix Tutorial 11 on Google Colab (#1795 )

2021-11-23 15:35:23 +01:00

Small fixes to the public demo (#1781 )

2021-11-22 19:06:08 +01:00

.gitignore

Add /documents/get_by_filters endpoint (#1580 )

2021-10-12 10:53:54 +02:00

code_of_conduct.txt

Add code of conduct

2021-03-18 16:39:16 +01:00

CONTRIBUTING.md

Make weaviate more compliant to other doc stores (UUIDs and dummy embedddings) (#1656 )

2021-11-04 09:27:12 +01:00

docker-compose-gpu.yml

Add a restart policy on-failure to all containers

2021-10-27 17:07:36 +02:00

docker-compose.yml

Add a restart policy on-failure to all containers

2021-10-27 17:07:36 +02:00

Dockerfile

Add execute permissions (#1666 )

2021-10-27 17:35:34 +02:00

Dockerfile-GPU

Add execute permissions (#1666 )

2021-10-27 17:35:34 +02:00

LICENSE

Fix name

2021-10-12 10:22:41 +02:00

MANIFEST.in

Add MANIFEST

2019-11-27 16:20:11 +01:00

mypy.ini

Switch from dataclass to pydantic dataclass & Fix Swagger API Docs (#1598 )

2021-10-18 14:38:14 +02:00

README.md

Update README.md (#1682 )

2021-10-29 18:19:21 +02:00

requirements-dev.txt

Add sentence-transformers as mandatory dependency and remove from dev… (#1387 )

2021-09-02 09:54:13 +02:00

requirements.txt

Upgrade torch to v1.10.0 (#1789 )

2021-11-23 11:49:46 +01:00

run_docker_gpu.sh

Update tutorials (torch versions, ES version, replace Finder with Pipeline) (#814 )

2021-02-09 14:56:54 +01:00

setup.py

Refactoring of the haystack package (#1624 )

2021-10-25 15:50:23 +02:00

tox.ini

Add coverage reports and more tests (#78 )

2020-04-28 16:10:32 +02:00

README.md

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform Question Answering or semantic document search, you can use the State-of-the-Art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from other open-source projects like Huggingface's Transformers, Elasticsearch, or Milvus.

What to build with Haystack

Ask questions in natural language and find granular answers in your documents.
Perform semantic search and retrieve documents according to meaning, not keywords.
Use off-the-shelf models or fine-tune them to your domain.
Use user feedback to evaluate, benchmark, and continuously improve your live models.
Leverage existing knowledge bases and better handle the long tail of queries that chatbots receive.
Automate processes by automatically applying a list of questions to new documents and using the extracted answers.

Core Features

Latest models: Utilize the latest transformer-based models (e.g., BERT, RoBERTa, MiniLM) for extractive QA, generative QA, and document retrieval.
Modular: Multiple choices to fit your tech stack and use case. Pick your favorite database, file converter, or modeling framework.
Pipelines: The Node and Pipeline design of Haystack allows for custom routing of queries to only the relevant components.
Open: 100% compatible with HuggingFace's model hub. Tight interfaces to other frameworks (e.g., Transformers, FARM, sentence-transformers).
Scalable: Scale to millions of docs via retrievers, production-ready backends like Elasticsearch / FAISS, and a FastAPI REST API.
End-to-End: All tooling in one place: file conversion, cleaning, splitting, training, eval, inference, labeling, etc.
Developer friendly: Easy to debug, extend and modify.
Customizable: Fine-tune models to your domain or implement your custom DocumentStore.
Continuous Learning: Collect new training data via user feedback in production and improve your models continuously.
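
The routing idea behind the "Pipelines" feature can be illustrated with a small toy sketch. This is not Haystack's API: `ToyPipeline`, `classify`, and the branch names are hypothetical stand-ins written in plain Python, and the "ends with a question mark" rule is an assumption chosen only to keep the example short. It shows a classifier sending each query to a single branch, so the other branch never runs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; these are not Haystack classes.
Handler = Callable[[str], str]


def classify(query: str) -> str:
    """Naive rule: treat anything ending in '?' as a natural-language question."""
    return "question" if query.strip().endswith("?") else "keyword"


class ToyPipeline:
    """Routes each query to exactly one branch, chosen by the classifier."""

    def __init__(self, classifier: Callable[[str], str]) -> None:
        self.classifier = classifier
        self.branches: Dict[str, Handler] = {}
        self.visited: List[str] = []  # records which branches actually ran

    def add_branch(self, name: str, handler: Handler) -> None:
        self.branches[name] = handler

    def run(self, query: str) -> str:
        branch = self.classifier(query)
        self.visited.append(branch)
        return self.branches[branch](query)


pipeline = ToyPipeline(classify)
pipeline.add_branch("question", lambda q: f"dense retriever handles: {q}")
pipeline.add_branch("keyword", lambda q: f"sparse retriever handles: {q}")

print(pipeline.run("Who is the father of Arya Stark?"))
print(pipeline.run("arya stark father"))
```

In a real deployment the branches would be actual retriever or reader nodes; Tutorial 14 (Query Classifier) covers the corresponding Haystack components.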


📒 Docs	Overview, Components, Guides, API documentation
💾 Installation	How to install Haystack
🎓 Tutorials	See what Haystack can do with our Notebooks & Scripts
🔰 Quick Demo	Deploy a Haystack application with Docker Compose and a REST API
🖖 Community	Slack, Twitter, Stack Overflow, GitHub Discussions
❤️ Contributing	We welcome all contributions!
📊 Benchmarks	Speed & Accuracy of Retrievers, Readers and DocumentStores
🔭 Roadmap	Public roadmap of Haystack
📰 Blog	Read our articles on Medium
☎️ Jobs	We're hiring! Have a look at our open positions

💾 Installation

If you're interested in learning more about Haystack and using it as part of your application, we offer several options.

1. Installing from a package

You can install Haystack by using pip.

    pip3 install farm-haystack

Please check our page on PyPI for more information.
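
After installing, it can be useful to confirm which version ended up in the active environment. The sketch below uses only the Python standard library (`importlib.metadata`, available from Python 3.8) and the distribution name `farm-haystack` from the pip command above; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("farm-haystack")
    print(found if found else "farm-haystack is not installed in this environment")
```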

2. Installing from GitHub

You can also clone the repository from GitHub if you'd like to work with the master branch and try the latest features:

    git clone https://github.com/deepset-ai/haystack.git
    cd haystack
    pip install --editable .

To update your installation, run git pull. Because the package was installed with the --editable flag, changes to the source code are picked up immediately, without reinstalling.

3. Installing on Windows

On Windows, you might need:

    pip install farm-haystack -f https://download.pytorch.org/whl/torch_stable.html

🎓 Tutorials

Follow our introductory tutorial to set up a question answering system using Python and start performing queries! Explore the rest of our tutorials to learn how to tweak pipelines, train models and perform evaluation.

Tutorial 1 - Basic QA Pipeline: Jupyter notebook | Colab | Python
Tutorial 2 - Fine-tuning a model on own data: Jupyter notebook | Colab | Python
Tutorial 3 - Basic QA Pipeline without Elasticsearch: Jupyter notebook | Colab | Python
Tutorial 4 - FAQ-style QA: Jupyter notebook | Colab | Python
Tutorial 5 - Evaluation of the whole QA-Pipeline: Jupyter notebook | Colab | Python
Tutorial 6 - Better Retrievers via "Dense Passage Retrieval": Jupyter notebook | Colab | Python
Tutorial 7 - Generative QA via "Retrieval-Augmented Generation": Jupyter notebook | Colab | Python
Tutorial 8 - Preprocessing: Jupyter notebook | Colab | Python
Tutorial 9 - DPR Training: Jupyter notebook | Colab | Python
Tutorial 10 - Knowledge Graph: Jupyter notebook | Colab | Python
Tutorial 11 - Pipelines: Jupyter notebook | Colab | Python
Tutorial 12 - Long-Form Question Answering: Jupyter notebook | Colab | Python
Tutorial 13 - Question Generation: Jupyter notebook | Colab | Python
Tutorial 14 - Query Classifier: Jupyter notebook | Colab | Python
Tutorial 15 - TableQA: Jupyter notebook | Colab | Python

🔰 Quick Demo

Start up a Haystack service via Docker Compose. With this you can begin calling it directly via the REST API or even interact with it using the included Streamlit UI.

1. Update/install Docker and Docker Compose, then launch Docker

    apt-get update && apt-get install docker && apt-get install docker-compose
    service docker start

2. Clone Haystack repository

    git clone https://github.com/deepset-ai/haystack.git

3. Pull images & launch demo app

    cd haystack
    docker-compose pull
    docker-compose up
    
    # Or on a GPU machine: docker-compose -f docker-compose-gpu.yml up

You should be able to see the following in your terminal window as part of the log output:

..
ui_1             |   You can now view your Streamlit app in your browser.
..
ui_1             |   External URL: http://192.168.108.218:8501
..
haystack-api_1   | [2021-01-01 10:21:58 +0000] [17] [INFO] Application startup complete.

4. Open the Streamlit UI for Haystack by pointing your browser to the "External URL" from above.

You can then try different queries against a pre-defined set of indexed articles related to Game of Thrones.
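
Queries can also be sent to the REST API instead of the UI. The sketch below builds such a request with the standard library only. The `/query` path is the endpoint named in this repository's commit log; the JSON body shape (`query` plus a `params` dictionary), the `Retriever`/`top_k` parameter, and the `http://localhost:8000` base URL are assumptions that should be checked against the API docs of the running service. The helper name `build_query_request` is made up for this example.

```python
import json
from typing import Any, Dict, Optional
from urllib import request


def build_query_request(
    question: str,
    params: Optional[Dict[str, Any]] = None,
    base_url: str = "http://localhost:8000",  # assumed host; port 8000 is the API port
) -> request.Request:
    """Build (but do not send) a JSON POST request for the /query endpoint."""
    payload = {"query": question, "params": params or {}}
    return request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request(
        "Who is the father of Arya Stark?",
        params={"Retriever": {"top_k": 5}},  # node name and key are assumptions
    )
    # Sending requires the demo containers to be up and reachable.
    with request.urlopen(req, timeout=30) as response:
        print(json.loads(response.read().decode("utf-8")))
```

Keeping request construction separate from sending makes the payload easy to inspect before anything goes over the network.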

Note: The following containers are started as a part of this demo:

Haystack API: listens on port 8000
DocumentStore (Elasticsearch): listens on port 9200
Streamlit UI: listens on port 8501
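
A quick way to see whether the three containers are accepting connections is to probe the ports listed above. This is a minimal sketch using only the standard library; the host `localhost` is an assumption (use the machine running Docker), and the helper names are made up for this example. An open port only shows that something is listening, not that the service has finished starting up.

```python
import socket
from typing import Dict

# Ports taken from the list above; the host is an assumption (local machine).
DEMO_PORTS: Dict[str, int] = {
    "Haystack API": 8000,
    "DocumentStore (Elasticsearch)": 9200,
    "Streamlit UI": 8501,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_demo(host: str = "localhost") -> Dict[str, bool]:
    """Map each demo service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in DEMO_PORTS.items()}


if __name__ == "__main__":
    for name, reachable in check_demo().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```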

Please note that the demo will publish the container ports to the outside world. We suggest that you review the firewall settings depending on your system setup and the security guidelines.

🖖 Community

Haystack has an active community that we interact with regularly. If you have a feature request or a bug report, feel free to open an issue on GitHub. We check these regularly, and you can expect a quick response. If you'd like to discuss a topic or get more general advice on how to make Haystack work for your project, you can start a thread in GitHub Discussions or our Slack channel. We also check Twitter and Stack Overflow.

❤️ Contributing

We are very open to the community's contributions - be it a quick fix of a typo, or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements. To learn how to get started, check out our Contributor Guidelines first. You can also find instructions to run the tests locally there.

Thanks so much to all those who have contributed to our project!

Description

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

ai bert chatgpt generative-ai gpt-3 information-retrieval language-model large-language-models llm machine-learning nlp python pytorch question-answering rag retrieval-augmented-generation semantic-search squad summarization transformers

Readme Apache-2.0 128 MiB

Languages

MDX 66.3%

Python 32.3%

HTML 0.6%

JavaScript 0.5%

CSS 0.2%