9 Commits

Author SHA1 Message Date
Julian Risch
3c81103db7
Remove logging config from Haystack (#2848)
* move logging config from haystack lib to application

* Update Documentation & Code Style

* config logging before importing haystack

* Update Documentation & Code Style

* add logging config to all tutorials

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-07-25 17:57:30 +02:00
Stefano Fiorucci
b01a7c2259
Add InMemoryKnowledgeGraph (#2678)
* draft for InMemoryKnowledgeGraph

* remove comments

* Update Documentation & Code Style

* fix import and signature

* Fix dependencies for in_memory_knowlede_graph

* updated tutorials

* Update Documentation & Code Style

* fix bug in notebook

* fix other notebook bug

* Update Documentation & Code Style

* improved tutorial notebook

* Update Documentation & Code Style

* better implementation of InMemoryKnowledgeGraph

* fix

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-22 19:16:33 +02:00
Sara Zan
735ffa635b
[CI refactoring] Tutorials on CI (#2547)
* Experimental Ci workflow for running tutorials

* Run on every push for now

* Not starting?

* Disabling paths temporarily

* Sort tutorials in natural order

* Install ipython

* remove ipython install

* Try running ipython with sudo

* env.pythonLocation

* Skipping tutorial2 and 9 for speed

* typo

* Use one runner per tutorial, for now

* Typo in dependent job

* Missing quotes broke scripts matrix

* Simplify setup for the tutorials, try to prevent containers conflict

* Remove needless job dependencies

* Try prevent cache issues, fix small Tut10 bug

* Missing deps for running notebook tutorials

* Create three groups of tutorials excluding the longest among them

* remove deps

* use proper bash loop

* Try with a single string

* Fix typo in echo

* Forgot do

* Typo

* Try to make the GraphDB tutorial without launching its own container

* Run notebook and script together

* Whitespace

* separate scripts and notebooks execution

* Run notebooks first

* Try caching the GoT data before running the scripts

* add note

* fix mkdir

* Fix path

* Update Documentation & Code Style

* missing -r

* Fix folder numbering

* Run notebooks as well

* Typo in notebook command

* complete path in notebook command

* Try with TIKA_LOG_PATH

* Fix folder naming

* Do not use cached data in Tut9

* extracting the number better

* Small tweaks

* Same fix on Tut10 on the notebook

* Exclude GoT cache for tut5 too

* Remove faiss files after tutorial run

* Layout

* fix remove command

* Fix path in tut10 notebook

* Fix typo in node name in tut14

* Third block was too long, rebalancing

* Reduce GoT dataset even more, why wasting time after all...

* Fix paths in tut10 again

* do git clean to make sure to cleanup everything (breaks post Python)

* Remove ES file with bad permission at the end of the run

* Split first block, takes >30mins

* take out tut15 for a moment, has an actual bug

* typo

* Forgot rm option

* Simply remove all ES files

* Improve logs of GoT reduction

* Exclude also tut16 from cache to try fix bug

* Replace ll with ls

* Reintroduce 15_TableQA

* Small regrouping

* regrouping to make the min num of runners go for about 30mins

* Add cron schedule and PR paths conditions

* Add some timing information

* Separate tutorials by diff and tutorials by cron

* temp add pull_request to tutorials nightly

* Add badge in README to keep track of the nightly tutorials run

* Remove prefixes from data folder names

* Add fetch depth to get diff with master

* Fix paths again

* typo

* Exclude long-running ones

* Typo

* Fix tutorials.yml as well

* Use head_ref

* Using an action for now

* exclude other files

* Use only the correct command to run the tutorial

* Add long running tutorials in separate runners, just for experiment

* Factor out the complex bash script

* Pass the python path to the bash script

* Fix paths

* adding log statement

* Missing dollar sign

* Resetting variable in loop

* using mini GoT dataset and improving bash script

* change dataset name

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-06-15 09:53:36 +02:00
Julian Risch
ac5617e757
Add basic telemetry features (#2314)
* add basic telemetry features

* change pipeline_config to _component_config

* Update Documentation & Code Style

* add super().__init__() calls to error classes

* make posthog mock work with python 3.7

* Update Documentation & Code Style

* update link to docs web page

* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)

* add comment on send_event in BaseComponent.init() and fix mypy

* mock NonPrivateParameters and fix pylint undefined-variable

* Update Documentation & Code Style

* check model path contains multiple /

* add test for writing to file

* add test for en-/disable telemetry

* Update Documentation & Code Style

* merge file deletion methods and ignore pylint global statement

* Update Documentation & Code Style

* set env variable in demo to activate telemetry

* fix mock of HAYSTACK_TELEMETRY_ENABLED

* fix mypy and linter

* add CI as env variable to execution contexts

* remove threading, add test for custom error event

* Update Documentation & Code Style

* simplify config/log file deletion

* add test for final event being sent

* force writing config file in test

* make test compatible with python 3.7

* switch to posthog production server

* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-03-21 11:58:51 +01:00
Sara Zan
a59bca3661
Apply black formatting (#2115)
* Testing black on ui/

* Applying black on docstores

* Add latest docstring and tutorial changes

* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too

* Remove comments

* Relax constraints on pydoc-markdown

* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade

* Fix a couple of bugs

* Add a type: ignore that was missing somehow

* Give path to black

* Apply Black

* Apply Black

* Relocate a couple of type: ignore

* Update documentation

* Make Linux CI run after applying Black

* Triggering Black

* Apply Black

* Remove dependency, does not work well

* Remove manually double trailing commas

* Update documentation

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-02-03 13:43:18 +01:00
Sara Zan
ea3abd305b
Fix a few details of some tutorials (#1733)
* Make Tutorial10 use print instead of logs and fix a typo in Tutorial15

* Add a type check in 'print_answers'

* Add same checks to print_documents and print_questions

* Make RAGenerator return Answers instead of dictionaries

* Fix RAGenerator tests

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-11-12 16:44:28 +01:00
Sara Zan
13510aa753
Refactoring of the haystack package (#1624)
* Files moved, imports all broken

* Fix most imports and docstrings into

* Fix the paths to the modules in the API docs

* Add latest docstring and tutorial changes

* Add a few pipelines that were lost in the imports

* Fix a bunch of mypy warnings

* Add latest docstring and tutorial changes

* Create a file_classifier module

* Add docs for file_classifier

* Fixed most circular imports, now the REST API can start

* Add latest docstring and tutorial changes

* Tackling more mypy issues

* Reintroduce  from FARM and fix last mypy issues hopefully

* Re-enable old-style imports

* Fix some more import from the top-level  package in an attempt to sort out circular imports

* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests

* Change document_store into document_stores

* Update imports in tutorials

* Add latest docstring and tutorial changes

* Probably fixes summarizer tests

* Improve the old-style import allowing module imports (should work)

* Try to fix the docs

* Remove dedicated KnowledgeGraph page from autodocs

* Remove dedicated GraphRetriever page from autodocs

* Fix generate_docstrings.sh with an updated list of yaml files to look for

* Fix some more modules in the docs

* Fix the document stores docs too

* Fix a small issue on Tutorial14

* Add latest docstring and tutorial changes

* Add deprecation warning to old-style imports

* Remove stray folder and import Dict into dense.py

* Change import path for MLFlowLogger

* Add old loggers path to the import path aliases

* Fix debug output of convert_ipynb.py

* Fix circular import on BaseRetriever

* Missed one merge block

* re-run tutorial 5

* Fix imports in tutorial 5

* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base

* Add latest docstring and tutorial changes

* Fix typo in utils __init__

* Fix a few more imports

* Fix benchmarks too

* New-style imports in test_knowledge_graph

* Rollback setup.py

* Rollback squad_to_dpr too

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-25 15:50:23 +02:00
Branden Chan
783893c3d2
Tutorial update (#1166)
* Add header / footer

* Add Milvus example

* Generate md files

* Fix mypy CI
2021-06-11 11:09:15 +02:00
Julian Risch
d38c07e0ee
knowledge graph example (#934)
* Add knowledge graph module

* Fix type hint

* Add graph retriever module

* Change type annotations, change return format

* Add graph retriever that executes questions as sparql queries

* Linking only those entities that are in the knowledge graph

* Added logging and using relations extracted from Knowledge graph for linking

* Preventing entity linking from linking the same token to multiple entities

* Pruning triples that have no variables for select and count queries

* Support knowledge graphs with Pipelines

* Add text2sparql

* Entity linking and relation linking consider more special cases now based on evaluation on labelled data

* Separating example code from KGQA implementation

* Add eval on combined extractive and kg questions

* Remove references to hp-test

* Add fields sparql_query and long_answer_list to metadata

* Removing modular Question2SPARQL approach

* Removing additional classes used for modular kgqa approach

* preparing lcquad data

* change graph db

* Translating namespaces in knowledge graph queries

* Creating graphdb index and loading triples from .ttl file

* Fetching graph config files, triples and model from S3

* Fix incompatibility issues with BaseGraphRetriever and BaseComponent

* Removing unused utility functions

* Adding doc strings and tutorial header

* Adding sparqlwrapper dependency

* Moving tutorial header

* Sorting tutorials by number within name of notebook

* Add latest docstring and tutorial changes

* Creating test cases for knowledge graph

* Changing knowledge graph example to harry potter

* Add latest docstring and tutorial changes

* Adapting the tutorial notebook to harry potter example

* Add GraphDB fixture for tests

* Add latest docstring and tutorial changes

* Added GraphDB docker launch to CI

* Use correct GraphDB fixture

* Check if GraphDB instance is already running

* Renaming question/query and incorporating other feedback from Timo and Tanay

* Removed type annotation

* Add latest docstring and tutorial changes

Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-08 14:05:33 +02:00