from typing import Dict, Any
import logging
import time
import json
from pathlib import Path
from numpy import ndarray
from fastapi import APIRouter
import haystack
from haystack.pipelines.base import Pipeline
from rest_api.config import PIPELINE_YAML_PATH, QUERY_PIPELINE_NAME
from rest_api.config import LOG_LEVEL, CONCURRENT_REQUEST_PER_WORKER
from rest_api.schema import QueryRequest, QueryResponse
from rest_api.controller.utils import RequestLimiter
logging.getLogger("haystack").setLevel(LOG_LEVEL)
logger = logging.getLogger("haystack")
from pydantic import BaseConfig
BaseConfig.arbitrary_types_allowed = True
router = APIRouter()
PIPELINE = Pipeline.load_from_yaml(Path(PIPELINE_YAML_PATH), pipeline_name=QUERY_PIPELINE_NAME)
DOCUMENT_STORE = PIPELINE.get_document_store()
logging.info(f"Loaded pipeline nodes: {PIPELINE.graph.nodes.keys()}")
concurrency_limiter = RequestLimiter(CONCURRENT_REQUEST_PER_WORKER)
logging.info(f"Concurrent requests per worker: {CONCURRENT_REQUEST_PER_WORKER}")


@router.get("/initialized")
def check_status():
    """
    This endpoint can be used during startup to understand if the
    server is ready to take any requests, or is still loading.

    The recommended approach is to call this endpoint with a short timeout,
    like 500ms, and in case of no reply, consider the server busy.
    """
    return True


@router.get("/hs_version")
def haystack_version():
    """
    Get the running Haystack version.
    """
    return {"hs_version": haystack.__version__}


@router.post("/query", response_model=QueryResponse, response_model_exclude_none=True)
def query(request: QueryRequest):
    """
    This endpoint receives the question as a string and allows the requester to set
    additional parameters that will be passed on to the Haystack pipeline.
    """
    with concurrency_limiter.run():
        result = _process_request(PIPELINE, request)
        return result


def _process_request(pipeline, request) -> Dict[str, Any]:
    start_time = time.time()

    params = request.params or {}

    # format global, top-level filters (e.g. "params": {"filters": {"name": ["some"]}})
    if "filters" in params:
        params["filters"] = _format_filters(params["filters"])

    # format targeted node filters (e.g. "params": {"Retriever": {"filters": {"name": ["some"]}}})
    # Non-dict values are skipped, since they cannot hold a nested "filters" entry.
    for value in params.values():
        if isinstance(value, dict) and "filters" in value:
            value["filters"] = _format_filters(value["filters"])

    result = pipeline.run(query=request.query, params=params, debug=request.debug)

    # Ensure answers and documents exist, even if they're empty lists
    if "documents" not in result:
        result["documents"] = []
    if "answers" not in result:
        result["answers"] = []

    # If a document carries its embedding as an ndarray, convert it to a list of floats
    for document in result["documents"]:
        if isinstance(document.embedding, ndarray):
            document.embedding = document.embedding.tolist()

    logger.info(
        json.dumps({"request": request, "response": result, "time": f"{(time.time() - start_time):.2f}"}, default=str)
    )

    return result


def _format_filters(filters):
    """
    Adjust filters to compliant format:
    Put filter values into a list and remove filters with null value.
    """
    new_filters = {}
    if filters is None:
        logger.warning(
            "Request with deprecated filter format ('\"filters\": null'). "
            "Remove empty filters from params to be compliant with future versions"
        )
    else:
        for key, values in filters.items():
            if values is None:
                logger.warning(
                    f"Request with deprecated filter format ('{key}': null). "
                    f"Remove null values from filters to be compliant with future versions"
                )
                continue

            if not isinstance(values, list):
                logger.warning(
                    f"Request with deprecated filter format ('{key}': {values}). "
                    f"Change to '{key}': [{values}] to be compliant with future versions"
                )
                values = [values]

            new_filters[key] = values
    return new_filters
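
The filter normalization described in the `_format_filters` docstring can be tried in isolation. The following is a minimal sketch, not part of this module: `format_filters_demo` is a hypothetical standalone copy of the same logic with the deprecation warnings left out, so it runs without Haystack or the REST API installed.

```python
from typing import Any, Dict, Optional


def format_filters_demo(filters: Optional[Dict[str, Any]]) -> Dict[str, list]:
    """Hypothetical standalone copy of the normalization logic (warnings omitted)."""
    new_filters: Dict[str, list] = {}
    if filters is None:
        # A null "filters" entry is treated as "no filters".
        return new_filters
    for key, values in filters.items():
        if values is None:
            # Filters with a null value are dropped.
            continue
        if not isinstance(values, list):
            # Scalar values are wrapped into a single-element list.
            values = [values]
        new_filters[key] = values
    return new_filters


normalized = format_filters_demo({"name": "some", "year": ["2020", "2021"], "author": None})
print(normalized)  # {'name': ['some'], 'year': ['2020', '2021']}
```

Scalars are wrapped in a list, existing lists pass through unchanged, and null entries disappear, which is the shape the docstring calls the compliant format.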