haystack/setup.py
bogdankostic 655d721371
Add Table Reader (#1446)
* first draft / notes on new primitives

* wip label / feedback refactor

* rename doc.text -> doc.content. add doc.content_type

* add datatype for content

* remove faq_question_field from ES and weaviate. rename text_field -> content_field in docstores. update tutorials for content field

* update converters for . Add warning for empty

* Add first draft of TableReader (a usage sketch follows at the end of this commit message)

* rename label.question -> label.query. Allow sorting of Answers.

* Add calculation of answer scores

* WIP primitives

* Adapt input and output to new primitives

* Add doc strings

* Add tests

* update ui/reader for new Answer format

* Improve Label. First refactoring of MultiLabel. Adjust eval code

* fixed workflow conflict when introducing a new one (#1472)

* Add latest docstring and tutorial changes

* make add_eval_data() work again

* fix reader formats. WIP fix _extract_docs_and_labels_from_dict

* fix test reader

* Add latest docstring and tutorial changes

* fix another test case for reader

* fix mypy in farm reader.eval()

* fix mypy in farm reader.eval()

* WIP ORM refactor

* Add latest docstring and tutorial changes

* fix mypy weaviate

* make label and multilabel dataclasses

* bump mypy env in CI to python 3.8

* WIP refactor Label ORM

* WIP refactor Label ORM

* simplify tests for individual doc stores

* WIP refactoring markers of tests

* test alternative approach for tests with existing parametrization

* WIP refactor ORMs

* fix skip logic of already parametrized tests

* fix weaviate behaviour in tests - not parametrizing it in our general test cases.

* Add latest docstring and tutorial changes

* fix some tests

* remove sql from document_store_types

* fix markers for generator and pipeline test

* remove inmemory marker

* remove unneeded elasticsearch markers

* add dataclasses-json dependency. adjust ORM to just store JSON repr

* ignore type as dataclasses_json seems to miss functionality here

* update readme and contributing.md

* update contributing

* adjust example

* fix duplicate doc handling for custom index

* Add latest docstring and tutorial changes

* fix some ORM issues. fix get_all_labels_aggregated.

* update drop flags where get_all_labels_aggregated() was used before

* Add latest docstring and tutorial changes

* add to_json(). add + fix tests

* fix no_answer handling in label / multilabel

* fix duplicate docs in memory doc store. change primary key for sql doc table

* fix mypy issues

* fix mypy issues

* haystack/retriever/base.py

* fix test_write_document_meta[elastic]

* fix test_elasticsearch_custom_fields

* fix test_labels[elastic]

* fix crawler

* fix converter

* fix docx converter

* fix preprocessor

* fix test_utils

* fix tfidf retriever. fix selection of docstore in tests with multiple fixtures / parameterizations

* Add latest docstring and tutorial changes

* fix crawler test. fix ocrconverter attribute

* fix test_elasticsearch_custom_query

* fix generator pipeline

* fix ocr converter

* fix ragenerator

* Add latest docstring and tutorial changes

* fix test_load_and_save_yaml for elasticsearch

* fixes for pipeline tests

* fix faq pipeline

* fix pipeline tests

* Add latest docstring and tutorial changes

* fix weaviate

* Add latest docstring and tutorial changes

* trigger CI

* satisfy mypy

* Add latest docstring and tutorial changes

* satisfy mypy

* Add latest docstring and tutorial changes

* trigger CI

* fix question generation test

* fix ray. fix Q-generation

* fix translator test

* satisfy mypy

* wip refactor feedback rest api

* fix rest api feedback endpoint

* fix doc classifier

* remove relation of Labels -> Docs in SQL ORM

* fix faiss/milvus tests

* fix doc classifier test

* fix eval test

* fixing eval issues

* Add latest docstring and tutorial changes

* fix mypy

* WIP replace dataclasses-json with manual serialization

* Add latest docstring and tutorial changes

* revert to dataclass-json serialization for now. remove debug prints.

* update docstrings

* fix extractor. fix Answer Span init

* fix api test

* Adapt answer format

* Add latest docstring and tutorial changes

* keep meta data of answers in reader.run()

* Fix mypy

* fix meta handling

* address review feedback

* Add latest docstring and tutorial changes

* Allow inference on GPU

* Remove automatic aggregation

* Add automatic aggregation

* Add latest docstring and tutorial changes

* Add torch-scatter dependency

* Add wheel to torch-scatter dependency

* Fix requirements

* Fix requirements

* Fix requirements

* Adapt setup.py to allow for wheels

* Fix requirements

* Fix requirements

* Add type hints and code snippet

* Add latest docstring and tutorial changes

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-15 16:34:48 +02:00
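
The headline change pairs the new content/content_type primitives with the TableReader. Below is a minimal usage sketch, assuming the import paths of this commit (readers still live under haystack.reader here; later releases moved them) and the TAPAS checkpoint commonly named in the Haystack docs — treat it as illustrative, not as the canonical snippet from this PR:

import pandas as pd

from haystack import Document
from haystack.reader.table import TableReader  # import path assumed for this commit

# Tables are plain Documents now: `content` carries the payload and
# `content_type` ("text" or "table") says how to interpret it.
table = pd.DataFrame({"Name": ["Ada Lovelace", "Grace Hopper"], "Born": ["1815", "1906"]})
doc = Document(content=table, content_type="table")

# TableReader needs the torch-scatter dependency added in this PR.
reader = TableReader(model_name_or_path="google/tapas-base-finetuned-wtq")
prediction = reader.predict(query="When was Grace Hopper born?", documents=[doc], top_k=1)

for answer in prediction["answers"]:  # Answer objects, sortable by score
    print(answer.answer, answer.score)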


import os
import re
from io import open
from setuptools import find_packages, setup
def parse_requirements(filename):
    """
    Parse a pip requirements file and return the list of required packages.
    Excludes comment lines, --find-links directives and git+https requirements.

    Args:
        filename: path to the pip requirements file

    Returns:
        list of required packages with version constraints
    """
    with open(filename) as file:
        parsed_requirements = file.read().splitlines()
    parsed_requirements = [line.strip()
                           for line in parsed_requirements
                           if line.strip()  # skip empty lines, which would otherwise crash the "#" check
                           and not (line.strip().startswith("#")
                                    or line.strip().startswith('--find-links')
                                    or "git+https" in line)]
    return parsed_requirements
def get_dependency_links(filename):
    """
    Parse a pip requirements file, collecting the --find-links directives.

    Args:
        filename: path to the pip requirements file

    Returns:
        list of URLs given via --find-links
    """
    with open(filename) as file:
        parsed_requirements = file.read().splitlines()
    dependency_links = list()
    for line in parsed_requirements:
        line = line.strip()
        if line.startswith('--find-links'):
            # expects the "--find-links=<url>" form; everything after the first "=" is the URL
            dependency_links.append(line.split('=', 1)[1])
    return dependency_links
dependency_links = get_dependency_links('requirements.txt')
parsed_requirements = parse_requirements('requirements.txt')
def versionfromfile(*filepath):
    infile = os.path.join(*filepath)
    with open(infile) as fp:
        version_match = re.search(
            r"^__version__\s*=\s*['\"]([^'\"]*)['\"]", fp.read(), re.M
        )
        if version_match:
            return version_match.group(1)
        raise RuntimeError("Unable to find version string in {}.".format(infile))
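# For reference, haystack/_version.py must contain an assignment the regex
# above can match, e.g. (the version value here is illustrative):
#   __version__ = "0.10.0"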
here = os.path.abspath(os.path.dirname(__file__))
_version: str = versionfromfile(here, "haystack", "_version.py")
setup(
    name="farm-haystack",
    version=_version,
    author="Malte Pietsch, Timo Moeller, Branden Chan, Tanay Soni",
    author_email="malte.pietsch@deepset.ai",
    description="Neural Question Answering & Semantic Search at Scale. Use modern transformer based models like BERT to find answers in large document collections",
    long_description=open("README.md", "r", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    keywords="QA Question-Answering Reader Retriever semantic-search search BERT roberta albert squad mrc transfer-learning language-model transformer",
    license="Apache",
    url="https://github.com/deepset-ai/haystack",
    download_url=f"https://github.com/deepset-ai/haystack/archive/{_version}.tar.gz",
    packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
    dependency_links=dependency_links,
    install_requires=parsed_requirements,
    python_requires=">=3.7.0",
    tests_require=["pytest"],
    classifiers=[
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
    ],
)
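
For a concrete sense of what the two requirements helpers feed into setup(), here is a small sketch (run with the helpers above in scope; the file content and wheel URL are hypothetical stand-ins for the torch-scatter --find-links entry mentioned in the commit message):

sample = """\
farm==0.8.0
--find-links=https://data.pyg.org/whl/torch-1.9.0+cpu.html
torch-scatter
# pinned for the TableReader
"""
with open("requirements_sample.txt", "w") as f:
    f.write(sample)

# Regular requirements end up in install_requires...
print(parse_requirements("requirements_sample.txt"))
# -> ['farm==0.8.0', 'torch-scatter']

# ...while --find-links URLs end up in dependency_links, so pip can resolve
# torch-scatter against prebuilt wheels instead of compiling from source.
print(get_dependency_links("requirements_sample.txt"))
# -> ['https://data.pyg.org/whl/torch-1.9.0+cpu.html']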