mirror of
https://github.com/deepset-ai/haystack.git
synced 2026-01-08 13:06:29 +00:00
* add fetch_data_from_url to extract data and store as files
* corrected a typo
* corrected variable name error
* correction of urlparse error
* type error
* added selenium, urllib to requirements
* removed urllib
* minor changes and added function to find out in-page navigation links
* quick duplicate links fix
* quick type annotation fix
* created separate module for crawler
* type error fix
* type error fix
* import fix
* quick type error fix
* added return description
* updated include type to list
* refactor modules. Add Crawler class. rename params.
* add basic pipeline compatibility
* update docstrings
* fix mypy issues
* update args, docstrings, return filepaths
* fix mypy
* make urls optional in init

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
30 lines
683 B
Plaintext
farm==0.6.2
--find-links=https://download.pytorch.org/whl/torch_stable.html
fastapi
uvicorn
gunicorn
pandas
scikit-learn  # the PyPI name `sklearn` is a deprecated alias for scikit-learn
psycopg2-binary; sys_platform != 'win32' and sys_platform != 'cygwin'
elasticsearch>=7.7,<=7.10
elastic-apm
tox
coverage
langdetect # for PDF conversions
# optional: sentence-transformers
python-multipart
python-docx
sqlalchemy_utils
# for using FAISS with GPUs, install faiss-gpu
faiss-cpu>=1.6.3
tika
uvloop==0.14; sys_platform != 'win32' and sys_platform != 'cygwin'
httptools
nltk
more_itertools
networkx
# Refer to the Milvus version support matrix at https://github.com/milvus-io/pymilvus#install-pymilvus
pymilvus
# Optional: For crawling
#selenium
#webdriver-manager