21 Commits

Author SHA1 Message Date
Timo Moeller
6892955e95
Add execute permissions (#1666) 2021-10-27 17:35:34 +02:00
Timo Moeller
6da2c73611
Add nltk download, add folder for file upload (#1633) 2021-10-22 16:03:33 +02:00
Malte Pietsch
bb9ec90d3c
Fix tesseract installation in Dockerfile (#1405)
* Fix Dockerfile

* Update Dockerfile-GPU
2021-09-02 11:09:30 +02:00
Shahrukh Khan
4822536886
Add ImageToTextConverter and PDFToTextOCRConverter that utilize OCR (#1349)
* add image.py converter

* add PDFtoImageConverter

* add init to PDFtoImageConverter and classes to __init__

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* update imagetotext pipeline

* revert change in base.py in file_conv

* Update base.py

* Update pdf.py

* add ocr file_converter testcase & update dockerfile

* fix tesseract exception message typo

* fix _image_to_text docstring

* add tesseract installation to CI

* add tesseract installation to CI

* add content test for PDF OCR converter

* update PDFToTextOCRConverter constructor docstring

* replace image files with tmp paths for image.py convert

* replace image files with tmp paths for image.py convert

* Update README.md

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-09-01 16:42:25 +02:00
Alvise Sembenico
6326cf5710
🐳 add PDF converter dependencies to Docker (#1107) 2021-05-31 19:01:02 +02:00
oryx1729
406f7fa679
Disable Gunicorn preload option (#960) 2021-04-12 12:46:52 +02:00
oryx1729
6d00eff796
Add PDF converter in Dockerfiles (#877) 2021-03-08 09:55:11 +01:00
Malte Pietsch
46530e86f8
Fix sentencepiece dependency in dockerfiles (#553) 2020-11-05 12:01:27 +01:00
Guillim
7a43d1a72d
Update readme path in Dockerfile (#537)
* Update Dockerfile

Forgot to change the extension, I believe.

* Update Dockerfile

* Update Dockerfile-GPU
2020-11-03 10:19:18 +01:00
Malte Pietsch
a92ca04648
Update GPU docker & fix race condition with multiple workers (#436)
* fix gpu CMD and set tag to latest

* update dockerfiles. resolve race condition of index creation with multiple workers

* update dockerfiles for preload. remove try catch for elastic index creation

* add back try/catch. disable multiproc in default config to comply with --preload of gunicorn

* change to pip3 for GPU dockerfile

* remove --preload for gpu
2020-09-29 21:12:44 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Malte Pietsch
4da480aa15 Fix dockerfiles 2020-07-16 15:58:49 +02:00
Guillim
c45d54959f
Fix Dockerfile to build successfully without models directory (#210) 2020-07-08 17:12:20 +02:00
Guillim
8a616dae75
Adjust Docker and REST API to allow TransformsReader Class (#180) 2020-07-07 16:25:36 +02:00
Guillim
27b8c98227
Fix rest api in Docker image after refactoring (#178) 2020-06-26 17:52:46 +02:00
Tanay Soni
51a3851f93
Update Dockerfiles to use Gunicorn for deployment (#69) 2020-04-21 16:14:51 +02:00
Malte Pietsch
76c5c1d6aa
Improve deployment of REST API (Configs, logging, minor bugs) (#40)
* remove env variables from dockerfiles

* add more config options to rest api. make fields optional. change to elasticsearch as default

* skip reader if retriever doesn't return anything

* add more config params to farm reader. fix top_k_per_sample

* update FARM version
2020-03-18 12:26:13 +01:00
Malte Pietsch
2164e8550f
Add gpu dockerfile, improve logging, fix minor bug with filtering (#36)
* add gpu dockerfile. improve logging. fix minor bug with filtering

* fix path
2020-03-12 18:30:42 +01:00
Malte Pietsch
eee2676cb0 update docker for fastAPI 2020-02-28 17:49:08 +01:00
Malte Pietsch
3367b46348 switch name from farm_haystack to haystack 2019-11-27 13:56:03 +01:00
Tanay Soni
f5921548ba Initial Commit 2019-11-14 11:42:51 +01:00