haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2026-01-27 18:14:08 +00:00

History

Daniel Bichuetti 3948b997b2

Add support for custom trained PunktTokenizer in PreProcessor (#2783 )

* Add support for model folder into BasePreProcessor

* First draft of custom model on PreProcessor

* Update Documentation & Code Style

* Update tests to support custom models

* Update Documentation & Code Style

* Test for wrong models in custom folder

* Default to ISO names on custom model folder

Use long names only when needed

* Update Documentation & Code Style

* Refactoring language names usage

* Update fallback logic

* Check unpickling error

* Updated tests using parametrize

Co-authored-by:  Sara Zan <sara.zanzottera@deepset.ai>

* Refactored common logic

* Add format control to NLTK load

* Tests improvements

Add a sample for specialized model

* Update Documentation & Code Style

* Minor log text update

* Log model format exception details

* Change pickle protocol version to 4 for 3.7 compat

* Removed unnecessary model folder parameter

Changed logic comparisons

Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

* Update Documentation & Code Style

* Removed unused import

* Change errors with warnings

* Change to absolute path

* Rename sentence tokenizer method

Co-authored-by: tstadel

* Check document content is a string before process

* Change to log errors and not warnings

* Update Documentation & Code Style

* Improve split sentences method

Co-authored-by:  Sara Zan  <sara.zanzottera@deepset.ai>

* Update Documentation & Code Style

* Empty commit - trigger workflow

* Remove superfluous parameters

Co-authored-by: tstadel

* Explicit None checking

Co-authored-by: tstadel

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>

2022-07-21 09:50:45 +02:00

_static

Create documentation website (#272 )

2020-09-18 12:57:32 +02:00

_templates

Create documentation website (#272 )

2020-09-18 12:57:32 +02:00

api

Add support for custom trained PunktTokenizer in PreProcessor (#2783 )

2022-07-21 09:50:45 +02:00

img

Create documentation website (#272 )