* add fetch_data_from_url to extract data and store as files
* corrected a typo
* corrected variable name error
* correction of urlparse error
* type error
* added selenium, urllib to requirements
* removed urllib
* minor changes and added function to find in-page navigation links
* quick duplicate links fix
* quick type annotation fix
* created separate module for crawler
* type error fix
* type error fix
* import fix
* quick type error fix
* added return description
* updated include type to list
* refactor modules. Add Crawler class. rename params.
* add basic pipeline compatibility
* update docstrings
* fix mypy issues
* update args, docstrings, return filepaths
* fix mypy
* make urls optional in init
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Text changes
* Add new images
* First improvements
* Next iteration
* Resize gif
* Add bold
* Update key concepts diagram
* Center image
* Initial import of a more detailed README.md
* Slight changes to ToC, requirements and across the text.
* Grammar and Streamlit UI png.
* Unfix size of gif for mobile
* Remove requirements, add formatting to numbered lists.
* Formatting, remove img size options.
* Another iteration of phrasing the note about open ports.
* Rephrase the note about the docker ports.
Co-authored-by: Andrey A <56412611+aantti@users.noreply.github.com>
* Adding translator with support for many generic input parameters
* Making dict_key generic
* Fixing mypy issue
* Adding pipeline and using opus models
* Add latest docstring and tutorial changes
* Adding test cases for end-to-end translation for generator, summarizer etc
* raise error join and merge nodes
* Fix test failure
* add docstrings. add usage documentation. rm skip_special_tokens param
* Add latest docstring and tutorial changes
* fix code snippets in md
* Adding few extra configuration parameters and fixing tests
* Fixing mypy issue and updating usage document
* fix for mypy issue in pipeline.py
* reverting renaming of pytest_collection_modifyitems method
* Addressing review comments
* setting skip_special_tokens to True
* removing model_max_length argument as None type is not supported by many models
* Removing padding parameter. Better to leave it as default, otherwise it causes a tensor size mismatch error. If users need this option, it can be added later.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* fix encoding of pdftotext. fix version in download instructions
* fix test
* Add latest docstring and tutorial changes
* make latin-1 default encoding again
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* make dpr queries less verbose
* add progress bar flag to more components
* Add latest docstring and tutorial changes
* add type
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* new docs version
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>