haystack/test/test_docx_conversion.py
Malte Pietsch 9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00


from pathlib import Path

from haystack.file_converter.docx import DocxToTextConverter


def test_extract_pages():
    converter = DocxToTextConverter()
    paragraphs, _ = converter.extract_pages(file_path=Path("samples/docx/sample_docx.docx"))
    assert len(paragraphs) == 8  # Sample has 8 Paragraphs
    assert paragraphs[1] == 'The US has "passed the peak" on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month.'