mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-06-27 02:30:08 +00:00

* move partition_pdf to partition folder * move partition.py * refactor partioning bricks into partition diretory * import to nlp for backward compatibility * update docs * update version and bump changelog * fix typo in changelog * update readme reference
1.3 KiB
1.3 KiB
0.3.0-dev2
- Removing the local PDF parsing code and any dependencies and tests.
- Reorganizes the staging bricks in the unstructured.partition module
0.2.6
- Small change to how _read is placed within the inheritance structure since it doesn't really apply to pdf
- Add partitioning brick for calling the document image analysis API
0.2.5
- Update python requirement to >=3.7
0.2.4
- Add alternative way of importing
Final
to support google colab
0.2.3
- Add cleaning bricks for removing prefixes and postfixes
- Add cleaning bricks for extracting text before and after a pattern
0.2.2
- Add staging brick for Datasaur
0.2.1
- Added brick to convert an ISD dictionary to a list of elements
- Update
PDFDocument
to use thefrom_file
method - Added staging brick for CSV format for ISD (Initial Structured Data) format.
- Added staging brick for separating text into attention window size chunks for
transformers
. - Added staging brick for LabelBox.
- Added ability to upload LabelStudio predictions
- Added utility function for JSONL reading and writing
- Added staging brick for CSV format for Prodigy
- Added staging brick for Prodigy
- Added ability to upload LabelStudio annotations
- Added text_field and id_field to stage_for_label_studio signature
0.2.0
- Initial release of unstructured