Mallori Harrell
e0a76effff
feat: Added EmailElement for email documents (#103)
* new EmailElement data structure
2022-12-21 16:03:44 -06:00
Matt Robinson
4f6fc29b54
fix: partition_html should process container divs that include text (#110)
* check for containers with text
* added tests for containers with text
* changelog and version bump
2022-12-21 21:51:04 +00:00
Matt Robinson
1d68bb2482
feat: apply method to apply cleaning bricks to elements (#102)
* add apply method to apply cleaners to elements
* bump version
* add check for string output
* documentations for the apply method
* change interface to *cleaners
2022-12-15 22:19:02 +00:00
Mallori Harrell
53fcf4e912
chore: Remove PDF parsing code and dependencies (#75)
Remove PDF parsing code and dependencies.
2022-11-21 11:47:29 -06:00
qued
9906dd23a1
fix: move _read out of base Document class
Changed where _read sits in the inheritance structure since PDFDocument doesn't really need lazy document processing
2022-11-14 13:34:42 -06:00
Matt Robinson
704d6e11d1
chore: Update PDFDocument to use from_file method (#35)
* update PDFDocument to use from_file method
* bump version
2022-10-13 16:04:30 +00:00
Matt Robinson
5f40c78f25
Initial Release
2022-09-26 14:55:20 -07:00