Jake Poznanski | 6d53683001 | More stats hopefully running faster | 2024-10-14 21:37:14 +00:00
Jake Poznanski | a90feda42f | bugfixes | 2024-10-09 20:20:06 +00:00
Jake Poznanski | dc6440d068 | Cleaning up anchor text to deal with abnormally long lines | 2024-10-09 16:29:20 +00:00
Jake Poznanski | 97291b3f6a | Anchor is fixed to sample text elements better | 2024-10-08 21:51:43 +00:00
Jake Poznanski | c8a4d14c57 | Adding image merging to pdf report/hint/anchor | 2024-10-08 21:23:21 +00:00
Jake Poznanski | 5d35461dd2 | Fix for unicode errors in big datasets for the future | 2024-10-07 17:01:59 +00:00
Jake Poznanski | b340ae5092 | A few notes, starting to test dataloader with new structured response format | 2024-10-02 22:17:15 +00:00
Jake Poznanski | 0071cbd788 | Appears as if the report method works really well, might need one last step to detect rotated pages | 2024-10-02 16:44:39 +00:00
Jake Poznanski | 6ef8226347 | Can spit out anchor text for a gpt engine using pypdf, showing locations of images and text | 2024-10-01 23:15:53 +00:00
Jake Poznanski | e42cecf96c | Adding anchor code based off of pypdf that visits each text block, hopefully so we can make it output good bboxes | 2024-10-01 22:10:58 +00:00