Jake Poznanski | fd17652d55 | Trying to make it faster | 2024-11-15 11:06:50 -08:00
Jake Poznanski | 999f64dd46 | Adding empty anchor support | 2024-10-23 22:17:20 +00:00
Jake Poznanski | 3c1b7de293 | Refactoring of train dataloaders | 2024-10-16 18:26:25 +00:00
Jake Poznanski | 35558dbddc | Make the prompt hint randomly select lines | 2024-10-16 16:05:07 +00:00
Jake Poznanski | 6d53683001 | More stats hopefully running faster | 2024-10-14 21:37:14 +00:00
Jake Poznanski | aea3f7f1fe | Fix for anchor generation on pdfs with no text elements | 2024-10-11 15:01:01 +00:00
Jake Poznanski | dc6440d068 | Cleaning up anchor text to deal with abnormally long lines | 2024-10-09 16:29:20 +00:00
Jake Poznanski | 97291b3f6a | Anchor is fixed to sample text elements better | 2024-10-08 21:51:43 +00:00
Jake Poznanski | c8a4d14c57 | Adding image merging to pdf report/hint/anchor | 2024-10-08 21:23:21 +00:00
Jake Poznanski | 5d35461dd2 | Fix for unicode errors in big datasets for the future | 2024-10-07 17:01:59 +00:00
Jake Poznanski | 73fb81ef6c | Review page size option, fixing mkdirs in convertsilver script | 2024-10-02 15:53:21 +00:00
Jake Poznanski | 6ef8226347 | Can spit out anchor text for a gpt engine using pypdf, showing locations of images and text | 2024-10-01 23:15:53 +00:00
Jake Poznanski | 09e8840c56 | coherency based anchor text | 2024-10-01 20:19:03 +00:00
Jake Poznanski | 28fe314539 | prepping anchor text generation code | 2024-10-01 19:59:48 +00:00