Jake Poznanski | 2ab7cb280c | Removing pymupdf | 2025-01-30 15:51:54 -08:00
Jake Poznanski | fb402297ce | Isort and black update | 2025-01-29 15:42:34 -08:00
Jake Poznanski | dcaca8aa90 | Black formatting | 2025-01-29 15:30:39 -08:00
Jake Poznanski | 4a1762d455 | isort | 2025-01-29 15:25:10 -08:00
Jake Poznanski | b2894d0280 | Massive refactor from pdelfin to olmocr | 2025-01-27 18:30:41 +00:00
Jake Poznanski | b7c80cd17f | Fix up some tests but I don't see why this isn't working | 2024-10-10 16:58:40 +00:00
Jake Poznanski | 09e8840c56 | coherency based anchor text | 2024-10-01 20:19:03 +00:00
Jake Poznanski | bab32aa9b3 | Formatting | 2024-09-18 22:52:42 +00:00
Jake Poznanski | d22b311340 | Starting to write dataloader for visual lm data | 2024-09-18 21:42:09 +00:00
Jake Poznanski | af2126df99 | 450tok/sec/core with smollm that appears to work well | 2024-09-17 19:59:02 +00:00
Jake Poznanski | 2f71cb9232 | Using SmolLM, seems a lot better and is able to pass some tests | 2024-09-17 18:47:27 +00:00
Jake Poznanski | 57e80aacd2 | Testing coherence with distilgpt2, but it doesn't work great | 2024-09-17 16:58:45 +00:00
Jake Poznanski | 01bc0b2f10 | Moving a whole bunch of code over, still broken | 2024-09-17 16:26:55 +00:00