1751 Commits

Author SHA1 Message Date
Jake Poznanski
6c32ff2c7d Update dates 2025-10-16 18:21:18 +00:00
Jake Poznanski
b1b29b2206 Cleanup of paths 2025-10-15 21:31:32 +00:00
Jake Poznanski
80f18cc2bc Fixes 2025-10-15 21:14:53 +00:00
Jake Poznanski
5695e46a21 Adding docs, refactoring how urls are passed in 2025-10-15 21:12:15 +00:00
Jake Poznanski
ab7b02a431 Readme updates 2025-10-15 20:07:02 +00:00
Jake Poznanski
b44c30b482 Starting to add support for parasail 2025-10-15 20:02:41 +00:00
Jake Poznanski
e2a5d9f8f3 Cleaning up dependencies 2025-10-15 19:40:58 +00:00
Jake Poznanski
569311c461 Workspace stuff 2025-10-14 22:54:03 +00:00
Jake Poznanski
36d6228ffa Prepare workspace fix 2025-10-14 22:48:59 +00:00
Jake Poznanski
05d85264ca Cleaning up some table test creation stuff, but it's still not great 2025-10-14 20:20:24 +00:00
Jake Poznanski
08a7c32b62 A few more fixes 2025-10-14 18:58:36 +00:00
Jake Poznanski
654fdc3271 Adjusting step 0 filtering 2025-10-14 18:14:34 +00:00
Jake Poznanski
da1607c0c0 Refinement 2025-10-14 18:12:45 +00:00
Jake Poznanski
93e8a0663d Adding meta tags to head with git version, also filtering out badly rotated docs 2025-10-14 16:30:03 +00:00
Jake Poznanski
a17aa6f94d Fixing up some things with mine_html_templates 2025-10-14 16:07:50 +00:00
Jake Poznanski
52c6dcd523 More reliable mine html templates 2025-10-13 22:27:34 +00:00
Jake Poznanski
aa239eb34c Lints 2025-10-13 21:15:19 +00:00
Jake Poznanski
369fd4d23a Adjusting some things 2025-10-13 21:14:53 +00:00
Jake Poznanski
9480508642 Mineru 2025-10-13 20:47:52 +00:00
Jake Poznanski
417fbed4ad Fix 2025-10-13 19:46:27 +00:00
Jake Poznanski
7d6db61446 Mineru runner 2025-10-13 19:43:39 +00:00
Jake Poznanski
7487e3673a More graceful tar extraction 2025-10-13 17:27:45 +00:00
Jake Poznanski
5b81bc61c6 Filtering downloads 2025-10-13 17:22:57 +00:00
Jake Poznanski
b86e3071da More bench results 2025-10-13 16:37:08 +00:00
Jake Poznanski
62faa003d3 Fix for some corrupted data 2025-10-10 22:34:32 +00:00
Jake Poznanski
fc4934c9b4 URL packaging 2025-10-10 16:52:42 +00:00
Jake Poznanski
87a2b8a9a3 More lint fixes 2025-10-09 22:16:46 +00:00
Jake Poznanski
875337f962 Lints 2025-10-09 22:12:19 +00:00
Jake Poznanski
702c42f8e7 Packaging working better now 2025-10-09 22:12:02 +00:00
Jake Poznanski
557bb9a5e9 Repackager is still not working right 2025-10-09 22:01:01 +00:00
Jake Poznanski
4c21e15d0e Packaging and repackaging test works 2025-10-09 21:52:05 +00:00
Jake Poznanski
9f4a2d4177 Tests 2025-10-09 21:42:32 +00:00
Jake Poznanski
35fc9ca025 Testing the packager 2025-10-09 21:30:38 +00:00
Jake Poznanski
74eb910b95 Now you can just run pytest . cleanly 2025-10-09 20:31:28 +00:00
Jake Poznanski
f01f7183e4 Test fixes 2025-10-09 20:28:29 +00:00
Jake Poznanski
bc8c044dd4 Preparing olmocr mix packaging scripts 2025-10-09 20:14:43 +00:00
Jake Poznanski
743e48361c New claude sonnet, going to add multilingual tests to olmocr bench 1025 internal version 2025-10-09 19:43:22 +00:00
Jake Poznanski
da4ada33a0 Adding miner for multilingual documents 2025-10-09 18:26:40 +00:00
Jake Poznanski
95dd21b66c GRPO Documentation 2025-10-07 20:40:10 +00:00
Jake Poznanski
1f791c4a19 Changes 2025-10-07 18:29:08 +00:00
Jake Poznanski
727b345715 Merge fix 2025-10-07 18:16:31 +00:00
Jake Poznanski
8ef68fde88 Merge branch 'main' into jakep/new_data 2025-10-07 17:44:54 +00:00
Jake Poznanski
e15615aadb Model defaults 2025-10-07 17:10:45 +00:00
Jake Poznanski
b81e40602d Readme score fixes 2025-10-06 22:59:00 +00:00
Jake Poznanski
2e3d1a0317 Committing test script to be used in model cards for individual one-off inference 2025-10-06 22:47:06 +00:00
Jake Poznanski
c89787183a Bump version to v0.3.8 for release v0.3.8 2025-10-06 21:46:18 +00:00
Jake Poznanski
e12941a608 Version bump 2025-10-06 21:46:10 +00:00
Jake Poznanski
7fe756fe63 Formatting 2025-10-06 21:10:32 +00:00
Jake Poznanski
9c7c670f1f Bump version to v0.3.7 for release v0.3.7 2025-10-06 21:10:07 +00:00
Jake Poznanski
1951a849ec Version bump with new vllm 2025-10-06 21:10:00 +00:00