387 Commits

Author SHA1 Message Date
Jake Poznanski
ae7efd3580 Refactoring 2025-02-27 13:15:33 -08:00
Jake Poznanski
9e019f17b5 More factoring 2025-02-27 13:11:47 -08:00
“aman-17”
158be488c9 added viewer for gemini vs chatgpt 2025-02-27 11:52:08 -08:00
“aman-17”
7fbca7c766 update 2025-02-26 14:03:51 -08:00
“aman-17”
98f376630a restored the fine-tuning prompt 2025-02-26 13:36:20 -08:00
“aman-17”
9481b29da3 update 2025-02-26 13:28:01 -08:00
Jake Poznanski
bd08fdb476 fixes missing OSS code for Issue #36 2025-02-26 17:49:04 +00:00
“aman-17”
88910e20fa updated gemini 2025-02-26 09:42:35 -08:00
“aman-17”
9a9f9cbddb added gemini and claude 2025-02-25 16:57:39 -08:00
“aman-17”
3a6df83168 update 2025-02-25 14:41:48 -08:00
Jake Poznanski
d4b902cea2 Olmocr runner implemented 2025-02-25 14:25:02 -08:00
Jake Poznanski
aac0c1503d chatgpt converter 2025-02-25 13:46:36 -08:00
Jake Poznanski
8a6e8b965f Basic rule viewer 2025-02-25 13:11:54 -08:00
aman-17
0130a970c2 fixed style 2025-02-25 08:57:02 -08:00
Jake Poznanski
813a355f44 Fixing mineru runner, added a few sample docs 2025-02-24 11:39:38 -08:00
Jake Poznanski
cc1f476b3e Bugfixes 2025-02-21 14:56:32 -08:00
Jake Poznanski
9da1f92628 Cleaner implementations of benchmark stuff 2025-02-21 14:08:24 -08:00
Jake Poznanski
53494d9c7e Refactoring 2025-02-21 12:47:24 -08:00
Jake Poznanski
ff465f7f36 Starting refactor 2025-02-21 11:58:39 -08:00
Jake Poznanski
a348cd6e8f olmocr bench runner 2025-02-21 09:57:07 -08:00
Jake Poznanski
c20e3c0702 Pdf for dataset 2025-02-21 09:31:25 -08:00
Jake Poznanski
16a32445a2 olmocr running 2025-02-19 16:10:46 -08:00
Jake Poznanski
422d08f4b8 Adding more rules and seeing how they should work 2025-02-19 15:13:19 -08:00
Jake Poznanski
f2f761973c Adding mineru script 2025-02-19 15:07:47 -08:00
Jake Poznanski
e5a80c572c Fixing up benchmark a bit 2025-02-19 14:43:47 -08:00
Jake Poznanski
c3d0ce99f2 Some readmes and instructions 2025-02-19 13:25:31 -08:00
Jake Poznanski
4e0339f965 Runner for olmocr bench 2025-02-19 21:04:49 +00:00
Jake Poznanski
a8f6921dd3 Benchmark runners for other systems 2025-02-19 19:50:26 +00:00
Jake Poznanski
318abf22ad Adding runbench 2025-02-19 19:27:08 +00:00
Jake Poznanski
1230aefe98 Making progress 2025-02-19 18:59:51 +00:00
Jake Poznanski
072bc1d142 Making some progress 2025-02-19 18:48:02 +00:00
Jake Poznanski
823629d046 Sample code for olmocrbench 2025-02-19 18:35:55 +00:00
Jake Poznanski
9e62003727 Adding readme for olmocr bench 2025-02-18 23:40:38 +00:00
Jake Poznanski
a2c0887b3f Bump version to v0.1.58 for release 2025-02-15 00:16:07 +00:00
Jake Poznanski
c95343d4a1 Bump version to v0.1.57 for release 2025-02-14 22:57:51 +00:00
Jake Poznanski
c4303074e6 Bump version to v0.1.56 for release 2025-02-14 22:27:44 +00:00
Jake Poznanski
bcf967b105 Bump version to v0.1.55 for release 2025-02-14 22:09:43 +00:00
Jake Poznanski
7e02e199ba Adjusting tools to include html templates 2025-02-14 21:42:59 +00:00
Jake Poznanski
229da8cb17 unused imports 2025-02-14 19:54:48 +00:00
Jake Poznanski
32aa359458 Formatting fix 2025-02-14 19:50:19 +00:00
Jake Poznanski
6583fb641a hfupload scripts 2025-02-14 17:36:00 +00:00
Jake Poznanski
8297955290 Making my parquets 2025-02-14 00:02:07 +00:00
Jake Poznanski
51cfdbd64f Better converter 2025-02-13 22:30:20 +00:00
Jake Poznanski
87cb9573d8 First pass at dataset builder script 2025-02-11 18:38:41 +00:00
Jake Poznanski
6ed6f85c42 Generating parquets for hugging face 2025-02-10 23:12:38 +00:00
Jake Poznanski
84c0c71393 Merge branch 'main' of https://github.com/allenai/olmocr 2025-02-10 22:00:42 +00:00
Jake Poznanski
7d67a59c31 Remove unused 2025-02-10 22:00:40 +00:00
Jake Poznanski
f04d1207a5 Merge branch 'main' of https://github.com/allenai/olmocr into main 2025-02-10 12:40:29 -08:00
Jake Poznanski
e73ff9d7a1 Updating to new model name on HF 2025-02-10 12:39:49 -08:00
aman-17
f57c6f3f7b restored modeling_molmo.py file 2025-02-10 11:07:35 -08:00