185 Commits

Author SHA1 Message Date
Jake Poznanski
ec5c5b6444 Updating pareto plots 2025-06-17 21:41:23 +00:00
Jake Poznanski
6c51829ae6 Some helper scripts 2025-06-17 21:21:50 +00:00
Jake Poznanski
1295e171bb Merge branch 'main' of https://github.com/allenai/olmocr 2025-06-12 22:35:09 +00:00
Jake Poznanski
37090e2801 Go back to workers 1 in marker test script 2025-06-12 22:35:08 +00:00
Jake Poznanski
af02f15f24 Merge pull request #236 from VikParuchuri/main: Fix marker benchmarks 2025-06-12 15:24:17 -07:00
Jake Poznanski
3da6e2d587 Pareto plot update, keep cost the same for now 2025-06-12 22:23:41 +00:00
Jake Poznanski
fcd8bbec92 Install aws cli 2025-06-12 21:38:28 +00:00
Jake Poznanski
fc06797bec aws cli 2025-06-12 21:29:39 +00:00
Jake Poznanski
59e0a1ccb0 Marker wants newer torchvision 2025-06-12 21:23:53 +00:00
Jake Poznanski
0f3b45c1a3 Add time 2025-06-12 21:19:17 +00:00
Jake Poznanski
4bfcfce767 Actually install the right thing 2025-06-12 21:18:58 +00:00
Jake Poznanski
f8dfd85765 Script 2025-06-12 21:13:31 +00:00
Jake Poznanski
044874a634 Adding marker benchmark 2025-06-12 21:12:58 +00:00
Jake Poznanski
9787d007b9 Pulling in bigger benchmark script from vllm branch to main 2025-06-12 21:02:46 +00:00
Vik Paruchuri
267f52bd79 Update marker cost 2025-06-06 13:53:50 -04:00
Vik Paruchuri
f21ff08c2f Fix marker benchmarks 2025-06-04 23:10:14 -07:00
Jake Poznanski
8d92620d3c Merge remote-tracking branch 'origin/main' into retry_improvements 2025-05-29 20:33:45 +00:00
Jake Poznanski
cd5b524d20 Some benchmark cleanup 2025-05-29 20:32:25 +00:00
Jake Poznanski
022be37723 Some better info strings in benchmark runner 2025-05-29 18:43:27 +00:00
Jake Poznanski
01c4a561d3 Script fixes 2025-05-29 17:58:11 +00:00
Jake Poznanski
129412cdb0 Git lfs for more reliable downloads 2025-05-29 17:38:00 +00:00
Jake Poznanski
45e0ae59dc omg 2025-05-29 17:21:58 +00:00
Jake Poznanski
15e0064212 More fixes 2025-05-29 17:20:32 +00:00
Jake Poznanski
e8e6b6cb17 More fixes 2025-05-29 17:19:36 +00:00
Jake Poznanski
06988ac533 Image fixes 2025-05-29 17:18:12 +00:00
Jake Poznanski
ff31faebe4 Runner improvements 2025-05-29 17:12:41 +00:00
Jake Poznanski
475cc1c3a4 Working on runner script 2025-05-29 17:08:05 +00:00
Jake Poznanski
61d427ebf3 Repo cleanup 2025-05-28 17:08:25 +00:00
kyleclo
7a50ee1645 merge 2025-05-25 22:09:05 -07:00
kyleclo
241e5bfe70 Merge branch 'main' of github.com:allenai/olmocr 2025-05-25 22:08:46 -07:00
kyleclo
470394d1cf pareto plot 2025-05-25 22:07:02 -07:00
Jake Poznanski
d2755adf55 Bump version to v0.1.68 for release 2025-05-19 16:57:20 +00:00
Jake Poznanski
10b5e9e31e Includes 2025-05-16 21:30:09 +00:00
Jake Poznanski
63aee2c1e5 Code cleanup, version bump, remove unused permutation test 2025-05-16 21:25:32 +00:00
kyleclo
7f4edb240f pareto plot 2025-05-15 21:05:28 -07:00
Jake Poznanski
bb3fe14543 Pareto plot for paper 2025-05-15 23:57:18 +00:00
Jake Poznanski
d17210f40d Lint fix 2025-05-14 19:54:19 +00:00
Jake Poznanski
28966b9f14 Adding CDF plots 2025-05-14 16:57:56 +00:00
Jake Poznanski
2e8753af26 Docling runner based on CLI, but it's too slow to use. PII rule fixes 2025-05-14 16:31:56 +00:00
Jake Poznanski
74ef2b6f65 Fixes for some pii taggers 2025-05-13 16:19:50 +00:00
Jake Poznanski
e06fd622c3 Adjusting tagging pipeline v2 2025-05-10 17:43:56 +00:00
Jake Poznanski
623c66c85c Fixing up tagging pipeline 2025-05-10 17:41:43 +00:00
Jake Poznanski
1854ae1269 A bit more work on tagging 2025-05-09 19:31:07 +00:00
Jake Poznanski
72bcfd8f31 doing some extra pii tagging steps 2025-05-09 15:40:22 +00:00
Jake Poznanski
424052df63 Outputting some nice reference docs to check pii 2025-05-08 21:27:55 +00:00
Jake Poznanski
d18f3f734f More pii tag checking 2025-05-08 20:07:21 +00:00
Jake Poznanski
80645c886e Hypothesis checker 2025-05-08 17:58:50 +00:00
Jake Poznanski
3aba3a5c10 Committing script to get stats on PII tagging 2025-05-08 17:02:36 +00:00
Jake Poznanski
9e5965a95e Some PII filter 2025-05-06 21:22:27 +00:00
Jake Poznanski
d671be6823 Working on some dataset filtering 2025-05-06 20:49:39 +00:00