1072 Commits

Author SHA1 Message Date
aman-17
90342efc7e docker fixed version tag for dev run 2025-05-22 16:15:42 -07:00
aman-17
21e5ab5dd0 fixing tag 2025-05-22 15:26:39 -07:00
aman-17
beb8816677 fixed tags 2025-05-22 15:24:23 -07:00
aman-17
cdc3ce9cdd updated docker_build 2025-05-22 15:18:18 -07:00
aman-17
70c40e48ab removed unnecessary files before building 2025-05-22 15:06:18 -07:00
aman-17
8bfd1f144b added workflow and fixed readme 2025-05-22 14:28:40 -07:00
aman-17
8a8f8dc211 added docker hub repo link. 2025-05-21 11:00:52 -07:00
aman-17
07b639726d added Dockerfile and its instructions 2025-05-21 10:57:04 -07:00
Jake Poznanski
74f4786efb README updated with pip install and --markdown 2025-05-20 19:51:29 +00:00
Jake Poznanski
57238cf20b Bump version to v0.1.69 for release v0.1.69 2025-05-20 16:45:25 +00:00
Jake Poznanski
71275cce76 Bumping version, adding more docs, more to come 2025-05-20 16:42:21 +00:00
Jake Poznanski
7b640ae113 Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-19 19:42:50 +00:00
Jake Poznanski
8d8e32331a Adding markdown flag to directly generate markdown outputs 2025-05-19 19:42:48 +00:00
Jake Poznanski
0ac6c9a3a6 Update README.md 2025-05-19 11:10:09 -07:00
Jake Poznanski
1043491565 Oops, removing submodule olmOCR bench repo, best if you just clone from hugging face 2025-05-19 18:09:14 +00:00
Jake Poznanski
2c1c8a693b Updating readme more 2025-05-19 17:30:21 +00:00
Jake Poznanski
d2755adf55 Bump version to v0.1.68 for release v0.1.68 2025-05-19 16:57:20 +00:00
Jake Poznanski
db9972c39a Readme updates 2025-05-19 16:56:22 +00:00
Jake Poznanski
c97ce8bcd4 Lints 2025-05-16 22:40:54 +00:00
Jake Poznanski
08806fdec6 Fixups 2025-05-16 21:32:24 +00:00
Jake Poznanski
10b5e9e31e Includes 2025-05-16 21:30:09 +00:00
Jake Poznanski
63aee2c1e5 Code cleanup, version bump, remove unused permutation test 2025-05-16 21:25:32 +00:00
Jake Poznanski
5de52e7d13 Update README.md 2025-05-16 14:20:21 -07:00
kyleclo
66f9b46869 Merge branch 'main' of github.com:allenai/olmocr 2025-05-15 21:06:10 -07:00
kyleclo
7f4edb240f pareto plot 2025-05-15 21:05:28 -07:00
Jake Poznanski
0da6fa0c59 Update README.md 2025-05-15 20:41:37 -07:00
Jake Poznanski
c970851dab Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-15 23:57:20 +00:00
Jake Poznanski
bb3fe14543 Pareto plot for paper 2025-05-15 23:57:18 +00:00
Jake Poznanski
4b4ba454ba Update README.md 2025-05-15 16:17:29 -07:00
Jake Poznanski
f0768bba3e Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-15 22:50:30 +00:00
Jake Poznanski
c4a0fb9af5 Adding back in proper CI estimation 2025-05-15 22:50:29 +00:00
Aman Rangapur
d047bc6712 Updated README.md 2025-05-15 11:34:07 -07:00
Jake Poznanski
d17210f40d Lint fix 2025-05-14 19:54:19 +00:00
Jake Poznanski
ffee4c9740 Big bug fix, moving the prompt to match how training was done, 2.3 point boost on olmocr-bench 2025-05-14 19:51:00 +00:00
Jake Poznanski
28966b9f14 Adding CDF plots 2025-05-14 16:57:56 +00:00
Jake Poznanski
2e8753af26 Docling runner based on CLI, but it's too slow to use. PII rule fixes 2025-05-14 16:31:56 +00:00
Jake Poznanski
74ef2b6f65 Fixes for some pii taggers 2025-05-13 16:19:50 +00:00
Jake Poznanski
b3b405d077 dedupe script 2025-05-12 17:02:35 +00:00
Jake Poznanski
e06fd622c3 Adjusting tagging pipeline v2 2025-05-10 17:43:56 +00:00
Jake Poznanski
1538163f6f Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-10 17:41:44 +00:00
Jake Poznanski
623c66c85c Fixing up tagging pipeline 2025-05-10 17:41:43 +00:00
Jake Poznanski
1c59130b55 Update README.md 2025-05-09 14:51:18 -07:00
Jake Poznanski
225b705eef Update README.md 2025-05-09 14:48:49 -07:00
Jake Poznanski
1854ae1269 A bit more work on tagging 2025-05-09 19:31:07 +00:00
Jake Poznanski
72bcfd8f31 doing some extra pii tagging steps 2025-05-09 15:40:22 +00:00
Jake Poznanski
9871e066b4 Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-08 21:27:56 +00:00
Jake Poznanski
424052df63 Outputting some nice reference docs to check pii 2025-05-08 21:27:55 +00:00
Jake Poznanski
d18f3f734f More pii tag checking 2025-05-08 20:07:21 +00:00
Jake Poznanski
80645c886e Hypothesis checker 2025-05-08 17:58:50 +00:00
Jake Poznanski
03db04cb7e Fixing handling of new lines in some test cases 2025-05-08 17:21:06 +00:00