387 Commits

Author SHA1 Message Date
Jake Poznanski
587b73f23e Try with more aggressive anchor changing 2025-05-29 22:33:16 +00:00
Jake Poznanski
8f5d5bdf28 Revert "Trying to add repetition penalty"
This reverts commit 90f754e7b182f5978f60f5e4734f6ebb0aa3e735.
2025-05-29 21:59:23 +00:00
Jake Poznanski
90f754e7b1 Trying to add repetition penalty 2025-05-29 21:27:13 +00:00
Jake Poznanski
9dcdef6ca3 Going to try with up to 5k tokens 2025-05-29 20:34:05 +00:00
Jake Poznanski
8d92620d3c Merge remote-tracking branch 'origin/main' into retry_improvements 2025-05-29 20:33:45 +00:00
Jake Poznanski
cd5b524d20 Some benchmark cleanup 2025-05-29 20:32:25 +00:00
Jake Poznanski
2cb14cceae Allowing more tokens 2025-05-29 19:59:58 +00:00
Jake Poznanski
22ee068d88 Merge remote-tracking branch 'origin/main' into retry_improvements 2025-05-29 18:25:10 +00:00
Jake Poznanski
fbcd82ad30 Cleanup attempt lookup code a bit 2025-05-29 16:01:26 +00:00
aman-17
ce616c6514 addressed Jake's comments 2025-05-28 19:01:01 -07:00
aman-17
8a63093663 fixed lint 2025-05-28 14:45:07 -07:00
aman-17
cd5db7f281 fixed style and lint 2025-05-28 14:42:07 -07:00
aman-17
acc2687f21 Updated dockerfile and added a file 2025-05-28 14:35:23 -07:00
Jake Poznanski
f8fd234093 Idea to improve retry performance 2025-05-28 18:27:40 +00:00
Jake Poznanski
76270f5538 Upping to v70 to test new docker builds 2025-05-23 20:09:45 +00:00
Jake Poznanski
bea1873300 Update README.md 2025-05-23 11:32:02 -07:00
Jake Poznanski
7996a7dac4 Update README.md 2025-05-22 16:00:29 -07:00
Jake Poznanski
71275cce76 Bumping version, adding more docs, more to come 2025-05-20 16:42:21 +00:00
Jake Poznanski
8d8e32331a Adding markdown flag to directly generate markdown outputs 2025-05-19 19:42:48 +00:00
Jake Poznanski
2c1c8a693b Updating readme more 2025-05-19 17:30:21 +00:00
Jake Poznanski
db9972c39a Readme updates 2025-05-19 16:56:22 +00:00
Jake Poznanski
c97ce8bcd4 Lints 2025-05-16 22:40:54 +00:00
Jake Poznanski
08806fdec6 Fixups 2025-05-16 21:32:24 +00:00
Jake Poznanski
10b5e9e31e Includes 2025-05-16 21:30:09 +00:00
Jake Poznanski
63aee2c1e5 Code cleanup, version bump, remove unused permutation test 2025-05-16 21:25:32 +00:00
Jake Poznanski
5de52e7d13 Update README.md 2025-05-16 14:20:21 -07:00
Jake Poznanski
0da6fa0c59 Update README.md 2025-05-15 20:41:37 -07:00
Jake Poznanski
f0768bba3e Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-15 22:50:30 +00:00
Jake Poznanski
c4a0fb9af5 Adding back in proper CI estimation 2025-05-15 22:50:29 +00:00
Aman Rangapur
d047bc6712 Updated README.md 2025-05-15 11:34:07 -07:00
Jake Poznanski
ffee4c9740 Big bug fix, moving the prompt to match how training was done, 2.3 point boost on olmocr-bench 2025-05-14 19:51:00 +00:00
Jake Poznanski
2e8753af26 Docling runner based on CLI, but it's too slow to use. PII rule fixes 2025-05-14 16:31:56 +00:00
Jake Poznanski
74ef2b6f65 Fixes for some PII taggers 2025-05-13 16:19:50 +00:00
Jake Poznanski
b3b405d077 dedupe script 2025-05-12 17:02:35 +00:00
Jake Poznanski
1538163f6f Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-10 17:41:44 +00:00
Jake Poznanski
623c66c85c Fixing up tagging pipeline 2025-05-10 17:41:43 +00:00
Jake Poznanski
1c59130b55 Update README.md 2025-05-09 14:51:18 -07:00
Jake Poznanski
225b705eef Update README.md 2025-05-09 14:48:49 -07:00
Jake Poznanski
03db04cb7e Fixing handling of new lines in some test cases 2025-05-08 17:21:06 +00:00
Aman Rangapur
6f62e05b1f Merge pull request #188 from allenai/amanr/miners
added checker for `hea_foo` and miner to get `old_scans` img's
2025-05-07 11:41:29 -07:00
Jake Poznanski
ef083bf845 Stats fix 2025-05-06 21:21:06 +00:00
Jake Poznanski
a2ec95e0f5 Testing out to see where we stand on qwen2.5 2025-05-05 17:15:09 +00:00
aman-17
57720564ee fixed lint and style 2025-05-02 16:24:03 -07:00
aman-17
281ca51916 added checker for hea_foo and miner to get old scans img's 2025-05-02 16:22:45 -07:00
Jake Poznanski
97e4992a3f Merge branch 'main' of https://github.com/allenai/olmocr 2025-05-02 21:51:24 +00:00
Jake Poznanski
dcbe6543b8 Report for benchmarking 2025-05-02 21:51:23 +00:00
Jake Poznanski
18de822269 Update README.md 2025-05-01 13:31:19 -07:00
Jake Poznanski
472ee108d7 Lints 2025-04-30 21:18:59 +00:00
Jake Poznanski
0a320e9870 Some helper scripts for Aman 2025-04-30 18:47:10 +00:00
Jake Poznanski
1067f80160 Update README.md 2025-04-29 15:43:43 -07:00