Jake Poznanski
|
da21074477
|
More nits
|
2025-05-05 20:43:03 +00:00 |
|
Jake Poznanski
|
88270e9307
|
More work on qwen25 finetune
|
2025-05-05 20:39:28 +00:00 |
|
Jake Poznanski
|
a2ec95e0f5
|
Testing out to see where we stand on qwen2.5
|
2025-05-05 17:15:09 +00:00 |
|
Jake Poznanski
|
97e4992a3f
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-05-02 21:51:24 +00:00 |
|
Jake Poznanski
|
dcbe6543b8
|
Report for benchmarking
|
2025-05-02 21:51:23 +00:00 |
|
Jake Poznanski
|
18de822269
|
Update README.md
|
2025-05-01 13:31:19 -07:00 |
|
Jake Poznanski
|
791983c09b
|
Tweaking some more pii detection
|
2025-05-01 17:09:05 +00:00 |
|
Jake Poznanski
|
5cc084887a
|
Rich tagger with bigger model
|
2025-05-01 09:33:27 -07:00 |
|
Jake Poznanski
|
4ed00d097b
|
Fixes for rich tagging
|
2025-04-30 14:38:35 -07:00 |
|
Jake Poznanski
|
472ee108d7
|
Lints
|
2025-04-30 21:18:59 +00:00 |
|
Jake Poznanski
|
8ef7e56c86
|
Trying a new rich tagging pipeline for PII
|
2025-04-30 21:18:22 +00:00 |
|
Jake Poznanski
|
0a320e9870
|
Some helper scripts for Aman
|
2025-04-30 18:47:10 +00:00 |
|
Jake Poznanski
|
1067f80160
|
Update README.md
|
2025-04-29 15:43:43 -07:00 |
|
Jake Poznanski
|
4e9e13e56f
|
Option in benchmark to output tests which fail on all models for debugging
|
2025-04-29 14:07:07 -07:00 |
|
Jake Poznanski
|
e51362bcc2
|
Showing benchmark scores per category, speed improvements
|
2025-04-29 13:44:05 -07:00 |
|
Jake Poznanski
|
f8808478bd
|
Adding some small changes to the tagging pipeline
|
2025-04-29 11:12:03 -07:00 |
|
Jake Poznanski
|
66d293c178
|
Decent resume/cv tagging
|
2025-04-28 15:57:20 -07:00 |
|
Jake Poznanski
|
1f66b96ffd
|
Adding openai dependency for benchmarking
|
2025-04-25 18:18:37 +00:00 |
|
Jake Poznanski
|
689bcd9e91
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-25 18:00:43 +00:00 |
|
Jake Poznanski
|
8ec7dbe2e0
|
Script updates
|
2025-04-25 18:00:41 +00:00 |
|
Aman Rangapur
|
a7db2bd160
|
Merge pull request #183 from allenai/amanr/bench_checkers
Added checker for old_scans_math
|
2025-04-24 17:20:48 -07:00 |
|
aman-17
|
c7220ce460
|
Merge remote-tracking branch 'origin/main' into amanr/bench_checkers
merge from main
|
2025-04-24 17:09:53 -07:00 |
|
Jake Poznanski
|
83002a0de7
|
Reinit credentials
|
2025-04-24 20:43:54 +00:00 |
|
Jake Poznanski
|
2d5e1838f4
|
Small corrections
|
2025-04-24 20:31:59 +00:00 |
|
Jake Poznanski
|
df71dc38ce
|
Small fix for cluster usage
|
2025-04-24 20:24:06 +00:00 |
|
Jake Poznanski
|
67a01cfcc8
|
Fixups for tagging pipeline
|
2025-04-24 20:14:42 +00:00 |
|
Jake Poznanski
|
c326fae03c
|
Refactoring tagging bigly
|
2025-04-24 10:18:30 -07:00 |
|
Jake Poznanski
|
811d267bd5
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-23 15:55:04 -07:00 |
|
Jake Poznanski
|
479b2c1b2d
|
Working on a tagger
|
2025-04-23 15:54:49 -07:00 |
|
Jake Poznanski
|
717ed811e1
|
Cleanup
|
2025-04-23 14:47:00 -07:00 |
|
Jake Poznanski
|
97ae48c66a
|
Making some more progress
|
2025-04-23 14:46:16 -07:00 |
|
aman-17
|
2a4522e7e5
|
fixed minor bug
|
2025-04-23 14:41:09 -07:00 |
|
aman-17
|
076f3e2e04
|
fixed style
|
2025-04-23 14:38:19 -07:00 |
|
aman-17
|
b095be0fed
|
added checker for old_scans_math
|
2025-04-23 14:37:42 -07:00 |
|
Aman Rangapur
|
85b40f46ce
|
Updated bench README.md
Cleaned old scans tests and removed [] and other symbols.
|
2025-04-23 13:53:24 -07:00 |
|
Jake Poznanski
|
7d8e9d181a
|
Fixing up tagging pipeline
|
2025-04-23 19:56:13 +00:00 |
|
Jake Poznanski
|
12100b420d
|
Adding some manual structure to be filled in
|
2025-04-23 18:39:31 +00:00 |
|
Jake Poznanski
|
ee8c506d92
|
Example of a basic empty pipeline that I'm hoping to extend for tagging
|
2025-04-23 18:27:26 +00:00 |
|
Jake Poznanski
|
582518f1e8
|
Merge pull request #181 from mhamada-ai2/patch-1
Update scan_dolmadocs.py
|
2025-04-23 09:48:08 -07:00 |
|
mhamada-ai2
|
01644c4a49
|
Update scan_dolmadocs.py
Instruction text updates and public release question update
|
2025-04-22 16:16:21 -07:00 |
|
Jake Poznanski
|
887efac133
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-22 21:33:53 +00:00 |
|
Jake Poznanski
|
246490f960
|
Lint fixes
|
2025-04-22 21:33:52 +00:00 |
|
Jake Poznanski
|
967210f23b
|
Adjustments to task
|
2025-04-22 21:33:39 +00:00 |
|
Jake Poznanski
|
3dffeeac22
|
Saving prolific PID
|
2025-04-22 21:16:41 +00:00 |
|
Aman Rangapur
|
622279850d
|
Merge pull request #179 from allenai/amanr/long_tiny_text
Added Miner for long tiny text
|
2025-04-22 14:00:26 -07:00 |
|
Jake Poznanski
|
b20a4886f9
|
README for benchmark
|
2025-04-22 20:35:11 +00:00 |
|
aman-17
|
0926dacc59
|
fixed style
|
2025-04-21 17:42:32 -07:00 |
|
aman-17
|
6845517761
|
added miner
|
2025-04-21 17:41:16 -07:00 |
|
Jake Poznanski
|
b897bf1414
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-18 15:47:32 +00:00 |
|
Jake Poznanski
|
f0992b95e1
|
Better staggering of downloads
|
2025-04-18 15:47:31 +00:00 |
|