Jake Poznanski
|
6fefc98f77
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-17 13:51:51 -07:00 |
|
Jake Poznanski
|
5aa6a9f1a3
|
Fixing olmocr_pipeline in converter
|
2025-04-17 13:51:49 -07:00 |
|
Jake Poznanski
|
858cf69507
|
Bumping version
|
2025-04-17 17:00:01 +00:00 |
|
Jake Poznanski
|
10cb6aad26
|
Updating pipeline to take cloud storage model names and paths, as well as local directory
|
2025-04-17 09:59:28 -07:00 |
|
Jake Poznanski
|
e3617130ae
|
Update README.md
|
2025-04-16 18:46:29 -07:00 |
|
Jake Poznanski
|
ac8c5369c9
|
Update README.md
|
2025-04-16 18:43:32 -07:00 |
|
Jake Poznanski
|
df657575b6
|
Update README.md
|
2025-04-16 17:02:32 -07:00 |
|
Jake Poznanski
|
ca6e1427c1
|
Adding some extra unit tests on some math cases I wasn't sure of
|
2025-04-16 23:44:48 +00:00 |
|
Jake Poznanski
|
7a638c74c9
|
Adding some more options to prompt chatgpt
|
2025-04-16 22:47:28 +00:00 |
|
Jake Poznanski
|
eabbe279fb
|
Lint fixes
|
2025-04-16 20:14:20 +00:00 |
|
Jake Poznanski
|
7f822607c0
|
Merge pull request #173 from allenai/amanr/olmocr-bench-old_scans
Added files of old_scans and old_scans_math for bench
|
2025-04-16 13:01:02 -07:00 |
|
Jake Poznanski
|
e16f66d6c5
|
Working on annotation for dolma docs release
|
2025-04-16 19:29:45 +00:00 |
|
aman-17
|
b85b71c2a5
|
removed old_scans
|
2025-04-15 15:44:26 -07:00 |
|
Jake Poznanski
|
9a67f50539
|
Doing some work on annotations again...
|
2025-04-15 22:27:07 +00:00 |
|
aman-17
|
2622a09a45
|
renamed processing_old_scans to mine_old_scans
|
2025-04-15 15:20:42 -07:00 |
|
aman-17
|
48de825e3a
|
added old_scans_math miner
|
2025-04-15 15:18:31 -07:00 |
|
aman-17
|
c72b8fb47c
|
fixed style and lint
|
2025-04-15 15:14:00 -07:00 |
|
aman-17
|
bc89f90216
|
removed convert file
|
2025-04-15 15:12:35 -07:00 |
|
aman-17
|
8abc475a0b
|
added old_scans and old_scans math miners and review app
|
2025-04-15 15:11:20 -07:00 |
|
Jake Poznanski
|
1d0c560455
|
Upping version to fix issue with work queue and delimited paths
|
2025-04-15 18:50:13 +00:00 |
|
aman-17
|
7703f0c9fa
|
update
|
2025-04-14 19:40:17 -07:00 |
|
aman-17
|
7c1c43649a
|
added old latex
|
2025-04-14 18:53:20 -07:00 |
|
Jake Poznanski
|
786b14aef5
|
Final adjustments
|
2025-04-14 23:27:27 +00:00 |
|
Jake Poznanski
|
4d8a8affdb
|
Adjusting prolific script
|
2025-04-14 23:21:28 +00:00 |
|
Jake Poznanski
|
dc2512c2f0
|
Adjusted annotation script
|
2025-04-14 20:27:06 +00:00 |
|
aman-17
|
3a1f98ca65
|
added more testcases for old_docs
|
2025-04-14 13:25:20 -07:00 |
|
Jake Poznanski
|
ee41449ff6
|
Instructions updated in annotation tool
|
2025-04-14 19:07:13 +00:00 |
|
Jake Poznanski
|
5ebec4664a
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-14 17:14:53 +00:00 |
|
Jake Poznanski
|
0b5cd40664
|
Staggering model downloads in big sharded jobs
|
2025-04-14 17:14:51 +00:00 |
|
Jake Poznanski
|
f7529f4e60
|
Update README.md
|
2025-04-11 11:23:17 -07:00 |
|
Jake Poznanski
|
b3c3a13e03
|
Update README.md
|
2025-04-10 16:06:17 -07:00 |
|
Jake Poznanski
|
52e11d3f38
|
Update README.md
|
2025-04-10 16:03:02 -07:00 |
|
Jake Poznanski
|
7b53714e27
|
Update README.md
|
2025-04-10 16:02:02 -07:00 |
|
Jake Poznanski
|
d781121e44
|
Update README.md
|
2025-04-10 16:00:05 -07:00 |
|
Jake Poznanski
|
3f34969a85
|
Rendering math in review app
|
2025-04-10 21:58:32 +00:00 |
|
Jake Poznanski
|
590a92ec2f
|
Ruff fix
|
2025-04-10 21:50:14 +00:00 |
|
aman-17
|
c7d0510fc1
|
removed rejected instances and cleaned up
|
2025-04-08 16:28:40 -07:00 |
|
aman-17
|
4e2a534f84
|
updated old_docs
|
2025-04-08 16:02:51 -07:00 |
|
Jake Poznanski
|
4e990e2584
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-08 22:31:01 +00:00 |
|
Jake Poznanski
|
a13a50143a
|
Formatting, fixes to annotation tool
|
2025-04-08 22:30:59 +00:00 |
|
Jake Poznanski
|
c7ddad0cc0
|
Decent prompt
|
2025-04-08 14:55:12 -07:00 |
|
Jake Poznanski
|
cf0d07d8d7
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-08 14:09:30 -07:00 |
|
Jake Poznanski
|
df6a96d90d
|
Prompting improving
|
2025-04-08 14:09:28 -07:00 |
|
Jake Poznanski
|
a74800f528
|
New flowchart based annotation tool
|
2025-04-08 21:04:56 +00:00 |
|
Jake Poznanski
|
cdc7fae4f9
|
Adjusting annotation script
|
2025-04-08 20:50:00 +00:00 |
|
Jake Poznanski
|
2f74a2a996
|
Prompt6 for qwen2.5 vl
|
2025-04-08 13:25:15 -07:00 |
|
aman-17
|
3e7d4b17ec
|
update
|
2025-04-08 13:21:34 -07:00 |
|
aman-17
|
92e168a91e
|
added old docs
|
2025-04-08 11:38:19 -07:00 |
|
Jake Poznanski
|
8c287a0255
|
Basic prompt edits
|
2025-04-08 10:28:41 -07:00 |
|
Jake Poznanski
|
ecbd3a246f
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-07 21:19:37 -07:00 |
|