Jake Poznanski
|
df71dc38ce
|
Small fix for cluster usage
|
2025-04-24 20:24:06 +00:00 |
|
Jake Poznanski
|
67a01cfcc8
|
Fixups for tagging pipeline
|
2025-04-24 20:14:42 +00:00 |
|
Jake Poznanski
|
c326fae03c
|
Refactoring tagging bigly
|
2025-04-24 10:18:30 -07:00 |
|
Jake Poznanski
|
811d267bd5
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-23 15:55:04 -07:00 |
|
Jake Poznanski
|
479b2c1b2d
|
Working on a tagger
|
2025-04-23 15:54:49 -07:00 |
|
Jake Poznanski
|
717ed811e1
|
Cleanup
|
2025-04-23 14:47:00 -07:00 |
|
Jake Poznanski
|
97ae48c66a
|
Making some more progress
|
2025-04-23 14:46:16 -07:00 |
|
aman-17
|
2a4522e7e5
|
fixed minor bug
|
2025-04-23 14:41:09 -07:00 |
|
aman-17
|
076f3e2e04
|
fixed style
|
2025-04-23 14:38:19 -07:00 |
|
aman-17
|
b095be0fed
|
added checker for old_scans_math
|
2025-04-23 14:37:42 -07:00 |
|
Aman Rangapur
|
85b40f46ce
|
Updated bench README.md
Cleaned old scans tests and removed [] and other symbols.
|
2025-04-23 13:53:24 -07:00 |
|
Jake Poznanski
|
7d8e9d181a
|
Fixing up tagging pipeline
|
2025-04-23 19:56:13 +00:00 |
|
Jake Poznanski
|
12100b420d
|
Adding some manual structure to be filled in
|
2025-04-23 18:39:31 +00:00 |
|
Jake Poznanski
|
ee8c506d92
|
Example of a basic empty pipeline that I'm hoping to extend for tagging
|
2025-04-23 18:27:26 +00:00 |
|
Jake Poznanski
|
582518f1e8
|
Merge pull request #181 from mhamada-ai2/patch-1
Update scan_dolmadocs.py
|
2025-04-23 09:48:08 -07:00 |
|
mhamada-ai2
|
01644c4a49
|
Update scan_dolmadocs.py
Instruction text updates and public release question update
|
2025-04-22 16:16:21 -07:00 |
|
Jake Poznanski
|
887efac133
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-22 21:33:53 +00:00 |
|
Jake Poznanski
|
246490f960
|
Lint fixes
|
2025-04-22 21:33:52 +00:00 |
|
Jake Poznanski
|
967210f23b
|
Adjustments to task
|
2025-04-22 21:33:39 +00:00 |
|
Jake Poznanski
|
3dffeeac22
|
Saving prolific PID
|
2025-04-22 21:16:41 +00:00 |
|
Aman Rangapur
|
622279850d
|
Merge pull request #179 from allenai/amanr/long_tiny_text
Added Miner for long tiny text
|
2025-04-22 14:00:26 -07:00 |
|
Jake Poznanski
|
b20a4886f9
|
README for benchmark
|
2025-04-22 20:35:11 +00:00 |
|
aman-17
|
0926dacc59
|
fixed style
|
2025-04-21 17:42:32 -07:00 |
|
aman-17
|
6845517761
|
added miner
|
2025-04-21 17:41:16 -07:00 |
|
Jake Poznanski
|
b897bf1414
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-18 15:47:32 +00:00 |
|
Jake Poznanski
|
f0992b95e1
|
Better staggering of downloads
|
2025-04-18 15:47:31 +00:00 |
|
Jake Poznanski
|
dd92c75c1f
|
Fixing CI again
|
2025-04-17 14:43:46 -07:00 |
|
Jake Poznanski
|
cd79b202ed
|
Fixing gh actions
|
2025-04-17 14:32:43 -07:00 |
|
Jake Poznanski
|
8f46b6e966
|
Running more tests in CI
|
2025-04-17 14:26:06 -07:00 |
|
Jake Poznanski
|
6fefc98f77
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-17 13:51:51 -07:00 |
|
Jake Poznanski
|
5aa6a9f1a3
|
Fixing olmocr_pipeline in converter
|
2025-04-17 13:51:49 -07:00 |
|
Jake Poznanski
|
858cf69507
|
Bumping version
|
2025-04-17 17:00:01 +00:00 |
|
Jake Poznanski
|
10cb6aad26
|
Updating pipeline to take cloud storage model names and paths, as well as local directory
|
2025-04-17 09:59:28 -07:00 |
|
Jake Poznanski
|
e3617130ae
|
Update README.md
|
2025-04-16 18:46:29 -07:00 |
|
Jake Poznanski
|
ac8c5369c9
|
Update README.md
|
2025-04-16 18:43:32 -07:00 |
|
Jake Poznanski
|
df657575b6
|
Update README.md
|
2025-04-16 17:02:32 -07:00 |
|
Jake Poznanski
|
ca6e1427c1
|
Adding some extra unit tests on some math cases I wasn't sure of
|
2025-04-16 23:44:48 +00:00 |
|
Jake Poznanski
|
7a638c74c9
|
Adding some more options to prompt chatgpt
|
2025-04-16 22:47:28 +00:00 |
|
Jake Poznanski
|
eabbe279fb
|
Lint fixes
|
2025-04-16 20:14:20 +00:00 |
|
Jake Poznanski
|
7f822607c0
|
Merge pull request #173 from allenai/amanr/olmocr-bench-old_scans
Added files of old_scans and old_scans_math for bench
|
2025-04-16 13:01:02 -07:00 |
|
Jake Poznanski
|
e16f66d6c5
|
Working on annotation for dolma docs release
|
2025-04-16 19:29:45 +00:00 |
|
aman-17
|
b85b71c2a5
|
removed old_scans
|
2025-04-15 15:44:26 -07:00 |
|
Jake Poznanski
|
9a67f50539
|
Doing some work on annotations again...
|
2025-04-15 22:27:07 +00:00 |
|
aman-17
|
2622a09a45
|
renamed processing_old_scans to mine_old_scans
|
2025-04-15 15:20:42 -07:00 |
|
aman-17
|
48de825e3a
|
added old_scans_math miner
|
2025-04-15 15:18:31 -07:00 |
|
aman-17
|
c72b8fb47c
|
fixed style and lint
|
2025-04-15 15:14:00 -07:00 |
|
aman-17
|
bc89f90216
|
removed convert file
|
2025-04-15 15:12:35 -07:00 |
|
aman-17
|
8abc475a0b
|
added old_scans and old_scans_math miners and review app
|
2025-04-15 15:11:20 -07:00 |
|
Jake Poznanski
|
1d0c560455
|
Upping version to fix issue with work queue and delimited paths
|
2025-04-15 18:50:13 +00:00 |
|
aman-17
|
7703f0c9fa
|
update
|
2025-04-14 19:40:17 -07:00 |
|