169 Commits

Author SHA1 Message Date
Jake Poznanski
c326fae03c Refactoring tagging bigly 2025-04-24 10:18:30 -07:00
Jake Poznanski
479b2c1b2d Working on a tagger 2025-04-23 15:54:49 -07:00
Jake Poznanski
717ed811e1 Cleanup 2025-04-23 14:47:00 -07:00
Jake Poznanski
97ae48c66a Making some more progress 2025-04-23 14:46:16 -07:00
Jake Poznanski
7d8e9d181a Fixing up tagging pipeline 2025-04-23 19:56:13 +00:00
Jake Poznanski
12100b420d Adding some manual structure to be filled in 2025-04-23 18:39:31 +00:00
Jake Poznanski
ee8c506d92 Example of a basic empty pipeline that I'm hoping to extend for tagging 2025-04-23 18:27:26 +00:00
mhamada-ai2
01644c4a49 Update scan_dolmadocs.py: Instruction text updates and public release question update 2025-04-22 16:16:21 -07:00
Jake Poznanski
246490f960 Lint fixes 2025-04-22 21:33:52 +00:00
Jake Poznanski
967210f23b Adjustments to task 2025-04-22 21:33:39 +00:00
Jake Poznanski
3dffeeac22 Saving prolific PID 2025-04-22 21:16:41 +00:00
Jake Poznanski
eabbe279fb Lint fixes 2025-04-16 20:14:20 +00:00
Jake Poznanski
e16f66d6c5 Working on annotation for dolma docs release 2025-04-16 19:29:45 +00:00
Jake Poznanski
9a67f50539 Doing some work on annotations again... 2025-04-15 22:27:07 +00:00
Jake Poznanski
1d0c560455 Upping version to fix issue with work queue and delimited paths 2025-04-15 18:50:13 +00:00
Jake Poznanski
786b14aef5 Final adjustments 2025-04-14 23:27:27 +00:00
Jake Poznanski
4d8a8affdb Adjusting prolific script 2025-04-14 23:21:28 +00:00
Jake Poznanski
dc2512c2f0 Adjusted annotation script 2025-04-14 20:27:06 +00:00
Jake Poznanski
ee41449ff6 Instructions updated in annotation tool 2025-04-14 19:07:13 +00:00
Jake Poznanski
590a92ec2f Ruff fix 2025-04-10 21:50:14 +00:00
Jake Poznanski
a13a50143a Formatting, fixes to annotation tool 2025-04-08 22:30:59 +00:00
Jake Poznanski
a74800f528 New flowchart based annotation tool 2025-04-08 21:04:56 +00:00
Jake Poznanski
cdc7fae4f9 Adjusting annotation script 2025-04-08 20:50:00 +00:00
Jake Poznanski
474e0ef6ed Lint fixes, adjusting qwen2.5 vl prompt 2025-04-07 21:19:36 -07:00
Jake Poznanski
f0d18e8b80 Final version for prolific 2025-04-07 21:39:55 +00:00
Jake Poznanski
b626b4a1e1 Adjusting labeling task 2025-04-07 20:27:32 +00:00
Jake Poznanski
3d1925067b Removing progress bar in annotation UI 2025-04-04 21:41:36 +00:00
Jake Poznanski
caf21b9664 Lints 2025-04-04 19:45:38 +00:00
Jake Poznanski
f1188dc85d Merge branch 'main' of https://github.com/allenai/olmocr 2025-04-04 19:44:55 +00:00
Jake Poznanski
a0f8b028f8 Reporting results 2025-04-04 19:44:54 +00:00
Jake Poznanski
cc7b1131c6 Editing 2025-04-04 19:38:59 +00:00
Jake Poznanski
9338f5359f Saving pdf paths 2025-04-04 19:36:10 +00:00
Jake Poznanski
c8cc61b95f Merge pull request #163 from franzbischoff/main: Add script to convert JSONL files to Markdown format 2025-04-04 12:30:54 -07:00
Jake Poznanski
61624a37ff Fixed 2025-04-04 17:53:26 +00:00
Jake Poznanski
d299119c65 Links updated 2025-04-04 17:18:41 +00:00
Jake Poznanski
a113fd3015 Review app 2025-04-04 17:18:19 +00:00
Jake Poznanski
e8c14fc496 Saving prolific codes 2025-04-04 17:12:46 +00:00
Jake Poznanski
cd9e370c92 Tinyhosting automatically 2025-04-04 16:29:58 +00:00
Jake Poznanski
02cd002488 Step by step annotation 2025-04-04 16:19:04 +00:00
Jake Poznanski
6a0dbfc925 Adjusting buttons 2025-04-04 16:05:04 +00:00
Francisco Bischoff
c2193ddc93 Remove first line 2025-04-04 16:44:21 +01:00
Francisco Bischoff
c96143c3b1 Add script to convert JSONL files to Markdown format 2025-04-04 12:52:58 +01:00
Jake Poznanski
83ae61014c Scan dolma docs improvements for PII review 2025-04-01 20:03:15 +00:00
Jake Poznanski
bc78e0d8a0 Adding feedback 2025-04-01 18:35:04 +00:00
Jake Poznanski
213252f048 A few improvements to the dolma doc viewer script 2025-04-01 18:25:40 +00:00
Jake Poznanski
d45c0323a4 Better equation rendering checker with more tests. 2025-03-26 18:49:48 +00:00
Jake Poznanski
b8e3034847 Trying a change to the render script 2025-03-26 18:26:06 +00:00
Jake Poznanski
f5d92bdb14 Trying to get new CI to work 2025-03-14 02:43:55 +00:00
Chris Wilhelm
9b958e65f1 moves what happens where around a bit and updates readme 2025-03-13 15:31:55 -07:00
Chris Wilhelm
098b01c006 wire it up into a gh action 2025-03-13 15:31:55 -07:00