Jake Poznanski
|
b97e90ce3a
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-07 20:27:35 +00:00 |
|
Jake Poznanski
|
b626b4a1e1
|
Adjusting labeling task
|
2025-04-07 20:27:32 +00:00 |
|
Jake Poznanski
|
b607aecbbc
|
Lints
|
2025-04-07 10:21:35 -07:00 |
|
Jake Poznanski
|
95b03a1df0
|
Fixing gemini convert script to use new API
|
2025-04-07 10:20:58 -07:00 |
|
Jake Poznanski
|
3d1925067b
|
Removing progress bar in annotation UI
|
2025-04-04 21:41:36 +00:00 |
|
Jake Poznanski
|
caf21b9664
|
Lints
|
2025-04-04 19:45:38 +00:00 |
|
Jake Poznanski
|
f1188dc85d
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-04 19:44:55 +00:00 |
|
Jake Poznanski
|
a0f8b028f8
|
Reporting results
|
2025-04-04 19:44:54 +00:00 |
|
Jake Poznanski
|
cc7b1131c6
|
Editing
|
2025-04-04 19:38:59 +00:00 |
|
Jake Poznanski
|
9338f5359f
|
Saving pdf paths
|
2025-04-04 19:36:10 +00:00 |
|
Jake Poznanski
|
ee70b68a19
|
Merge pull request #164 from allenai/amanr/multi_columns
Added multi_column miner script
|
2025-04-04 12:33:25 -07:00 |
|
Jake Poznanski
|
c8cc61b95f
|
Merge pull request #163 from franzbischoff/main
Add script to convert JSONL files to Markdown format
|
2025-04-04 12:30:54 -07:00 |
|
aman-17
|
71e44a1b4e
|
fixed style
|
2025-04-04 11:10:00 -07:00 |
|
aman-17
|
9fd7bc8a96
|
added multi_column script
|
2025-04-04 11:01:59 -07:00 |
|
Jake Poznanski
|
61624a37ff
|
Fixed
|
2025-04-04 17:53:26 +00:00 |
|
Jake Poznanski
|
d299119c65
|
Links updated
|
2025-04-04 17:18:41 +00:00 |
|
Jake Poznanski
|
a113fd3015
|
Review app
|
2025-04-04 17:18:19 +00:00 |
|
Jake Poznanski
|
e8c14fc496
|
Saving prolific codes
|
2025-04-04 17:12:46 +00:00 |
|
Jake Poznanski
|
cd9e370c92
|
Tinyhosting automatically
|
2025-04-04 16:29:58 +00:00 |
|
Jake Poznanski
|
02cd002488
|
Step by step annotation
|
2025-04-04 16:19:04 +00:00 |
|
Jake Poznanski
|
6a0dbfc925
|
Adjusting buttons
|
2025-04-04 16:05:04 +00:00 |
|
Francisco Bischoff
|
c2193ddc93
|
Remove first line
|
2025-04-04 16:44:21 +01:00 |
|
Francisco Bischoff
|
c96143c3b1
|
Add script to convert JSONL files to Markdown format
|
2025-04-04 12:52:58 +01:00 |
|
Jake Poznanski
|
d4d87f7c65
|
Force flag for review app, tests fixed for difference comparison in tables
|
2025-04-03 20:27:01 +00:00 |
|
Jake Poznanski
|
e856e9de1d
|
Test mining not including line numbers
|
2025-04-02 23:07:32 +00:00 |
|
Jake Poznanski
|
2614fc9050
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-02 21:46:35 +00:00 |
|
Jake Poznanski
|
a96f1541c4
|
Hopefully avoiding comparison issues now
|
2025-04-02 21:46:34 +00:00 |
|
Jake Poznanski
|
46ca990663
|
Merge branch 'main' of https://github.com/allenai/olmocr into main
|
2025-04-02 14:46:13 -07:00 |
|
Jake Poznanski
|
0d94d15341
|
Test validation
|
2025-04-02 14:46:07 -07:00 |
|
Jake Poznanski
|
b8b780faca
|
More mining of synthetic tests code
|
2025-04-02 21:39:50 +00:00 |
|
Jake Poznanski
|
360b1be07c
|
Better filtering of tests
|
2025-04-02 21:24:00 +00:00 |
|
Jake Poznanski
|
6d3a7d634e
|
Adding autorender of katex into synthetic pipeline
|
2025-04-02 21:14:14 +00:00 |
|
Jake Poznanski
|
4604b59661
|
Synth mining
|
2025-04-02 20:25:16 +00:00 |
|
Jake Poznanski
|
69b0222697
|
Improving miner script
|
2025-04-02 20:12:06 +00:00 |
|
Jake Poznanski
|
841ce72c19
|
Miner improvements
|
2025-04-02 18:49:43 +00:00 |
|
Jake Poznanski
|
97376493fd
|
More tests
|
2025-04-02 18:39:51 +00:00 |
|
Jake Poznanski
|
748ab95751
|
Miner unit tests for duplicate absent tests
|
2025-04-02 18:12:05 +00:00 |
|
Jake Poznanski
|
594f47306b
|
Synth miner coming together more
|
2025-04-02 18:02:39 +00:00 |
|
Jake Poznanski
|
fb8b23d506
|
Small adjustments to synthetic data pipeline
|
2025-04-02 17:46:48 +00:00 |
|
Jake Poznanski
|
678c000685
|
Nicer claude prompt for synth data gen
|
2025-04-01 22:42:09 +00:00 |
|
Jake Poznanski
|
5c98a47eaa
|
Mining upgrades
|
2025-04-01 22:22:19 +00:00 |
|
Jake Poznanski
|
a34b158ebf
|
Lints
|
2025-04-01 20:05:55 +00:00 |
|
Jake Poznanski
|
83ae61014c
|
Scan dolma docs improvements for PII review
|
2025-04-01 20:03:15 +00:00 |
|
Jake Poznanski
|
bc78e0d8a0
|
Adding feedback
|
2025-04-01 18:35:04 +00:00 |
|
Jake Poznanski
|
213252f048
|
A few improvements to the dolma doc viewer script
|
2025-04-01 18:25:40 +00:00 |
|
Jake Poznanski
|
3ca39abd9b
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-01 18:11:09 +00:00 |
|
Jake Poznanski
|
7e46626452
|
Update README.md
|
2025-03-31 13:50:07 -07:00 |
|
Jake Poznanski
|
0d21ade0d8
|
Unused import
|
2025-03-31 13:30:20 -07:00 |
|
Jake Poznanski
|
b64fd19db3
|
Cleaning up code for image to pdf conversion
|
2025-03-31 13:28:30 -07:00 |
|
Jake Poznanski
|
cc8e4b1863
|
Adding native support to convert pngs and jpgs to pdfs so the pipeline can work on them
|
2025-03-31 10:59:38 -07:00 |
|