Jake Poznanski
|
791983c09b
|
Tweaking some more PII detection
|
2025-05-01 17:09:05 +00:00 |
|
Jake Poznanski
|
5cc084887a
|
Rich tagger with bigger model
|
2025-05-01 09:33:27 -07:00 |
|
Jake Poznanski
|
4ed00d097b
|
Fixes for rich tagging
|
2025-04-30 14:38:35 -07:00 |
|
Jake Poznanski
|
472ee108d7
|
Lints
|
2025-04-30 21:18:59 +00:00 |
|
Jake Poznanski
|
8ef7e56c86
|
Trying a new rich tagging pipeline for PII
|
2025-04-30 21:18:22 +00:00 |
|
Jake Poznanski
|
0a320e9870
|
Some helper scripts for Aman
|
2025-04-30 18:47:10 +00:00 |
|
Jake Poznanski
|
f8808478bd
|
Adding some small changes to the tagging pipeline
|
2025-04-29 11:12:03 -07:00 |
|
Jake Poznanski
|
66d293c178
|
Decent resume/cv tagging
|
2025-04-28 15:57:20 -07:00 |
|
Jake Poznanski
|
1f66b96ffd
|
Adding openai dependency for benchmarking
|
2025-04-25 18:18:37 +00:00 |
|
Jake Poznanski
|
8ec7dbe2e0
|
Script updates
|
2025-04-25 18:00:41 +00:00 |
|
Jake Poznanski
|
83002a0de7
|
Reinit credentials
|
2025-04-24 20:43:54 +00:00 |
|
Jake Poznanski
|
2d5e1838f4
|
Small corrections
|
2025-04-24 20:31:59 +00:00 |
|
Jake Poznanski
|
df71dc38ce
|
Small fix for cluster usage
|
2025-04-24 20:24:06 +00:00 |
|
Jake Poznanski
|
67a01cfcc8
|
Fixups for tagging pipeline
|
2025-04-24 20:14:42 +00:00 |
|
Jake Poznanski
|
c326fae03c
|
Refactoring tagging bigly
|
2025-04-24 10:18:30 -07:00 |
|
Jake Poznanski
|
479b2c1b2d
|
Working on a tagger
|
2025-04-23 15:54:49 -07:00 |
|
Jake Poznanski
|
717ed811e1
|
Cleanup
|
2025-04-23 14:47:00 -07:00 |
|
Jake Poznanski
|
97ae48c66a
|
Making some more progress
|
2025-04-23 14:46:16 -07:00 |
|
Jake Poznanski
|
7d8e9d181a
|
Fixing up tagging pipeline
|
2025-04-23 19:56:13 +00:00 |
|
Jake Poznanski
|
12100b420d
|
Adding some manual structure to be filled in
|
2025-04-23 18:39:31 +00:00 |
|
Jake Poznanski
|
ee8c506d92
|
Example of a basic empty pipeline that I'm hoping to extend for tagging
|
2025-04-23 18:27:26 +00:00 |
|
mhamada-ai2
|
01644c4a49
|
Update scan_dolmadocs.py
Instruction text updates and public release question update
|
2025-04-22 16:16:21 -07:00 |
|
Jake Poznanski
|
246490f960
|
Lint fixes
|
2025-04-22 21:33:52 +00:00 |
|
Jake Poznanski
|
967210f23b
|
Adjustments to task
|
2025-04-22 21:33:39 +00:00 |
|
Jake Poznanski
|
3dffeeac22
|
Saving prolific PID
|
2025-04-22 21:16:41 +00:00 |
|
Jake Poznanski
|
eabbe279fb
|
Lint fixes
|
2025-04-16 20:14:20 +00:00 |
|
Jake Poznanski
|
e16f66d6c5
|
Working on annotation for dolma docs release
|
2025-04-16 19:29:45 +00:00 |
|
Jake Poznanski
|
9a67f50539
|
Doing some work on annotations again...
|
2025-04-15 22:27:07 +00:00 |
|
Jake Poznanski
|
1d0c560455
|
Upping version to fix issue with work queue and delimited paths
|
2025-04-15 18:50:13 +00:00 |
|
Jake Poznanski
|
786b14aef5
|
Final adjustments
|
2025-04-14 23:27:27 +00:00 |
|
Jake Poznanski
|
4d8a8affdb
|
Adjusting prolific script
|
2025-04-14 23:21:28 +00:00 |
|
Jake Poznanski
|
dc2512c2f0
|
Adjusted annotation script
|
2025-04-14 20:27:06 +00:00 |
|
Jake Poznanski
|
ee41449ff6
|
Instructions updated in annotation tool
|
2025-04-14 19:07:13 +00:00 |
|
Jake Poznanski
|
590a92ec2f
|
Ruff fix
|
2025-04-10 21:50:14 +00:00 |
|
Jake Poznanski
|
a13a50143a
|
Formatting, fixes to annotation tool
|
2025-04-08 22:30:59 +00:00 |
|
Jake Poznanski
|
a74800f528
|
New flowchart based annotation tool
|
2025-04-08 21:04:56 +00:00 |
|
Jake Poznanski
|
cdc7fae4f9
|
Adjusting annotation script
|
2025-04-08 20:50:00 +00:00 |
|
Jake Poznanski
|
474e0ef6ed
|
Lint fixes, adjusting qwen2.5 vl prompt
|
2025-04-07 21:19:36 -07:00 |
|
Jake Poznanski
|
f0d18e8b80
|
Final version for prolific
|
2025-04-07 21:39:55 +00:00 |
|
Jake Poznanski
|
b626b4a1e1
|
Adjusting labeling task
|
2025-04-07 20:27:32 +00:00 |
|
Jake Poznanski
|
3d1925067b
|
Removing progress bar in annotation UI
|
2025-04-04 21:41:36 +00:00 |
|
Jake Poznanski
|
caf21b9664
|
Lints
|
2025-04-04 19:45:38 +00:00 |
|
Jake Poznanski
|
f1188dc85d
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-04-04 19:44:55 +00:00 |
|
Jake Poznanski
|
a0f8b028f8
|
Reporting results
|
2025-04-04 19:44:54 +00:00 |
|
Jake Poznanski
|
cc7b1131c6
|
Editing
|
2025-04-04 19:38:59 +00:00 |
|
Jake Poznanski
|
9338f5359f
|
Saving pdf paths
|
2025-04-04 19:36:10 +00:00 |
|
Jake Poznanski
|
c8cc61b95f
|
Merge pull request #163 from franzbischoff/main
Add script to convert JSONL files to Markdown format
|
2025-04-04 12:30:54 -07:00 |
|
Jake Poznanski
|
61624a37ff
|
Fixed
|
2025-04-04 17:53:26 +00:00 |
|
Jake Poznanski
|
d299119c65
|
Links updated
|
2025-04-04 17:18:41 +00:00 |
|
Jake Poznanski
|
a113fd3015
|
Review app
|
2025-04-04 17:18:19 +00:00 |
|