Author | Commit | Message | Date
Jake Poznanski | 768cb33937 | Better filtering coming in | 2025-08-19 21:22:54 +00:00
Jake Poznanski | 1cafa779a3 | More filtering stages | 2025-08-19 20:09:41 +00:00
Jake Poznanski | 4d837b7db2 | More filter rules | 2025-08-19 20:01:42 +00:00
Jake Poznanski | 17d131fce0 | Some more filtering stuff | 2025-08-19 18:54:04 +00:00
Jake Poznanski | a3d23d7de1 | Adding a part of code to dataloader so you can see what is getting filtered out of your dataset | 2025-08-19 18:45:01 +00:00
Jake Poznanski | 84a0c432e7 | Adding some filtering rules and tests for them | 2025-08-19 18:14:15 +00:00
Jake Poznanski | cd09e190b5 | Fixes | 2025-08-19 17:50:23 +00:00
Jake Poznanski | 798335c88e | Setting pipeline to use new prompt too | 2025-08-19 17:46:23 +00:00
Jake Poznanski | f2db62b0f8 | Train a run with adjusted prompt | 2025-08-19 17:45:41 +00:00
Jake Poznanski | 1be5cea567 | Merge branch 'main' into jakep/new_data | 2025-08-19 17:41:45 +00:00
Jake Poznanski | 702f8996a9 | 2epoch | 2025-08-16 21:34:16 +00:00
Jake Poznanski | c075f3071f | New configs for new data | 2025-08-16 17:31:42 +00:00
Jake Poznanski | cffbb82b0b | Fix for iabooks | 2025-08-16 17:26:51 +00:00
Jake Poznanski | 0a9c82927f | Adding strip | 2025-08-16 17:05:09 +00:00
Jake Poznanski | c492615355 | Bump version to v0.3.3 for release (tag: v0.3.3) | 2025-08-15 19:45:17 +00:00
Jake Poznanski | cee12ccc9f | New version | 2025-08-15 19:45:07 +00:00
Jake Poznanski | 76405b53db | Lints | 2025-08-15 19:44:47 +00:00
Jake Poznanski | 69c33abfcc | Trying to keep queue loaded more | 2025-08-15 18:44:45 +00:00
Jake Poznanski | 7c98673972 | Pipeline fixes for OMP_NUM_THREADS | 2025-08-15 18:30:00 +00:00
Jake Poznanski | b9238b8638 | Fix for floaty amount | 2025-08-14 22:27:26 +00:00
Jake Poznanski | 618777c17e | Bump version to v0.3.2 for release (tag: v0.3.2) | 2025-08-14 20:58:11 +00:00
Jake Poznanski | 5532493ec8 | Pipeline should be improved to limit CPU usage on page renders | 2025-08-14 20:57:57 +00:00
Jake Poznanski | 3a36ee239d | Cleanup | 2025-08-14 20:13:52 +00:00
Jake Poznanski | a863d04e6e | Cleanup page rendering cpu limits | 2025-08-14 20:11:26 +00:00
Jake Poznanski | 482030f286 | Script to process batch outputs | 2025-08-14 19:54:29 +00:00
Jake Poznanski | 53c0e57e4a | openai batch data writer | 2025-08-14 19:40:36 +00:00
Jake Poznanski | 6d2c1a646a | Olmocr mix to batch format | 2025-08-14 18:24:47 +00:00
Jake Poznanski | 2049abd8ff | prompt stuff | 2025-08-14 18:08:43 +00:00
Jake Poznanski | 807257f43a | Better prompts | 2025-08-14 18:04:47 +00:00
Jake Poznanski | 1f50a6b6bd | Trying out some new prompts | 2025-08-14 17:44:56 +00:00
Jake Poznanski | 0dd4fe83f4 | Bump version to v0.3.1 for release (tag: v0.3.1) | 2025-08-14 16:52:35 +00:00
Jake Poznanski | 7e8f9e43d8 | New version | 2025-08-14 16:50:49 +00:00
Jake Poznanski | 7a36c98e26 | Merge branch 'main' into jakep/new_data | 2025-08-14 16:45:00 +00:00
Jake Poznanski | 0a8cd93c0a | Better queue management again | 2025-08-14 16:37:11 +00:00
Jake Poznanski | 38679243d7 | Removing extra files | 2025-08-14 16:17:59 +00:00
Jake Poznanski | dc5c45e144 | Deps | 2025-08-14 16:10:29 +00:00
Jake Poznanski | 7b3b93589d | VLLM bump | 2025-08-14 16:08:45 +00:00
Jake Poznanski | 4431b4886f | Better tracking of semaphore release on bigger jobs | 2025-08-14 16:05:21 +00:00
Jake Poznanski | 4efd3f5d9e | AI2 Internal budgeting | 2025-08-13 22:16:18 +00:00
Jake Poznanski | 9f8df232b6 | Readme updates | 2025-08-13 22:03:03 +00:00
Jake Poznanski | 36ca700669 | Bump version to v0.3.0 for release (tag: v0.3.0) | 2025-08-13 21:41:30 +00:00
Jake Poznanski | 3e5351c028 | version bump | 2025-08-13 21:41:22 +00:00
Jake Poznanski | 894c617ea4 | Merge pull request #303 from allenai/jakep/olmocr_v03 (olmOCR v.0.3.0) | 2025-08-13 14:39:54 -07:00
Jake Poznanski | 463cef7ea2 | New default model | 2025-08-13 20:57:15 +00:00
Jake Poznanski | e86267a01c | Making local results directory properly | 2025-08-13 20:40:04 +00:00
Jake Poznanski | 11302feb8c | Move open cv2 import only into experimental data loader class | 2025-08-13 20:28:31 +00:00
Jake Poznanski | 93411a80a0 | Lint fixes | 2025-08-13 20:21:04 +00:00
Jake Poznanski | 05330150ad | New work queue code is cleaner | 2025-08-13 20:20:27 +00:00
Jake Poznanski | 9a8fa335ae | One more scheme to try | 2025-08-13 18:21:58 +00:00
Jake Poznanski | ffb0c6abc5 | Adding some more quant schemes | 2025-08-13 18:00:38 +00:00