1465 Commits

Author SHA1 Message Date
Jake Poznanski
7b3b93589d VLLM bump 2025-08-14 16:08:45 +00:00
Jake Poznanski
4431b4886f Better tracking of semaphore release on bigger jobs 2025-08-14 16:05:21 +00:00
Jake Poznanski
4efd3f5d9e AI2 Internal budgeting 2025-08-13 22:16:18 +00:00
Jake Poznanski
9f8df232b6 Readme updates 2025-08-13 22:03:03 +00:00
Jake Poznanski
36ca700669 Bump version to v0.3.0 for release v0.3.0 2025-08-13 21:41:30 +00:00
Jake Poznanski
3e5351c028 version bump 2025-08-13 21:41:22 +00:00
Jake Poznanski
894c617ea4 Merge pull request #303 from allenai/jakep/olmocr_v03 (olmOCR v.0.3.0) 2025-08-13 14:39:54 -07:00
Jake Poznanski
463cef7ea2 New default model 2025-08-13 20:57:15 +00:00
Jake Poznanski
e86267a01c Making local results directory properly 2025-08-13 20:40:04 +00:00
Jake Poznanski
11302feb8c Move open cv2 import only into experimental data loader class 2025-08-13 20:28:31 +00:00
Jake Poznanski
93411a80a0 Lint fixes 2025-08-13 20:21:04 +00:00
Jake Poznanski
05330150ad New work queue code is cleaner 2025-08-13 20:20:27 +00:00
Jake Poznanski
9a8fa335ae One more scheme to try 2025-08-13 18:21:58 +00:00
Jake Poznanski
ffb0c6abc5 Adding some more quant schemes 2025-08-13 18:00:38 +00:00
Jake Poznanski
b921922f25 Cleaning up some pipeline logs 2025-08-13 17:39:02 +00:00
Jake Poznanski
332a818614 useless config 2025-08-12 17:31:19 +00:00
Jake Poznanski
b873d66dae resumable 2025-08-12 16:35:21 +00:00
Jake Poznanski
98d457c502 2epoch config fix 2025-08-11 22:21:55 +00:00
Jake Poznanski
387e7947c4 Another 2 epoch run 2025-08-06 22:39:09 +00:00
Jake Poznanski
2a3c534a84 2 epoch resumable config 2025-08-06 22:38:38 +00:00
Jake Poznanski
c7a533c945 Sorting data loader samples to maintain consistency between runs 2025-08-06 21:46:13 +00:00
Jake Poznanski
2fca448105 Using new budget code 2025-08-06 16:31:08 +00:00
Jake Poznanski
e664dc5f36 typo 2025-08-05 19:43:11 +00:00
Jake Poznanski
8b8c6bb837 Cleaning up some training requirements installation steps 2025-08-05 19:42:46 +00:00
Jake Poznanski
c9b8088bc6 Adding some preempt flags 2025-08-05 18:00:46 +00:00
Jake Poznanski
8b7006d75d One more thing to try 2025-08-05 17:38:59 +00:00
Jake Poznanski
51ec1d34b2 Adding a bigger config with augemnts 2025-08-05 17:38:00 +00:00
Jake Poznanski
8b595b63ec Adding a decent augmentations pipeline 2025-08-05 17:37:02 +00:00
Jake Poznanski
7dca33db60 Getting things ready for a bit more augmentation 2025-08-05 16:34:46 +00:00
Jake Poznanski
55f8ba0ac0 Fixing configs 2025-08-04 22:54:39 +00:00
Jake Poznanski
c4de7dce80 Dataloader fix for loading blank yamls 2025-08-04 22:42:57 +00:00
Jake Poznanski
3ae173bd72 Merge branch 'main' into jakep/olmocr_v03 2025-08-04 22:28:29 +00:00
Jake Poznanski
12f8a90f1b Copying preprocessed files to local ssd in trainer script 2025-08-04 22:18:38 +00:00
Jake Poznanski
be1f845da4 Fixing issue with blank documents 2025-08-04 21:50:54 +00:00
Jake Poznanski
8715ccd245 Rotation augmentation config 2025-08-04 21:17:40 +00:00
Jake Poznanski
0792c03a9a Ok, rotation augmentation is in 2025-08-04 21:15:36 +00:00
Jake Poznanski
3bc2c0b8e3 Adding batch skipping in data loader 2025-08-04 21:07:04 +00:00
Jake Poznanski
66c7d823b5 Cleaning up some new config files 2025-08-04 20:49:33 +00:00
Jake Poznanski
d7cb315878 Merge branch 'main' into jakep/olmocr_v03 2025-08-04 20:46:56 +00:00
Jake Poznanski
6216896102 Accidentally comitted too many files 2025-08-04 20:41:21 +00:00
Jake Poznanski
4d2ddd3245 Merge branch 'jakep/flip_prompt' into jakep/olmocr_v03 2025-08-04 20:35:40 +00:00
Jake Poznanski
6417b2e8ba Merge branch 'main' of https://github.com/allenai/olmocr v0.2.3 2025-08-04 20:34:02 +00:00
Jake Poznanski
75a8b05255 Bump version to v0.2.3 for release 2025-08-04 20:33:54 +00:00
Jake Poznanski
ea465bebf3 Update README.md 2025-08-04 13:26:52 -07:00
Jake Poznanski
f3aedf2c12 Bumping version 2025-08-04 20:02:55 +00:00
Jake Poznanski
becd15d3cf Reformating fix 2025-08-04 20:01:54 +00:00
Jake Poznanski
d6591c04a1 Saving extra metadata that will be useful for finetuning 2025-08-04 20:01:30 +00:00
Jake Poznanski
7c098955a9 Trying fix for transformers benchmark 2025-08-04 19:50:05 +00:00
Jake Poznanski
8712534e81 Fix for docker ignore 2025-08-04 19:39:55 +00:00
Jake Poznanski
0bdcd4471e Fix docker ignore 2025-08-04 18:58:45 +00:00