1501 Commits

Author SHA1 Message Date
Jake Poznanski
b921922f25 Cleaning up some pipeline logs 2025-08-13 17:39:02 +00:00
Jake Poznanski
332a818614 useless config 2025-08-12 17:31:19 +00:00
Jake Poznanski
b873d66dae resumable 2025-08-12 16:35:21 +00:00
Jake Poznanski
98d457c502 2epoch config fix 2025-08-11 22:21:55 +00:00
Jake Poznanski
387e7947c4 Another 2 epoch run 2025-08-06 22:39:09 +00:00
Jake Poznanski
2a3c534a84 2 epoch resumable config 2025-08-06 22:38:38 +00:00
Jake Poznanski
c7a533c945 Sorting data loader samples to maintain consistency between runs 2025-08-06 21:46:13 +00:00
Jake Poznanski
2fca448105 Using new budget code 2025-08-06 16:31:08 +00:00
Jake Poznanski
e664dc5f36 typo 2025-08-05 19:43:11 +00:00
Jake Poznanski
8b8c6bb837 Cleaning up some training requirements installation steps 2025-08-05 19:42:46 +00:00
Jake Poznanski
c9b8088bc6 Adding some preempt flags 2025-08-05 18:00:46 +00:00
Jake Poznanski
8b7006d75d One more thing to try 2025-08-05 17:38:59 +00:00
Jake Poznanski
51ec1d34b2 Adding a bigger config with augemnts 2025-08-05 17:38:00 +00:00
Jake Poznanski
8b595b63ec Adding a decent augmentations pipeline 2025-08-05 17:37:02 +00:00
Jake Poznanski
7dca33db60 Getting things ready for a bit more augmentation 2025-08-05 16:34:46 +00:00
Jake Poznanski
55f8ba0ac0 Fixing configs 2025-08-04 22:54:39 +00:00
Jake Poznanski
c4de7dce80 Dataloader fix for loading blank yamls 2025-08-04 22:42:57 +00:00
Jake Poznanski
3ae173bd72 Merge branch 'main' into jakep/olmocr_v03 2025-08-04 22:28:29 +00:00
Jake Poznanski
12f8a90f1b Copying preprocessed files to local ssd in trainer script 2025-08-04 22:18:38 +00:00
Jake Poznanski
be1f845da4 Fixing issue with blank documents 2025-08-04 21:50:54 +00:00
Jake Poznanski
8715ccd245 Rotation augmentation config 2025-08-04 21:17:40 +00:00
Jake Poznanski
0792c03a9a Ok, rotation augmentation is in 2025-08-04 21:15:36 +00:00
Jake Poznanski
3bc2c0b8e3 Adding batch skipping in data loader 2025-08-04 21:07:04 +00:00
Jake Poznanski
66c7d823b5 Cleaning up some new config files 2025-08-04 20:49:33 +00:00
Jake Poznanski
d7cb315878 Merge branch 'main' into jakep/olmocr_v03 2025-08-04 20:46:56 +00:00
Jake Poznanski
6216896102 Accidentally comitted too many files 2025-08-04 20:41:21 +00:00
Jake Poznanski
4d2ddd3245 Merge branch 'jakep/flip_prompt' into jakep/olmocr_v03 2025-08-04 20:35:40 +00:00
Jake Poznanski
6417b2e8ba Merge branch 'main' of https://github.com/allenai/olmocr (tag: v0.2.3) 2025-08-04 20:34:02 +00:00
Jake Poznanski
75a8b05255 Bump version to v0.2.3 for release 2025-08-04 20:33:54 +00:00
Jake Poznanski
ea465bebf3 Update README.md 2025-08-04 13:26:52 -07:00
Jake Poznanski
f3aedf2c12 Bumping version 2025-08-04 20:02:55 +00:00
Jake Poznanski
becd15d3cf Reformating fix 2025-08-04 20:01:54 +00:00
Jake Poznanski
d6591c04a1 Saving extra metadata that will be useful for finetuning 2025-08-04 20:01:30 +00:00
Jake Poznanski
7c098955a9 Trying fix for transformers benchmark 2025-08-04 19:50:05 +00:00
Jake Poznanski
8712534e81 Fix for docker ignore 2025-08-04 19:39:55 +00:00
Jake Poznanski
0bdcd4471e Fix docker ignore 2025-08-04 18:58:45 +00:00
Jake Poznanski
0536c0e9b8 Lint fixes 2025-08-04 18:21:47 +00:00
Jake Poznanski
08b263ba46 Cumulative rotation support 2025-08-04 18:21:31 +00:00
Jake Poznanski
5e991b67e5 Merge pull request #291 from haydn-jones/main (Forward unknown args to vLLM) 2025-08-04 11:04:46 -07:00
Jake Poznanski
c89b66b8a2 Bump version to v0.2.2 for release (tag: v0.2.2) 2025-08-04 17:56:09 +00:00
Jake Poznanski
1286f1055d version bump 2025-08-04 17:56:01 +00:00
Jake Poznanski
168953ca84 Lowered memory usage check per #290 2025-08-04 17:54:35 +00:00
Jake Poznanski
ed8a5d10cf Ok fixed rotation stuff finally 2025-08-04 17:53:48 +00:00
Jake Poznanski
e0158df210 Adding test file 2025-08-04 17:21:40 +00:00
Jake Poznanski
6cdcb06ae7 Removing some dead code and adding tests 2025-08-04 16:54:42 +00:00
Jake Poznanski
4d773cce8f Adding pytest asyncio 2025-08-04 16:39:27 +00:00
Jake Poznanski
0ff691910b Lint fix 2025-08-04 16:24:31 +00:00
Jake Poznanski
a8d5299433 Trying to add a test for rotation correction 2025-08-04 16:24:13 +00:00
Jake Poznanski
1255f64d59 Better error messages 2025-08-04 15:33:29 +00:00
Haydn Jones
08c26338a9 Remove unnecessary info 2025-08-03 23:25:57 -04:00