1362 Commits

Author SHA1 Message Date
Jake Poznanski
e7020c7f50 More configs 2025-06-30 22:49:42 +00:00
Jake Poznanski
7cf98794fd Image 1600 configuration 2025-06-30 22:49:13 +00:00
Jake Poznanski
d2ef9d78f1 Four basic training configs for new version 2025-06-30 22:31:02 +00:00
Jake Poznanski
a3ad61bd4d Small config updates 2025-06-30 22:22:49 +00:00
Jake Poznanski
ee8bd9b220 Better resume logic I hope 2025-06-30 22:18:15 +00:00
Jake Poznanski
208fabcb69 Validating on process pool 2025-06-30 22:10:59 +00:00
Jake Poznanski
4f46f10e0c At least get resuming from checkpoints to work perhaps 2025-06-30 21:56:12 +00:00
Jake Poznanski
2375079758 Torch compile off, gives warnings and no speed boost, padding to do multi batch is not working either 2025-06-30 21:47:17 +00:00
Jake Poznanski
c11120a3fa Trying to do batch size > 1 2025-06-30 21:37:50 +00:00
Jake Poznanski
5c2d69a3d7 Some cleanup stuff 2025-06-30 21:24:35 +00:00
Jake Poznanski
e86511e11b Weka fix 2025-06-30 17:46:13 +00:00
Jake Poznanski
656dbef833 Frontier configs 2025-06-30 17:43:30 +00:00
Jake Poznanski
e2f2d36e4f More typos 2025-06-30 17:41:19 +00:00
Jake Poznanski
ea72ea2645 Ugh stupid fix 2025-06-30 17:40:19 +00:00
Jake Poznanski
55a737ca6b script 2025-06-30 17:32:01 +00:00
Jake Poznanski
ba49fd53d9 frontier train script let's see what happens 2025-06-30 17:30:17 +00:00
Jake Poznanski
bde6f2955e Bf16 only 2025-06-30 17:25:53 +00:00
Jake Poznanski
44dd966850 Wandb fixes 2025-06-30 17:23:47 +00:00
Jake Poznanski
f8071c7457 Loss config 2025-06-30 17:17:48 +00:00
Jake Poznanski
a3997419b3 Naming config entries better 2025-06-30 17:15:58 +00:00
Jake Poznanski
44cba7911b Bench katex files distributed in installation package now 2025-06-30 17:07:38 +00:00
Jake Poznanski
f09b9ff142 Merge pull request #264 from boyuan99/fix-installation-command-typo
Fix typo in pip install command for GPU setup
2025-06-30 09:52:27 -07:00
Jake Poznanski
8e5e18f54c Checking that anchor text works for each pdf page when initializing dataloader 2025-06-30 16:29:33 +00:00
Bo Yuan
333f029ffb Fix typo in pip install command for GPU setup
Remove incorrect period before [gpu] in pip install command.
The correct syntax is 'olmocr[gpu]' not 'olmocr.[gpu]'.
2025-06-30 01:06:21 -05:00
Jake Poznanski
dc7fff5bf7 Collator fix 2025-06-29 19:52:53 +00:00
Jake Poznanski
12b5cc3101 Lowering size of default data load for testing 2025-06-28 23:09:44 +00:00
Jake Poznanski
c36b5df2af Cleanup collator 2025-06-28 22:46:12 +00:00
Jake Poznanski
887190e961 Cleanup 2025-06-27 21:54:31 +00:00
Jake Poznanski
330f465d5d Small fixes 2025-06-27 21:53:06 +00:00
Jake Poznanski
214c44df36 Reporting to wandb, better eval dataset loading 2025-06-27 21:16:22 +00:00
Jake Poznanski
600d967fe6 Config changes 2025-06-27 19:55:04 +00:00
Jake Poznanski
850b598db1 Sdpa 2025-06-27 16:59:33 +00:00
Jake Poznanski
573219d246 Lint 2025-06-27 16:43:50 +00:00
Jake Poznanski
eab7492e60 Forwarding -tp and -dp options 2025-06-27 16:40:47 +00:00
Jake Poznanski
14b9b2dc8f Fix for rocm vllm 2025-06-27 16:23:31 +00:00
Jake Poznanski
b96454b786 Merge branch 'main' into jakep/new_trainer 2025-06-27 16:21:58 +00:00
Jake Poznanski
58e4fadfc0 torchvision requirement 2025-06-27 16:16:19 +00:00
Jake Poznanski
1451dd1395 weka 2025-06-27 02:57:26 +00:00
Jake Poznanski
680377c93f Example config 2025-06-26 23:32:50 +00:00
Jake Poznanski
dee3730231 Gantry stuff 2025-06-26 18:34:53 +00:00
Jake Poznanski
0d7836b111 Basic attempt to run trainer script 2025-06-25 23:22:59 +00:00
Jake Poznanski
d7e5037192 New trainer launch script cleanups 2025-06-25 23:05:32 +00:00
Jake Poznanski
91e7b5ce3f Claude generated train script 2025-06-24 22:56:35 +00:00
Jake Poznanski
0ebc35cf1f Basic train config loader for datasets 2025-06-24 22:48:36 +00:00
Jake Poznanski
b93c262dca Prepping new config stuff 2025-06-24 22:40:50 +00:00
Jake Poznanski
633b03d1da Merge branch 'main' of https://github.com/allenai/olmocr 2025-06-24 22:06:02 +00:00
Jake Poznanski
67e9ec873f Removing unused file 2025-06-24 22:06:01 +00:00
Aman Rangapur
1df93d0ddf Merge pull request #257 from allenai/amanr/nanonets_ocr
Added Nanonets OCR bench results
2025-06-23 16:35:57 -07:00
aman-17
202e22932e addressed Jake's comment for pagenumbers with \d+ 2025-06-23 23:29:10 +00:00
aman-17
9d04b30ea4 added nanonets 2025-06-23 22:04:47 +00:00