288 Commits

Author SHA1 Message Date
Jake Poznanski
e99fb07ae0 Better docker docs 2025-11-17 20:50:37 +00:00
Jake Poznanski
e9a3e48636 Lints 2025-11-03 19:28:15 +00:00
Jake Poznanski
070e110b08 Some plot fixes 2025-11-03 19:27:34 +00:00
Jake Poznanski
8a6866ed63 Fixing up some plots for a presentation 2025-11-03 18:25:25 +00:00
Jake Poznanski
4ee20056c0 Adding a flag to run_benchmark script 2025-10-27 18:12:00 +00:00
Jake Poznanski
10ab6a60e0 Adding support for smaller error bars in benchmarking 2025-10-24 16:49:49 +00:00
Jake Poznanski
15496b973f Infrapartner benchmarking script 2025-10-23 21:49:23 +00:00
Jake Poznanski
5c16f52d3b Paddle vl benchmark runner saves off data 2025-10-21 20:09:39 +00:00
Jake Poznanski
76e05f8165 Fixes, adding more runners 2025-10-20 17:11:12 +00:00
Jake Poznanski
e796448482 Adding paddlevl script 2025-10-20 16:26:08 +00:00
Jake Poznanski
9480508642 Mineru 2025-10-13 20:47:52 +00:00
Jake Poznanski
417fbed4ad Fix 2025-10-13 19:46:27 +00:00
Jake Poznanski
7d6db61446 Mineru runner 2025-10-13 19:43:39 +00:00
Jake Poznanski
875337f962 Lints 2025-10-09 22:12:19 +00:00
Jake Poznanski
702c42f8e7 Packaging working better now 2025-10-09 22:12:02 +00:00
Jake Poznanski
74eb910b95 Now you can just run pytest . cleanly 2025-10-09 20:31:28 +00:00
Jake Poznanski
f01f7183e4 Test fixes 2025-10-09 20:28:29 +00:00
Jake Poznanski
8ef68fde88 Merge branch 'main' into jakep/new_data 2025-10-07 17:44:54 +00:00
Jake Poznanski
2e3d1a0317 Committing test script to be used in model cards for individual one-off inference 2025-10-06 22:47:06 +00:00
Jake Poznanski
8ef7f8085a isort and black 2025-09-30 17:37:10 +00:00
Jake Poznanski
fb1ef9e38a Release script fix 2025-09-29 17:37:14 +00:00
Jake Poznanski
c587eb9050 Ugh, release script adds all files by default 2025-09-29 17:36:41 +00:00
Jake Poznanski
7f4b728dcd Skip docker checks if using beaker image 2025-09-26 23:27:04 +00:00
Jake Poznanski
bb06829840 Souping in fp32 2025-09-26 20:03:29 +00:00
Jake Poznanski
01bc1ff7b6 Allow passing in beaker image to run benchmark 2025-09-23 19:48:42 +00:00
Jake Poznanski
a00d9d172e Adding stricter math and table tests when in synthetic mode 2025-09-23 18:37:50 +00:00
Jake Poznanski
1197c35808 Mix contamination checker script 2025-09-23 18:17:13 +00:00
Jake Poznanski
83d965c768 Adding contam check for olmocr-bench when making synth data 2025-09-20 03:42:42 +00:00
Jake Poznanski
1ac72ad169 Adding some scripts to clean data 2025-09-18 19:44:30 +00:00
Jake Poznanski
54cd5a3438 Going to train on the new transcripts data 2025-09-08 22:30:40 +00:00
Jake Poznanski
ef09c73bf2 Fixing up some rewards stuff 2025-09-04 17:34:53 +00:00
Jake Poznanski
ede0dc51b1 Adding drop last to prevent any weirdnesses 2025-09-04 16:50:08 +00:00
Jake Poznanski
14a882db9a Fixing to new version, adjusting scale rewards stuff 2025-09-03 22:43:35 +00:00
Jake Poznanski
755c221024 Trying some more things 2025-09-03 22:11:16 +00:00
Jake Poznanski
a41d04660a Cleaning script 2025-09-03 21:31:21 +00:00
Jake Poznanski
e6cff25b6b Cleanup stuff 2025-09-03 20:34:12 +00:00
Jake Poznanski
bade86fe91 Cleaned up things 2025-09-03 20:23:01 +00:00
Jake Poznanski
b689a8e5f8 Giving more memory buffer 2025-09-03 19:56:53 +00:00
Jake Poznanski
7346d12322 Better cleaning, augusta version 2025-09-03 18:47:02 +00:00
Jake Poznanski
f20f1a0b54 Doing some cleaning 2025-09-03 18:41:36 +00:00
Jake Poznanski
94d19c51c6 Cleaning up scripts, multi gpu trainer more flexible 2025-09-03 18:25:10 +00:00
Jake Poznanski
c612293a59 Remove device map auto 2025-09-03 18:04:42 +00:00
Jake Poznanski
1fb49cefc1 Working on multi gpu trainer 2025-09-03 17:25:14 +00:00
Jake Poznanski
3be381b375 Adding some params 2025-08-26 20:46:06 +00:00
Jake Poznanski
82fd50263f Launcher for grpo training 2025-08-26 16:28:38 +00:00
Jake Poznanski
ad33672781 fix 2025-08-25 21:04:53 +00:00
Jake Poznanski
ed6f483074 Fixing run_benchmark 2025-08-25 20:28:40 +00:00
Jake Poznanski
d84eb95ba2 Saving some extra data mixes 2025-08-25 20:26:29 +00:00
Jake Poznanski
c7aa217281 Scripts to run benchmarks better 2025-08-25 20:12:10 +00:00
Jake Poznanski
b16e4051f6 Saving bench results to s3 2025-08-25 19:53:55 +00:00