1749 Commits

Author SHA1 Message Date
Jake Poznanski
f3cdc78b4f Pushing new version 2025-08-31 03:12:30 +00:00
Jake Poznanski
72fcfafde7 Fixed up national archives script 2025-08-29 16:36:18 +00:00
Jake Poznanski
7e09a02f7c Better NA extractor 2025-08-28 20:48:51 +00:00
Jake Poznanski
22626a512c Describing national archives project 2025-08-28 18:44:24 +00:00
Jake Poznanski
f426826850 Clean downloads 2025-08-28 18:03:52 +00:00
Jake Poznanski
3d3d184f25 Fixes 2025-08-28 17:56:27 +00:00
Jake Poznanski
ed3820c0c7 LOC downloader 2025-08-28 17:38:20 +00:00
Jake Poznanski
6123d4452b Reorganizing files 2025-08-28 17:04:26 +00:00
Jake Poznanski
0710debf75 Cleaner front matter reward 2025-08-27 19:49:42 +00:00
Jake Poznanski
09036b07d9 Verifying bench loading 2025-08-27 19:00:29 +00:00
Jake Poznanski
edd098093b Reverting version changes that broke, vllm 0.10.1 is not good 2025-08-27 18:55:26 +00:00
Jake Poznanski
14f19e5d58 Some cost tracking 2025-08-27 18:53:42 +00:00
Jake Poznanski
d70208d98a Moving test code around, adding format reward since some runs stop outputting the front matter thing in grpo training 2025-08-27 18:22:05 +00:00
Jake Poznanski
8383865392 Fixing up subscripts and superscripts in synth data 2025-08-27 18:15:36 +00:00
Jake Poznanski
27792664bf Transformers version bump needed also 2025-08-27 16:35:51 +00:00
Jake Poznanski
03c7479a17 VLLM version bump 2025-08-27 16:33:37 +00:00
Jake Poznanski
cf479d1d29 Fixing reward weights stuff 2025-08-26 22:04:45 +00:00
Jake Poznanski
c86b413d3e Adding additional rewards and weights 2025-08-26 21:49:35 +00:00
Jake Poznanski
9c520498dd Adding some losses 2025-08-26 21:18:24 +00:00
Jake Poznanski
3be381b375 Adding some params 2025-08-26 20:46:06 +00:00
Jake Poznanski
4b3660debd Quieting some messages 2025-08-26 20:15:39 +00:00
Jake Poznanski
8327da2415 Fix 2025-08-26 20:06:52 +00:00
Jake Poznanski
98f4d62d1e Fixing data check 2025-08-26 19:51:54 +00:00
Jake Poznanski
3433c8f5f2 Adding more cmd line args 2025-08-26 18:28:41 +00:00
Jake Poznanski
9671f6847c Parallel rewards 2025-08-26 18:16:21 +00:00
Jake Poznanski
3eec58012c Docker ignore 2025-08-26 17:52:50 +00:00
Jake Poznanski
2588f911dc Docker ignore 2025-08-26 17:52:30 +00:00
Jake Poznanski
82fd50263f Launcher for grpo training 2025-08-26 16:28:38 +00:00
Jake Poznanski
6be12c2e06 Baseline tests for blanks 2025-08-25 22:01:24 +00:00
Jake Poznanski
ad33672781 fix 2025-08-25 21:04:53 +00:00
Jake Poznanski
ba226cc1d5 Saving url metadata 2025-08-25 20:47:54 +00:00
Jake Poznanski
90a7443b2b Working to cleanup miner script 2025-08-25 20:41:04 +00:00
Jake Poznanski
ed6f483074 Fixing run_benchmark 2025-08-25 20:28:40 +00:00
Jake Poznanski
d84eb95ba2 Saving some extra data mixes 2025-08-25 20:26:29 +00:00
Jake Poznanski
c7aa217281 Scripts to run benchmarks better 2025-08-25 20:12:10 +00:00
Jake Poznanski
b16e4051f6 Saving bench results to s3 2025-08-25 19:53:55 +00:00
Jake Poznanski
d9b6978499 Some scripts 2025-08-25 18:44:18 +00:00
Jake Poznanski
2415cdcfda Adding support for a blank page output check 2025-08-25 17:20:47 +00:00
Jake Poznanski
33ecc2316c Starting miner for blank pages, we want to make sure we don't hallucinate 2025-08-25 16:54:23 +00:00
Jake Poznanski
55b7101d7e Add some new rotation tests to a branch of the bench 2025-08-25 16:25:00 +00:00
Jake Poznanski
59321af018 Merge pull request #319 from haydn-jones/main (External vLLM instance) 2025-08-25 08:43:25 -07:00
Haydn Jones
2c63836648 Black and mock 2025-08-23 20:07:05 -04:00
Jake Poznanski
5c6225b227 Fix for some math equations stuff 2025-08-22 20:52:41 +00:00
Jake Poznanski
d36357f3db Some fixes to validating math which was not working otherwise 2025-08-22 20:40:14 +00:00
Jake Poznanski
f3ea1527ef Local path supported 2025-08-22 20:11:35 +00:00
Jake Poznanski
5cbe331259 Async mode 2025-08-22 19:51:51 +00:00
Jake Poznanski
0df56e958e Even more test cleanup 2025-08-22 18:56:56 +00:00
Jake Poznanski
9831e65161 Fixing some logs, verifying 100% self test accuracy 2025-08-22 18:39:39 +00:00
Jake Poznanski
c1c83fd86c Better quality synth data from both sides 2025-08-22 18:37:59 +00:00
Jake Poznanski
d9789947d5 Refactoring of loading tests 2025-08-22 18:31:37 +00:00