1519 Commits

Author          SHA1        Date                        Message
Jake Poznanski  3c8410f22c  2025-08-21 16:51:51 +00:00  fix
Jake Poznanski  1dbb4332c0  2025-08-21 16:50:56 +00:00  FIxing up
Jake Poznanski  7c446e1679  2025-08-20 22:44:10 +00:00  Trying to fix script
Jake Poznanski  a2ee4d46c0  2025-08-20 22:35:19 +00:00  gpro trainer test 1
Jake Poznanski  77164e909f  2025-08-20 22:25:26 +00:00  Decent grpo script
Jake Poznanski  cc918ca03e  2025-08-20 22:18:38 +00:00  Setting up GRPO trainer
Jake Poznanski  d046ba554a  2025-08-20 21:05:56 +00:00  Mining math tests too
Jake Poznanski  c32dced59c  2025-08-20 20:24:38 +00:00  More fixes to data gen script
Jake Poznanski  becbfdc62d  2025-08-20 19:54:21 +00:00  adding basic markdown output, will need to adjust it
Jake Poznanski  ce86aff80a  2025-08-20 16:23:33 +00:00  Refreshing the claude sonnet synth miner
Jake Poznanski  34d7f6e1c5  2025-08-19 22:02:14 +00:00  Preempt
Jake Poznanski  1a4cf6d8e1  2025-08-19 21:48:49 +00:00  A few more nice configs to test
Jake Poznanski  3e6be9ad5f  2025-08-19 21:46:28 +00:00  Merge branch 'jakep/new_data_promptv4' into jakep/new_data
Jake Poznanski  9868a63756  2025-08-19 21:46:12 +00:00  Adding a new pipeline
Jake Poznanski  41201b6317  2025-08-19 21:30:41 +00:00  Lints
Jake Poznanski  768cb33937  2025-08-19 21:22:54 +00:00  Better filtering coming in
Jake Poznanski  1cafa779a3  2025-08-19 20:09:41 +00:00  More filtering stages
Jake Poznanski  4d837b7db2  2025-08-19 20:01:42 +00:00  More filter rules
Jake Poznanski  17d131fce0  2025-08-19 18:54:04 +00:00  Some more filtering stuff
Jake Poznanski  a3d23d7de1  2025-08-19 18:45:01 +00:00  Adding a part of code to dataloader so you can see what is getting filtered out of your dataset
Jake Poznanski  84a0c432e7  2025-08-19 18:14:15 +00:00  Adding some filtering rules and tests for them
Jake Poznanski  cd09e190b5  2025-08-19 17:50:23 +00:00  Fixes
Jake Poznanski  798335c88e  2025-08-19 17:46:23 +00:00  Setting pipeline touse new prompt too
Jake Poznanski  f2db62b0f8  2025-08-19 17:45:41 +00:00  Train a run with adjusted prompt
Jake Poznanski  1be5cea567  2025-08-19 17:41:45 +00:00  Merge branch 'main' into jakep/new_data
Jake Poznanski  702f8996a9  2025-08-16 21:34:16 +00:00  2epoch
Jake Poznanski  c075f3071f  2025-08-16 17:31:42 +00:00  New configs for new data
Jake Poznanski  cffbb82b0b  2025-08-16 17:26:51 +00:00  Fix for iabooks
Jake Poznanski  0a9c82927f  2025-08-16 17:05:09 +00:00  Adding strip
Jake Poznanski  c492615355  2025-08-15 19:45:17 +00:00  Bump version to v0.3.3 for release v0.3.3
Jake Poznanski  cee12ccc9f  2025-08-15 19:45:07 +00:00  New version
Jake Poznanski  76405b53db  2025-08-15 19:44:47 +00:00  Lints
Jake Poznanski  69c33abfcc  2025-08-15 18:44:45 +00:00  Trying to keep queue loaded more
Jake Poznanski  7c98673972  2025-08-15 18:30:00 +00:00  Pipeline fixes for OMP_NUM_THREADS
Jake Poznanski  b9238b8638  2025-08-14 22:27:26 +00:00  Fix for floaty amount
Jake Poznanski  618777c17e  2025-08-14 20:58:11 +00:00  Bump version to v0.3.2 for release v0.3.2
Jake Poznanski  5532493ec8  2025-08-14 20:57:57 +00:00  Pipeline should be improved to limit CPU usage on page renders
Jake Poznanski  3a36ee239d  2025-08-14 20:13:52 +00:00  Cleanup
Jake Poznanski  a863d04e6e  2025-08-14 20:11:26 +00:00  Cleanup page rendering cpu limits
Jake Poznanski  482030f286  2025-08-14 19:54:29 +00:00  Script to process batch outputs
Jake Poznanski  53c0e57e4a  2025-08-14 19:40:36 +00:00  openai batch data writer
Jake Poznanski  6d2c1a646a  2025-08-14 18:24:47 +00:00  Olmocr mix to batch format
Jake Poznanski  2049abd8ff  2025-08-14 18:08:43 +00:00  prompt stuff
Jake Poznanski  807257f43a  2025-08-14 18:04:47 +00:00  Better prompts
Jake Poznanski  1f50a6b6bd  2025-08-14 17:44:56 +00:00  Trying out some new prompts
Jake Poznanski  0dd4fe83f4  2025-08-14 16:52:35 +00:00  Bump version to v0.3.1 for release v0.3.1
Jake Poznanski  7e8f9e43d8  2025-08-14 16:50:49 +00:00  New version
Jake Poznanski  7a36c98e26  2025-08-14 16:45:00 +00:00  Merge branch 'main' into jakep/new_data
Jake Poznanski  0a8cd93c0a  2025-08-14 16:37:11 +00:00  Better queue managmenet again
Jake Poznanski  38679243d7  2025-08-14 16:17:59 +00:00  Removing extra files