1501 Commits

Author SHA1 Message Date
Jake Poznanski
da5f8f2f78 wsd config 2025-07-10 01:13:54 +00:00
Jake Poznanski
336b000416 Adding wsd as an option 2025-07-09 22:35:57 +00:00
Jake Poznanski
69581cca23 More config fixes 2025-07-09 17:59:59 +00:00
Jake Poznanski
ca8e503870 Ugh, lost some training runs because files got saved to the wrong place 2025-07-09 17:57:34 +00:00
Jake Poznanski
02f0706edc Reverting back to json pipeline as it seems better by default 2025-07-09 17:46:54 +00:00
Luca G
073cdd066b Expose --gpu_memory_utilization / --max_model_len flags and startup hint 2025-07-05 10:42:52 +02:00
Jake Poznanski
8ae9104bb3 Calling it with a new name 2025-07-03 23:04:58 +00:00
Jake Poznanski
3976cee141 Adding 8192 cap on day2 config 2025-07-03 23:04:29 +00:00
Jake Poznanski
ca2609cb52 No doc anchoring version 2025-07-03 18:24:16 +00:00
Jake Poznanski
560a585523 Configs with proper names 2025-07-03 18:12:09 +00:00
Jake Poznanski
53cc1a0ba9 Fixed json configuration 2025-07-03 18:01:28 +00:00
Jake Poznanski
2c54c6d06c Allow unicode in json 2025-07-03 16:43:51 +00:00
Jake Poznanski
b1ab9964ee Day 2 json config 2025-07-03 16:42:17 +00:00
Jake Poznanski
a1c2ee82a6 More workers by default 2025-07-03 16:36:07 +00:00
Jake Poznanski
d26ae4bb4d Easier way to test configs 2025-07-03 16:30:25 +00:00
Jake Poznanski
a7e2f719bf Start a preemptible one at least once 2025-07-02 19:26:30 +00:00
Jake Poznanski
6d6476b31a One idea for resume fix 2025-07-02 01:33:37 +00:00
Jake Poznanski
2a20607d37 Get rid of fused 2025-07-02 01:13:32 +00:00
Jake Poznanski
59f11c7e2e Better names 2025-07-02 01:05:40 +00:00
Jake Poznanski
210d170b15 Adding a standard JSON output option 2025-07-01 22:13:06 +00:00
Jake Poznanski
6f2a426986 Fresh prompt configs 2025-07-01 21:24:58 +00:00
Jake Poznanski
5e8017b5cd Oops 2025-07-01 21:15:45 +00:00
Jake Poznanski
4a6ef91b5e Matching old trainer config 2025-07-01 21:15:17 +00:00
Jake Poznanski
5e2f703ee6 Trying some config changes 2025-07-01 21:01:34 +00:00
Jake Poznanski
94d7900887 Default configs are better 2025-07-01 20:36:06 +00:00
Jake Poznanski
56e51ea23a Improving regex even more 2025-07-01 20:35:57 +00:00
Jake Poznanski
98df1d5fb7 Adding max length option 2025-07-01 20:22:59 +00:00
Jake Poznanski
abdc907a3c Pipeline fix 2025-07-01 20:03:02 +00:00
Jake Poznanski
e691ea176c Better regex for structured decoding, adding some new prompts to train with 2025-07-01 18:12:32 +00:00
Jake Poznanski
a651cf0ca6 Adding guided regex decoder 2025-07-01 17:44:02 +00:00
Jake Poznanski
748e2ae9eb With yaml formatted responses, make sure response finishes with code stop 2025-07-01 17:31:21 +00:00
Jake Poznanski
9bf8e9e0fa Preparing pipeline for new format 2025-07-01 17:01:33 +00:00
Jake Poznanski
c6c1fbd0eb Better prepare checkpoint script 2025-07-01 16:44:19 +00:00
Jake Poznanski
8dcfdd0418 Checkpoint prep tool 2025-07-01 16:34:29 +00:00
Jake Poznanski
c029ccdbfb Added a few more configs to try 2025-07-01 01:46:53 +00:00
Jake Poznanski
79a7818517 New trainer launch script for beaker 2025-07-01 01:43:38 +00:00
Jake Poznanski
dcf026a63c Better script 2025-07-01 01:40:55 +00:00
Jake Poznanski
9f0f912101 Ugh 2025-06-30 23:37:35 +00:00
Jake Poznanski
1d007d1bf2 Perhaps fixing default config 2025-06-30 22:58:21 +00:00
Jake Poznanski
e7020c7f50 More configs 2025-06-30 22:49:42 +00:00
Jake Poznanski
7cf98794fd Image 1600 configuration 2025-06-30 22:49:13 +00:00
Jake Poznanski
d2ef9d78f1 Four basic training configs for new version 2025-06-30 22:31:02 +00:00
Jake Poznanski
a3ad61bd4d Small config updates 2025-06-30 22:22:49 +00:00
Jake Poznanski
ee8bd9b220 Better resume logic I hope 2025-06-30 22:18:15 +00:00
Jake Poznanski
208fabcb69 Validating on process pool 2025-06-30 22:10:59 +00:00
Jake Poznanski
4f46f10e0c At least get resuming from checkpoints to work perhaps 2025-06-30 21:56:12 +00:00
Jake Poznanski
2375079758 Torch compile off, gives warnings and no speed boost, padding to do multi batch is not working either 2025-06-30 21:47:17 +00:00
Jake Poznanski
c11120a3fa Trying to do batch size > 1 2025-06-30 21:37:50 +00:00
Jake Poznanski
5c2d69a3d7 Some cleanup stuff 2025-06-30 21:24:35 +00:00
Jake Poznanski
e86511e11b Weka fix 2025-06-30 17:46:13 +00:00