Author | Commit | Message | Date
Jake Poznanski | 3f9fc8bd1b | Better compressor hopefully | 2025-07-15 18:08:17 +00:00
Jake Poznanski | 287c8278f5 | Starting to cleanup and merge yaml front matter stuff in | 2025-07-15 18:00:01 +00:00
Jake Poznanski | 1092213c5f | Merge branch 'jakep/new_traininer_nojson_newprompt' into jakep/new_trainer | 2025-07-15 17:44:55 +00:00
Jake Poznanski | 679063aba5 | Adding some more logging to compressor | 2025-07-15 17:42:33 +00:00
Jake Poznanski | 43ae28dde4 | Prepare checkpoint works for older models too | 2025-07-14 21:30:32 +00:00
Jake Poznanski | f306a52fe1 | Compress fix | 2025-07-14 21:16:48 +00:00
Jake Poznanski | f014c2aaf9 | Need to reserve all 8 gpus for reliable performance benchmark, even if you only use 1 | 2025-07-14 21:02:14 +00:00
Jake Poznanski | 01360ba21d | Compressor script | 2025-07-14 20:56:51 +00:00
Jake Poznanski | 1ede76d0b2 | Cleaning up compress and prepare checkpoint scripts | 2025-07-14 20:36:20 +00:00
Jake Poznanski | 2674162d02 | New prompt test | 2025-07-14 17:35:29 +00:00
Jake Poznanski | a5a0cd7478 | Trying a few more configs | 2025-07-11 20:19:48 +00:00
Jake Poznanski | 384a1b19c7 | Qwen 2 config too | 2025-07-11 17:20:59 +00:00
Jake Poznanski | 24a3fb87e8 | 128batch config, wsd config | 2025-07-11 17:19:02 +00:00
Jake Poznanski | 0c773c40af | Let's do a 1280 no anchor yaml | 2025-07-10 21:44:39 +00:00
Jake Poznanski | 65d0edcaae | Adding guided decoding option | 2025-07-10 15:13:26 +00:00
Jake Poznanski | da5f8f2f78 | wsd config | 2025-07-10 01:13:54 +00:00
Jake Poznanski | 336b000416 | Adding wsd as an option | 2025-07-09 22:35:57 +00:00
Jake Poznanski | 69581cca23 | More config fixes | 2025-07-09 17:59:59 +00:00
Jake Poznanski | ca8e503870 | Ugh, lost some training runs because files got saved to the wrong place | 2025-07-09 17:57:34 +00:00
Jake Poznanski | 02f0706edc | Reverting back to json pipeline as it seems better by default | 2025-07-09 17:46:54 +00:00
Luca G | 073cdd066b | Expose --gpu_memory_utilization / --max_model_len flags and startup hint | 2025-07-05 10:42:52 +02:00
Jake Poznanski | 8ae9104bb3 | Calling it with a new name | 2025-07-03 23:04:58 +00:00
Jake Poznanski | 3976cee141 | Adding 8192 cap on day2 config | 2025-07-03 23:04:29 +00:00
Jake Poznanski | ca2609cb52 | No doc anchoring version | 2025-07-03 18:24:16 +00:00
Jake Poznanski | 560a585523 | Configs with proper names | 2025-07-03 18:12:09 +00:00
Jake Poznanski | 53cc1a0ba9 | Fixed json configuration | 2025-07-03 18:01:28 +00:00
Jake Poznanski | 2c54c6d06c | Allow unicode in json | 2025-07-03 16:43:51 +00:00
Jake Poznanski | b1ab9964ee | Day 2 json config | 2025-07-03 16:42:17 +00:00
Jake Poznanski | a1c2ee82a6 | More workers by default | 2025-07-03 16:36:07 +00:00
Jake Poznanski | d26ae4bb4d | Easier way to test configs | 2025-07-03 16:30:25 +00:00
Jake Poznanski | a7e2f719bf | Start a preemptible one at least once | 2025-07-02 19:26:30 +00:00
Jake Poznanski | 6d6476b31a | One idea for resume fix | 2025-07-02 01:33:37 +00:00
Jake Poznanski | 2a20607d37 | Get rid of fused | 2025-07-02 01:13:32 +00:00
Jake Poznanski | 59f11c7e2e | Better names | 2025-07-02 01:05:40 +00:00
Jake Poznanski | 210d170b15 | Adding a standard JSON output option | 2025-07-01 22:13:06 +00:00
Jake Poznanski | 6f2a426986 | Fresh prompt configs | 2025-07-01 21:24:58 +00:00
Jake Poznanski | 5e8017b5cd | Oops | 2025-07-01 21:15:45 +00:00
Jake Poznanski | 4a6ef91b5e | Matching old trainer config | 2025-07-01 21:15:17 +00:00
Jake Poznanski | 5e2f703ee6 | Trying some config changes | 2025-07-01 21:01:34 +00:00
Jake Poznanski | 94d7900887 | Default configs are better | 2025-07-01 20:36:06 +00:00
Jake Poznanski | 56e51ea23a | Improving regex even more | 2025-07-01 20:35:57 +00:00
Jake Poznanski | 98df1d5fb7 | Adding max length option | 2025-07-01 20:22:59 +00:00
Jake Poznanski | abdc907a3c | Pipeline fix | 2025-07-01 20:03:02 +00:00
Jake Poznanski | e691ea176c | Better regex for structured decoding, adding some new prompts to train with | 2025-07-01 18:12:32 +00:00
Jake Poznanski | a651cf0ca6 | Adding guided regex decoder | 2025-07-01 17:44:02 +00:00
Jake Poznanski | 748e2ae9eb | With yaml formatted responses, make sure response finishes with code stop | 2025-07-01 17:31:21 +00:00
Jake Poznanski | 9bf8e9e0fa | Preparing pipeline for new format | 2025-07-01 17:01:33 +00:00
Jake Poznanski | c6c1fbd0eb | Better prepare checkpoint script | 2025-07-01 16:44:19 +00:00
Jake Poznanski | 8dcfdd0418 | Checkpoint prep tool | 2025-07-01 16:34:29 +00:00
Jake Poznanski | c029ccdbfb | Added a few more configs to try | 2025-07-01 01:46:53 +00:00