123 Commits

Author | SHA1 | Message | Date
Jake Poznanski | 500bd2de5b | flash attn | 2024-10-30 22:33:10 +00:00
Jake Poznanski | d45b34fdd5 | Trust remote code | 2024-10-30 21:22:39 +00:00
Jake Poznanski | 8f001bf74c | Config updates | 2024-10-30 14:02:57 -07:00
Jake Poznanski | 6a4a55f9e0 | Hopefully working molmo HF trainer config | 2024-10-30 14:00:27 -07:00
Jake Poznanski | bede854cd5 | Startng to write molmo formatters | 2024-10-30 13:24:11 -07:00
Jake Poznanski | e65747e591 | Some better logging | 2024-10-30 11:22:52 -07:00
Jake Poznanski | 43aa4f2508 | Proper selection of LORA weights | 2024-10-30 10:42:53 -07:00
Jake Poznanski | bcb47946e5 | Starting on molmo changes | 2024-10-30 08:39:48 -07:00
Jake Poznanski | f13d0a5741 | List configs to list | 2024-10-24 03:07:32 +00:00
Jake Poznanski | 180dde03c5 | dataprep sampling tests | 2024-10-23 22:53:05 +00:00
Jake Poznanski | 64041bd6d7 | Allow sampling different anchor text lens | 2024-10-23 15:37:23 -07:00
Jake Poznanski | 6a22900b8a | Allow for sampling anchor and other params | 2024-10-23 22:26:12 +00:00
Jake Poznanski | f44dbd15ef | Small fixes | 2024-10-21 16:45:06 +00:00
Jake Poznanski | a4822718ea | train more steps | 2024-10-19 14:12:44 +00:00
Jake Poznanski | c9ac48bd9d | Try to save at the last second only | 2024-10-19 02:07:57 +00:00
Jake Poznanski | 3ecbeae6dc | Trying save to s3 but with threaded saver | 2024-10-17 21:39:01 +00:00
Jake Poznanski | 89fcff233a | Fixing saving bug again | 2024-10-17 20:37:28 +00:00
Jake Poznanski | 529d51d57d | Put LR back, need to save larger checkpoints to weka to prevent timeouts | 2024-10-17 19:46:25 +00:00
Jake Poznanski | e141c91e5e | Try lora run higher LR | 2024-10-17 17:12:35 +00:00
Jake Poznanski | 124aaf5fe0 | Hmm, cant repro failing anchor case | 2024-10-17 17:00:02 +00:00
Jake Poznanski | 1c42a08d06 | Fixes to prevent errors later in dataloading | 2024-10-17 02:28:43 +00:00
Jake Poznanski | f13bcad943 | Adding check that pdfs are valid in the new anchor text generation format | 2024-10-16 23:31:40 +00:00
Jake Poznanski | 5018d591f6 | will try lower lr | 2024-10-16 23:27:00 +00:00
Jake Poznanski | 5c36c22bf7 | Prepping for more training | 2024-10-16 23:01:40 +00:00
Jake Poznanski | 277723fa2c | Adding cache | 2024-10-16 21:18:52 +00:00
Jake Poznanski | 87182ab573 | Ensuring unique names | 2024-10-16 20:44:23 +00:00
Jake Poznanski | 4884b8288b | Full dataset | 2024-10-16 13:30:25 -07:00
Jake Poznanski | 51f1669451 | fix | 2024-10-16 13:30:06 -07:00
Jake Poznanski | d94713e73e | Truncation handled in a custom collator | 2024-10-16 13:28:12 -07:00
Jake Poznanski | cbc667ce78 | Prepping to train | 2024-10-16 13:18:24 -07:00
Jake Poznanski | 9d647b13b8 | fix | 2024-10-16 11:58:35 -07:00
Jake Poznanski | 446773dbc8 | First part of new dataloader | 2024-10-16 11:54:06 -07:00
Jake Poznanski | d4f64ed82a | Config work | 2024-10-16 18:37:52 +00:00
Jake Poznanski | 3c1b7de293 | Refactoring of train dataloaders | 2024-10-16 18:26:25 +00:00
Jake Poznanski | 23d129fd2c | Organizing around a new style of dataloader | 2024-10-16 18:06:27 +00:00
Jake Poznanski | fc8fcfaeba | Fixing dataloader hopefully | 2024-10-15 15:13:25 +00:00
Jake Poznanski | 7b161533e2 | Code to do local inference on fine tuned models for testing | 2024-10-14 08:38:18 -07:00
Jake Poznanski | 2dccc4be3b | Oops removing print | 2024-10-11 16:23:14 +00:00
Jake Poznanski | a8b50ae8fa | Preloading the datasets directly | 2024-10-10 19:57:51 +00:00
Jake Poznanski | 2864f907e1 | Dataloader fix with nicer tests | 2024-10-10 16:58:45 +00:00
Jake Poznanski | 7c19a9a856 | fix | 2024-10-08 23:54:17 +00:00
Jake Poznanski | ad10add6c1 | try lower lr | 2024-10-08 23:52:56 +00:00
Jake Poznanski | 230c8a9f9a | Trying new run that will rewrite the prompts as it goes | 2024-10-08 22:10:18 +00:00
Jake Poznanski | 085937859f | Lower lr | 2024-10-08 17:52:00 +00:00
Jake Poznanski | f5fd9ff53a | Trying grad checkpoint | 2024-10-08 16:11:31 +00:00
Jake Poznanski | fb4e585e9f | Trying out non-lora training | 2024-10-08 15:20:37 +00:00
Jake Poznanski | ec09408ca9 | Filtering based on cpu count | 2024-10-07 15:40:29 -07:00
Jake Poznanski | a90eb94951 | Fix dataloader bug | 2024-10-07 15:25:48 -07:00
Jake Poznanski | 3d36545fa5 | loading fix for parquets again... | 2024-10-07 14:48:53 -07:00
Jake Poznanski | fdcd77eadd | typo | 2024-10-07 14:32:47 -07:00