9 Commits

Author SHA1 Message Date
Jake Poznanski 446773dbc8 First part of new dataloader 2024-10-16 11:54:06 -07:00
Jake Poznanski ec09408ca9 Filtering based on cpu count 2024-10-07 15:40:29 -07:00
Jake Poznanski 3d36545fa5 loading fix for parquets again... 2024-10-07 14:48:53 -07:00
Jake Poznanski 7416b42023 Adding support for parquet datasets which are precached 2024-10-07 21:14:33 +00:00
Jake Poznanski 4557a5b296 Typo 2024-10-07 13:03:31 -07:00
Jake Poznanski e973de7ba9 Typo 2024-10-07 13:01:43 -07:00
Jake Poznanski ebd40f9084 Hopefully fixing dataloader for now 2024-10-07 12:59:27 -07:00
Jake Poznanski b340ae5092 A few notes, starting to test dataloader with new structured response format 2024-10-02 22:17:15 +00:00
Jake Poznanski 7d2c447dd3 Importing core training config stuff from dolma refine 2024-09-19 21:55:07 +00:00