25 Commits

Author SHA1 Message Date
Jake Poznanski
548187902b Ignore 2025-06-12 21:14:00 +00:00
Jake Poznanski
748ab95751 Miner unit tests for duplicate absent tests 2025-04-02 18:12:05 +00:00
Jake Poznanski
fb8b23d506 Small adjustments to synthetic data pipeline 2025-04-02 17:46:48 +00:00
Jake Poznanski
9855f70fee Some work on table dataset 2025-03-19 17:25:22 +00:00
Jake Poznanski
98c4283eef Cap max workers to hopefully improve stability 2025-03-14 10:08:30 -07:00
Jake Poznanski
980121feea Loading tests much faster in parallel 2025-03-13 10:20:09 -07:00
Jake Poznanski
08f76121c3 Bump version to v0.1.53 for release 2025-02-14 20:55:21 +00:00
Jake Poznanski
6471f28ec8 Random git ignores, remove unused code 2025-02-10 22:00:35 +00:00
Jake Poznanski
b574766977 Viewer and gitignore 2025-01-29 11:46:46 -08:00
Jake Poznanski
e0afb935fa Better check for separate sglang installation step 2025-01-28 13:56:00 -08:00
Jake Poznanski
96ae2dd49b Refactoring 2025-01-27 20:45:28 +00:00
Jake Poznanski
00f2a67ac4 More elo scoring stuff 2025-01-14 22:40:56 +00:00
Jake Poznanski
3e33ce1cde Ignores 2024-12-03 18:50:31 +00:00
Jake Poznanski
ade3580eaf Fixes 2024-11-11 14:38:26 -08:00
Jake Poznanski
232c445a23 Pipeline stability fixes hopefully and logging 2024-10-29 20:15:34 +00:00
Jake Poznanski
a3e7654190 Update all docs at once 2024-10-28 15:06:29 +00:00
Jake Poznanski
d99096e9a2 Adding vllm profile script for reference 2024-10-22 20:00:34 +00:00
Jake Poznanski
23d129fd2c Organizing around a new style of dataloader 2024-10-16 18:06:27 +00:00
Jake Poznanski
96682b2ecb Refactoring 2024-10-16 16:18:27 +00:00
Jake Poznanski
42cf6a639f Dolma viewer 2024-10-15 18:37:31 +00:00
Jake Poznanski
931f48c3d1 Allow eval script to support one more type of jsonls, runpipeline multiglobs, other fixes 2024-10-09 23:39:13 +00:00
Jake Poznanski
4bf6e7a430 Refactoring 2024-10-09 18:11:18 +00:00
Jake Poznanski
0071cbd788 Appears as if the report method works really well, might need one last step to detect rotated pages 2024-10-02 16:44:39 +00:00
Jake Poznanski
256d77c232 Hoping to get a basic hf Trainer to run 2024-09-20 15:53:11 -07:00
Jake Poznanski
68b2c0e8d6 Initial commit 2024-09-17 07:53:43 -07:00