1362 Commits

Author SHA1 Message Date
Jake Poznanski
cfe9aa102b Ok, dataloader from start to finish is running, now to write a trainer 2025-06-11 23:30:02 +00:00
Jake Poznanski
105d5907d6 Dataloader progress 2025-06-11 22:35:35 +00:00
Jake Poznanski
9f50bda6bf More refactoring 2025-06-11 22:05:56 +00:00
Jake Poznanski
6a360fae06 Cleanup 2025-06-11 21:55:07 +00:00
Jake Poznanski
d17bef8b4b Working on a more pipeliney thing 2025-06-11 21:51:24 +00:00
Jake Poznanski
d0df380ae9 Cleaning data loader 2025-06-11 21:41:18 +00:00
Jake Poznanski
5bbc1ffff7 Parsing and validating front matter 2025-06-11 21:27:57 +00:00
Jake Poznanski
aedc295e3f Image params to loader 2025-06-11 21:05:23 +00:00
Jake Poznanski
9a390e3d58 Validating that we get single pages 2025-06-11 18:14:36 +00:00
Jake Poznanski
0689676026 Rendering the pdfs in the dataloader 2025-06-11 18:11:42 +00:00
Jake Poznanski
352287cc16 Starting on dataloader 2025-06-11 17:54:19 +00:00
Jake Poznanski
0e17b50583 Ok, looks like we have a nice extractor script for the dataset 2025-06-11 17:28:00 +00:00
Jake Poznanski
f19f7c1271 Almost done extracting 2025-06-11 17:17:52 +00:00
Jake Poznanski
f0d8ff7bd3 First attempt at new trainer code 2025-06-11 16:56:16 +00:00
aman-17
3eda2c04c1 updated vllm to 0.9.1 2025-06-10 16:14:57 -07:00
Jake Poznanski
a83a0da65f Cleanup of vllm perf branch with @amanr 2025-06-10 21:56:05 +00:00
aman-17
316d0af1cd added dtype functionality 2025-06-06 16:19:40 -07:00
aman-17
c8a5361d1b fixing packages of 22.04 2025-06-06 13:50:12 -07:00
aman-17
c5d075c63a fixed apt_pkg module 2025-06-06 13:48:48 -07:00
aman-17
08fd82f323 made changes wrt ubuntu 22.04 2025-06-06 13:41:10 -07:00
aman-17
6507a657be updated ubuntu to 22.04 for glibc 2.32 2025-06-06 13:29:51 -07:00
Jake Poznanski
25dfe0b831 Weird glibc error 2025-06-06 18:53:52 +00:00
Jake Poznanski
0257444720 Ok, cleaner retry pattern for model downloading 2025-06-06 18:52:01 +00:00
Vik Paruchuri
267f52bd79 Update marker cost 2025-06-06 13:53:50 -04:00
Jake Poznanski
9539eab840 AWS creds fix 2025-06-06 17:45:17 +00:00
Jake Poznanski
e0fda1a77d Passing AWS creds to benchmark so we can run custom models stored in S3 2025-06-05 17:40:14 +00:00
Jake Poznanski
ecf0d48a28 Don't allow uncommitted changes 2025-06-05 17:22:12 +00:00
Jake Poznanski
134bba9fcd Run benchmark adjustments 2025-06-05 17:21:06 +00:00
Jake Poznanski
7009a7a9d9 Trying out FP8 compression 2025-06-05 17:18:20 +00:00
Jake Poznanski
9ffbe8df46 Adding quick stats percentage done check 2025-06-05 15:58:19 +00:00
Vik Paruchuri
f21ff08c2f Fix marker benchmarks 2025-06-04 23:10:14 -07:00
Jake Poznanski
aad8428dc3 Reverting custom pipeline image 2025-06-02 23:05:48 +00:00
Jake Poznanski
5c52e016e6 Include cuda 12.8 2025-06-02 22:52:28 +00:00
Jake Poznanski
5c524b53ac Cleaning up stats reporting 2025-06-02 21:40:14 +00:00
Jake Poznanski
916f0cb919 Trying with flash infer installed 2025-06-02 21:23:04 +00:00
Jake Poznanski
2ccef7d760 Ugh, this code is bad 2025-06-02 21:22:25 +00:00
Jake Poznanski
2f1957b401 Performance fixes with vllm backend 2025-06-02 21:10:30 +00:00
Jake Poznanski
d71703317d Fixing parse for waiting 2025-06-02 20:05:57 +00:00
Jake Poznanski
d1baa517b7 Python alternatives 2025-06-02 18:59:28 +00:00
Jake Poznanski
581915ffba Fixes for docker image 2025-06-02 18:47:34 +00:00
Jake Poznanski
153f1e58b7 Final uv fixes 2025-06-02 18:39:32 +00:00
Jake Poznanski
97da87a3b2 Hopefully a much better dockerfile 2025-06-02 18:34:47 +00:00
Jake Poznanski
04dd71c6bf Trying to get onto vllm latest 2025-06-02 18:13:22 +00:00
Jake Poznanski
106070dd0e Moving pipeline to vllm 2025-06-02 18:07:31 +00:00
Jake Poznanski
2235b82c8e Beaker tests 2025-05-30 19:49:34 +00:00
Jake Poznanski
967c83d8e7 Better way to setup beaker 2025-05-30 19:42:27 +00:00
Jake Poznanski
23f4a0e460 Bump version to v0.1.71 for release v0.1.71 2025-05-30 18:56:44 +00:00
Jake Poznanski
8b4f6cd621 Upping version 2025-05-30 18:56:34 +00:00
Jake Poznanski
24b6822153 Pushing beaker images now too 2025-05-30 18:56:02 +00:00
Jake Poznanski
208c29d34b Not including fallbacks in olmocr_pipeline bench runner so we can measure direct model performance better 2025-05-30 18:45:55 +00:00