1191 Commits

Author SHA1 Message Date
Jake Poznanski
25dfe0b831 Weird glibc error 2025-06-06 18:53:52 +00:00
Jake Poznanski
0257444720 Ok, cleaner retry pattern for model downloading 2025-06-06 18:52:01 +00:00
Vik Paruchuri
267f52bd79 Update marker cost 2025-06-06 13:53:50 -04:00
Jake Poznanski
9539eab840 AWS creds fix 2025-06-06 17:45:17 +00:00
Jake Poznanski
e0fda1a77d Passing aws creds to benchmark so we can run custom models stored in s3 2025-06-05 17:40:14 +00:00
Jake Poznanski
ecf0d48a28 Don't allow uncommitted changes 2025-06-05 17:22:12 +00:00
Jake Poznanski
134bba9fcd Run benchmark adjustments 2025-06-05 17:21:06 +00:00
Jake Poznanski
7009a7a9d9 Trying out FP8 compression 2025-06-05 17:18:20 +00:00
Jake Poznanski
9ffbe8df46 Adding quick stats percentage done check 2025-06-05 15:58:19 +00:00
Vik Paruchuri
f21ff08c2f Fix marker benchmarks 2025-06-04 23:10:14 -07:00
Jake Poznanski
aad8428dc3 Reverting custom pipeline image 2025-06-02 23:05:48 +00:00
Jake Poznanski
5c52e016e6 Include cuda 12.8 2025-06-02 22:52:28 +00:00
Jake Poznanski
5c524b53ac Cleaning up stats reporting 2025-06-02 21:40:14 +00:00
Jake Poznanski
916f0cb919 Trying with flash infer installed 2025-06-02 21:23:04 +00:00
Jake Poznanski
2ccef7d760 Ugh, this code is bad 2025-06-02 21:22:25 +00:00
Jake Poznanski
2f1957b401 Performance fixes with vllm backend 2025-06-02 21:10:30 +00:00
Jake Poznanski
d71703317d Fixing parse for waiting 2025-06-02 20:05:57 +00:00
Jake Poznanski
d1baa517b7 Python alternatives 2025-06-02 18:59:28 +00:00
Jake Poznanski
581915ffba Fixes for docker image 2025-06-02 18:47:34 +00:00
Jake Poznanski
153f1e58b7 Final uv fixes 2025-06-02 18:39:32 +00:00
Jake Poznanski
97da87a3b2 Hopefully a much better dockerfile 2025-06-02 18:34:47 +00:00
Jake Poznanski
04dd71c6bf Trying to get onto vllm latest 2025-06-02 18:13:22 +00:00
Jake Poznanski
106070dd0e Moving pipeline to vllm 2025-06-02 18:07:31 +00:00
Jake Poznanski
2235b82c8e Beaker tests 2025-05-30 19:49:34 +00:00
Jake Poznanski
967c83d8e7 Better way to setup beaker 2025-05-30 19:42:27 +00:00
Jake Poznanski
23f4a0e460 Bump version to v0.1.71 for release v0.1.71 2025-05-30 18:56:44 +00:00
Jake Poznanski
8b4f6cd621 Upping version 2025-05-30 18:56:34 +00:00
Jake Poznanski
24b6822153 Pushing beaker images now too 2025-05-30 18:56:02 +00:00
Jake Poznanski
208c29d34b Not including fallbacks in olmocr_pipeline bench runner so we can measure direct model performance better 2025-05-30 18:45:55 +00:00
Jake Poznanski
5faf570e30 Format fixes 2025-05-29 23:23:02 +00:00
Jake Poznanski
587b73f23e Try with more aggressive anchor changing 2025-05-29 22:33:16 +00:00
Jake Poznanski
8f5d5bdf28 Revert "Trying to add repetition penalty"
This reverts commit 90f754e7b182f5978f60f5e4734f6ebb0aa3e735.
2025-05-29 21:59:23 +00:00
Jake Poznanski
90f754e7b1 Trying to add repetition penalty 2025-05-29 21:27:13 +00:00
Jake Poznanski
9dcdef6ca3 Going to try with up to 5k tokens 2025-05-29 20:34:05 +00:00
Jake Poznanski
8d92620d3c Merge remote-tracking branch 'origin/main' into retry_improvements 2025-05-29 20:33:45 +00:00
Jake Poznanski
cd5b524d20 Some benchmark cleanup 2025-05-29 20:32:25 +00:00
Jake Poznanski
2cb14cceae Allowing more tokens 2025-05-29 19:59:58 +00:00
Jake Poznanski
022be37723 Some better info strings in benchmark runner 2025-05-29 18:43:27 +00:00
Jake Poznanski
22ee068d88 Merge remote-tracking branch 'origin/main' into retry_improvements 2025-05-29 18:25:10 +00:00
Jake Poznanski
faddf44897 Merge pull request #218 from allenai/amanr/benchmark_automation
Updated Dockerfile and added a workspace_to_benchmark file
2025-05-29 11:24:30 -07:00
Jake Poznanski
01c4a561d3 Script fixes 2025-05-29 17:58:11 +00:00
Jake Poznanski
129412cdb0 Git lfs for more reliable downloads 2025-05-29 17:38:00 +00:00
Jake Poznanski
45e0ae59dc omg 2025-05-29 17:21:58 +00:00
Jake Poznanski
15e0064212 More fixes 2025-05-29 17:20:32 +00:00
Jake Poznanski
e8e6b6cb17 More fixes 2025-05-29 17:19:36 +00:00
Jake Poznanski
06988ac533 Image fixes 2025-05-29 17:18:12 +00:00
Jake Poznanski
ff31faebe4 Runner improvements 2025-05-29 17:12:41 +00:00
Jake Poznanski
475cc1c3a4 Working on runner script 2025-05-29 17:08:05 +00:00
Jake Poznanski
8347e384fd I think this fixes up the docker file 2025-05-29 16:12:06 +00:00
Jake Poznanski
fbcd82ad30 Cleanup attempt lookup code a bit 2025-05-29 16:01:26 +00:00