Commit Graph

  • 1451dd1395 weka jakep/new_trainer Jake Poznanski 2025-06-27 02:57:26 +00:00
  • 680377c93f Example config Jake Poznanski 2025-06-26 23:32:50 +00:00
  • dee3730231 Gantry stuff Jake Poznanski 2025-06-26 18:34:53 +00:00
  • ff1cd02a34 fixed kv-cache-dtype issue amanr/olmocr_kv_fp8 aman-17 2025-06-26 10:49:34 -07:00
  • 94f941627f added things aman-17 2025-06-26 00:20:08 +00:00
  • 0d7836b111 Basic atttempt to run trainer script Jake Poznanski 2025-06-25 23:22:59 +00:00
  • d7e5037192 New trainer launch script cleanups Jake Poznanski 2025-06-25 23:05:32 +00:00
  • 91e7b5ce3f Claude generated train script Jake Poznanski 2025-06-24 22:56:35 +00:00
  • 0ebc35cf1f Basic train config loader for datasets Jake Poznanski 2025-06-24 22:48:36 +00:00
  • b93c262dca Prepping new config stuff Jake Poznanski 2025-06-24 22:40:50 +00:00
  • 633b03d1da Merge branch 'main' of https://github.com/allenai/olmocr main Jake Poznanski 2025-06-24 22:06:02 +00:00
  • 67e9ec873f Removing unused file Jake Poznanski 2025-06-24 22:06:01 +00:00
  • 1df93d0ddf Merge pull request #257 from allenai/amanr/nanonets_ocr Aman Rangapur 2025-06-23 16:35:57 -07:00
  • 202e22932e addressed Jake's comment for pagenumbers with \d+ amanr/nanonets_ocr aman-17 2025-06-23 23:29:10 +00:00
  • 9d04b30ea4 added nanonets aman-17 2025-06-23 22:04:47 +00:00
  • 24a2f9b0a4 Bump version to v0.1.76 for release v0.1.76 Jake Poznanski 2025-06-23 21:54:15 +00:00
  • cd93ca5927 Version bump Jake Poznanski 2025-06-23 21:54:06 +00:00
  • ecce181ab9 Merge pull request #256 from allenai/jakep/dockerfix Aman Rangapur 2025-06-23 14:47:36 -07:00
  • e45e871dc9 Cleanup for docker file jakep/dockerfix Jake Poznanski 2025-06-23 20:05:33 +00:00
  • 52d62201c2 Check more logs jakep/vllm_streaming Jake Poznanski 2025-06-21 22:53:49 +00:00
  • 560d1c77de Oops Jake Poznanski 2025-06-21 18:18:46 +00:00
  • b4b30a0a62 Trying out hynek's repeat detector Jake Poznanski 2025-06-21 18:16:55 +00:00
  • f003933f54 Hmm adjusting params Jake Poznanski 2025-06-21 04:04:12 +00:00
  • c85324b936 Testing repeat detects Jake Poznanski 2025-06-20 22:14:06 +00:00
  • bf88dbae4d SIlly streaming Jake Poznanski 2025-06-18 04:14:32 +00:00
  • 0c6d1990dc Update README.md Jake Poznanski 2025-06-17 14:45:11 -07:00
  • ec5c5b6444 Updating pareto plots Jake Poznanski 2025-06-17 21:41:23 +00:00
  • 6c51829ae6 Some helper scripts Jake Poznanski 2025-06-17 21:21:50 +00:00
  • 626952a786 Adding news Jake Poznanski 2025-06-17 20:06:33 +00:00
  • 9d260791a0 README updates Jake Poznanski 2025-06-17 19:58:06 +00:00
  • 69524cb305 Updatinge bench readme Jake Poznanski 2025-06-17 19:55:17 +00:00
  • 069a99ea5f Bump version to v0.1.75 for release v0.1.75 Jake Poznanski 2025-06-17 19:07:48 +00:00
  • b61ef52d36 Final cleanup Jake Poznanski 2025-06-17 19:07:37 +00:00
  • e103634d30 Bump version to v0.1.74 for release v0.1.74 Jake Poznanski 2025-06-17 18:02:23 +00:00
  • 9f44fa6e79 Cleanup CI Jake Poznanski 2025-06-17 18:02:13 +00:00
  • 0b07fa4722 Increasing max number of workers, more suitable to H100 configurations and having retries Jake Poznanski 2025-06-17 18:01:16 +00:00
  • 3c1cd6a504 Bump version to v0.1.73 for release v0.1.73 Jake Poznanski 2025-06-17 17:07:02 +00:00
  • dd1d0b561b More debug logs Jake Poznanski 2025-06-17 17:06:45 +00:00
  • 08ca1544bf Beaker push fix Jake Poznanski 2025-06-17 16:59:22 +00:00
  • 5e5c31b93e Bump version to v0.1.72 for release v0.1.72 Jake Poznanski 2025-06-17 16:21:43 +00:00
  • 715b841596 0.1.72 Jake Poznanski 2025-06-17 16:21:36 +00:00
  • b03feb3356 Fixed Jake Poznanski 2025-06-17 16:10:38 +00:00
  • b588ae27d2 Remvoing sglang tests, switch to vllm Jake Poznanski 2025-06-17 16:07:16 +00:00
  • 6e3fba3c59 Lints Jake Poznanski 2025-06-17 16:06:40 +00:00
  • e489b28421 Lints jakep/vllm_perf Jake Poznanski 2025-06-17 15:58:16 +00:00
  • 6fcd26d66a Updating readme Jake Poznanski 2025-06-17 15:48:25 +00:00
  • 7b510e97bc No anchoring vllm_perf_noanchor Jake Poznanski 2025-06-17 15:30:43 +00:00
  • 8c62072832 Merge remote-tracking branch 'origin/main' into jakep/vllm_perf Jake Poznanski 2025-06-17 15:25:32 +00:00
  • 571518b25b Update mypy requirement from <1.5,>=1.0 to >=1.0,<1.17 dependabot/pip/mypy-gte-1.0-and-lt-1.17 dependabot[bot] 2025-06-17 00:22:22 +00:00
  • 5c778e9a37 Unlocking number of workers for now jakep/vllm_integral Jake Poznanski 2025-06-13 22:41:28 +00:00
  • 050ab845ad Max tokens fix Jake Poznanski 2025-06-13 22:20:51 +00:00
  • 6719a4e8b3 Adding logging Jake Poznanski 2025-06-13 22:13:38 +00:00
  • 34ee4f7427 VLLM Jake Poznanski 2025-06-13 22:05:34 +00:00
  • 7156209396 Vllm fix Jake Poznanski 2025-06-13 22:00:05 +00:00
  • 90bcd53479 Vllm fix Jake Poznanski 2025-06-13 21:45:59 +00:00
  • 00a11cc488 Vllm config fixes Jake Poznanski 2025-06-13 21:38:36 +00:00
  • c99c327af5 Some fixes hopefully Jake Poznanski 2025-06-13 21:35:50 +00:00
  • 8c1408a0fe Crazy idea to run vllm in async local mode instead of a server should be more reliable Jake Poznanski 2025-06-13 21:29:57 +00:00
  • e9828cde51 Lints, adding more perf tracking to pipeline Jake Poznanski 2025-06-13 19:53:34 +00:00
  • 9ab742b7c8 Outputting finished output tok/sec as well Jake Poznanski 2025-06-13 03:53:33 +00:00
  • cc0c62ab73 Adding more workers by default to improve bench perf Jake Poznanski 2025-06-13 03:50:21 +00:00
  • 1295e171bb Merge branch 'main' of https://github.com/allenai/olmocr Jake Poznanski 2025-06-12 22:35:09 +00:00
  • 37090e2801 Go back to workers 1 in marker test script Jake Poznanski 2025-06-12 22:35:08 +00:00
  • f273de6e6e Update README.md Jake Poznanski 2025-06-12 15:32:09 -07:00
  • af02f15f24 Merge pull request #236 from VikParuchuri/main Jake Poznanski 2025-06-12 15:24:17 -07:00
  • 3da6e2d587 Pareto plot update, keep cost the same for now Jake Poznanski 2025-06-12 22:23:41 +00:00
  • b9f3e7c72b updated sglang amanr/sglang_latest aman-17 2025-06-12 22:11:07 +00:00
  • fcd8bbec92 Install aws cli Jake Poznanski 2025-06-12 21:38:28 +00:00
  • fc06797bec aws cli Jake Poznanski 2025-06-12 21:29:39 +00:00
  • 59e0a1ccb0 Marker wants newer torchvision Jake Poznanski 2025-06-12 21:23:53 +00:00
  • 0f3b45c1a3 Add time Jake Poznanski 2025-06-12 21:19:17 +00:00
  • 4bfcfce767 Actually install the right thing Jake Poznanski 2025-06-12 21:18:58 +00:00
  • 548187902b Ignore Jake Poznanski 2025-06-12 21:14:00 +00:00
  • f8dfd85765 Script Jake Poznanski 2025-06-12 21:13:31 +00:00
  • 044874a634 Adding marker benchmark Jake Poznanski 2025-06-12 21:12:58 +00:00
  • 9787d007b9 Pulling in bigger benchmark script from vllm branch to main Jake Poznanski 2025-06-12 21:02:46 +00:00
  • 43c94fea58 Bencharmk update Jake Poznanski 2025-06-12 20:47:58 +00:00
  • b1e064f8a6 Run benchmark script will also start a job to convert 10k docs from olmocr-mix to check performance Jake Poznanski 2025-06-12 20:27:50 +00:00
  • 3d72f3457b Fixing prepare_olmocrmix Jake Poznanski 2025-06-12 20:15:35 +00:00
  • af7aaef605 Run marker script Jake Poznanski 2025-06-12 20:07:17 +00:00
  • cbc4580b72 Fixing #240 Jake Poznanski 2025-06-12 17:21:21 +00:00
  • c93ac4a95d Cleaned up loader Jake Poznanski 2025-06-12 03:27:39 +00:00
  • 60338810bc Cleaning up dataloader Jake Poznanski 2025-06-12 03:17:24 +00:00
  • 51b03b7b4c downgraded pytorch==2.4.0 amanr/vllm_test aman-17 2025-06-11 20:05:33 -07:00
  • f9c633fe03 vllm==0.6.3 aman-17 2025-06-11 20:04:52 -07:00
  • ce95e42b5b updated transformers 4.46.2 aman-17 2025-06-11 19:17:18 -07:00
  • dc07c0bfdc vllm=0.6.4 aman-17 2025-06-11 18:45:46 -07:00
  • 621d0d20e0 vllm 0.6.6 aman-17 2025-06-11 17:50:11 -07:00
  • 36fb3f701f updated the vllm to 0.7.2 aman-17 2025-06-11 16:45:25 -07:00
  • cfe9aa102b Ok, dataloader from start to finish is running, now to write a trainer Jake Poznanski 2025-06-11 23:30:02 +00:00
  • 3d86b5a2f6 removed torch whl file aman-17 2025-06-11 16:27:46 -07:00
  • 105d5907d6 Dataloader progress Jake Poznanski 2025-06-11 22:35:35 +00:00
  • 9f50bda6bf More refactoring Jake Poznanski 2025-06-11 22:05:56 +00:00
  • 6a360fae06 Cleanup Jake Poznanski 2025-06-11 21:55:07 +00:00
  • d17bef8b4b Working on a more pipeliney thing Jake Poznanski 2025-06-11 21:51:24 +00:00
  • d0df380ae9 Cleaning data loader Jake Poznanski 2025-06-11 21:41:18 +00:00
  • 5bbc1ffff7 Parsing and validating front matter Jake Poznanski 2025-06-11 21:27:57 +00:00
  • aedc295e3f Image params to loader Jake Poznanski 2025-06-11 21:05:23 +00:00
  • f3736253f3 updated torch version aman-17 2025-06-11 13:06:27 -07:00
  • 8681910570 test vllm 0.8.1 aman-17 2025-06-11 13:04:35 -07:00