1501 Commits

Author SHA1 Message Date
Jake Poznanski
b61ef52d36 Final cleanup 2025-06-17 19:07:37 +00:00
Jake Poznanski
e103634d30 Bump version to v0.1.74 for release v0.1.74 2025-06-17 18:02:23 +00:00
Jake Poznanski
9f44fa6e79 Cleanup CI 2025-06-17 18:02:13 +00:00
Jake Poznanski
0b07fa4722 Increasing max number of workers, more suitable to H100 configurations and having retries 2025-06-17 18:01:16 +00:00
Jake Poznanski
3c1cd6a504 Bump version to v0.1.73 for release v0.1.73 2025-06-17 17:07:02 +00:00
Jake Poznanski
dd1d0b561b More debug logs 2025-06-17 17:06:45 +00:00
Jake Poznanski
08ca1544bf Beaker push fix 2025-06-17 16:59:22 +00:00
Jake Poznanski
5e5c31b93e Bump version to v0.1.72 for release v0.1.72 2025-06-17 16:21:43 +00:00
Jake Poznanski
715b841596 0.1.72 2025-06-17 16:21:36 +00:00
Jake Poznanski
b03feb3356 Fixed 2025-06-17 16:10:38 +00:00
Jake Poznanski
b588ae27d2 Removing sglang tests, switch to vllm 2025-06-17 16:07:16 +00:00
Jake Poznanski
6e3fba3c59 Lints 2025-06-17 16:06:40 +00:00
Jake Poznanski
e489b28421 Lints 2025-06-17 15:58:16 +00:00
Jake Poznanski
6fcd26d66a Updating readme 2025-06-17 15:48:25 +00:00
Jake Poznanski
8c62072832 Merge remote-tracking branch 'origin/main' into jakep/vllm_perf 2025-06-17 15:25:32 +00:00
Jake Poznanski
e9828cde51 Lints, adding more perf tracking to pipeline 2025-06-13 19:53:34 +00:00
Jake Poznanski
9ab742b7c8 Outputting finished output tok/sec as well 2025-06-13 03:53:33 +00:00
Jake Poznanski
cc0c62ab73 Adding more workers by default to improve bench perf 2025-06-13 03:50:21 +00:00
Jake Poznanski
1295e171bb Merge branch 'main' of https://github.com/allenai/olmocr 2025-06-12 22:35:09 +00:00
Jake Poznanski
37090e2801 Go back to workers 1 in marker test script 2025-06-12 22:35:08 +00:00
Jake Poznanski
f273de6e6e Update README.md: Updating to v.1.7.5 marker that I ran locally with base only for now 2025-06-12 15:32:09 -07:00
Jake Poznanski
af02f15f24 Merge pull request #236 from VikParuchuri/main: Fix marker benchmarks 2025-06-12 15:24:17 -07:00
Jake Poznanski
3da6e2d587 Pareto plot update, keep cost the same for now 2025-06-12 22:23:41 +00:00
Jake Poznanski
fcd8bbec92 Install aws cli 2025-06-12 21:38:28 +00:00
Jake Poznanski
fc06797bec aws cli 2025-06-12 21:29:39 +00:00
Jake Poznanski
59e0a1ccb0 Marker wants newer torchvision 2025-06-12 21:23:53 +00:00
Jake Poznanski
0f3b45c1a3 Add time 2025-06-12 21:19:17 +00:00
Jake Poznanski
4bfcfce767 Actually install the right thing 2025-06-12 21:18:58 +00:00
Jake Poznanski
548187902b Ignore 2025-06-12 21:14:00 +00:00
Jake Poznanski
f8dfd85765 Script 2025-06-12 21:13:31 +00:00
Jake Poznanski
044874a634 Adding marker benchmark 2025-06-12 21:12:58 +00:00
Jake Poznanski
9787d007b9 Pulling in bigger benchmark script from vllm branch to main 2025-06-12 21:02:46 +00:00
Jake Poznanski
43c94fea58 Benchmark update 2025-06-12 20:47:58 +00:00
Jake Poznanski
b1e064f8a6 Run benchmark script will also start a job to convert 10k docs from olmocr-mix to check performance 2025-06-12 20:27:50 +00:00
Jake Poznanski
3d72f3457b Fixing prepare_olmocrmix 2025-06-12 20:15:35 +00:00
Jake Poznanski
af7aaef605 Run marker script 2025-06-12 20:07:17 +00:00
Jake Poznanski
cbc4580b72 Fixing #240 2025-06-12 17:21:21 +00:00
Jake Poznanski
c93ac4a95d Cleaned up loader 2025-06-12 03:27:39 +00:00
Jake Poznanski
60338810bc Cleaning up dataloader 2025-06-12 03:17:24 +00:00
Jake Poznanski
cfe9aa102b Ok, dataloader from start to finish is running, now to write a trainer 2025-06-11 23:30:02 +00:00
Jake Poznanski
105d5907d6 Dataloader progress 2025-06-11 22:35:35 +00:00
Jake Poznanski
9f50bda6bf More refactoring 2025-06-11 22:05:56 +00:00
Jake Poznanski
6a360fae06 Cleanup 2025-06-11 21:55:07 +00:00
Jake Poznanski
d17bef8b4b Working on a more pipeliney thing 2025-06-11 21:51:24 +00:00
Jake Poznanski
d0df380ae9 Cleaning data loader 2025-06-11 21:41:18 +00:00
Jake Poznanski
5bbc1ffff7 Parsing and validating front matter 2025-06-11 21:27:57 +00:00
Jake Poznanski
aedc295e3f Image params to loader 2025-06-11 21:05:23 +00:00
Jake Poznanski
9a390e3d58 Validating that we get single pages 2025-06-11 18:14:36 +00:00
Jake Poznanski
0689676026 Rendering the pdfs in the dataloader 2025-06-11 18:11:42 +00:00
Jake Poznanski
352287cc16 Starting on dataloader 2025-06-11 17:54:19 +00:00