Jake Poznanski
|
e9828cde51
|
Lints, adding more perf tracking to pipeline
|
2025-06-13 19:53:34 +00:00 |
|
Jake Poznanski
|
9ab742b7c8
|
Outputting finished output tok/sec as well
|
2025-06-13 03:53:33 +00:00 |
|
Jake Poznanski
|
cc0c62ab73
|
Adding more workers by default to improve bench perf
|
2025-06-13 03:50:21 +00:00 |
|
Jake Poznanski
|
1295e171bb
|
Merge branch 'main' of https://github.com/allenai/olmocr
|
2025-06-12 22:35:09 +00:00 |
|
Jake Poznanski
|
37090e2801
|
Go back to workers 1 in marker test script
|
2025-06-12 22:35:08 +00:00 |
|
Jake Poznanski
|
f273de6e6e
|
Update README.md
Updating to v.1.7.5 marker that I ran locally with base only for now
|
2025-06-12 15:32:09 -07:00 |
|
Jake Poznanski
|
af02f15f24
|
Merge pull request #236 from VikParuchuri/main
Fix marker benchmarks
|
2025-06-12 15:24:17 -07:00 |
|
Jake Poznanski
|
3da6e2d587
|
Pareto plot update, keep cost the same for now
|
2025-06-12 22:23:41 +00:00 |
|
Jake Poznanski
|
fcd8bbec92
|
Install aws cli
|
2025-06-12 21:38:28 +00:00 |
|
Jake Poznanski
|
fc06797bec
|
aws cli
|
2025-06-12 21:29:39 +00:00 |
|
Jake Poznanski
|
59e0a1ccb0
|
Marker wants newer torchvision
|
2025-06-12 21:23:53 +00:00 |
|
Jake Poznanski
|
0f3b45c1a3
|
Add time
|
2025-06-12 21:19:17 +00:00 |
|
Jake Poznanski
|
4bfcfce767
|
Actually install the right thing
|
2025-06-12 21:18:58 +00:00 |
|
Jake Poznanski
|
548187902b
|
Ignore
|
2025-06-12 21:14:00 +00:00 |
|
Jake Poznanski
|
f8dfd85765
|
Script
|
2025-06-12 21:13:31 +00:00 |
|
Jake Poznanski
|
044874a634
|
Adding marker benchmark
|
2025-06-12 21:12:58 +00:00 |
|
Jake Poznanski
|
9787d007b9
|
Pulling in bigger benchmark script from vllm branch to main
|
2025-06-12 21:02:46 +00:00 |
|
Jake Poznanski
|
43c94fea58
|
Benchmark update
|
2025-06-12 20:47:58 +00:00 |
|
Jake Poznanski
|
b1e064f8a6
|
Run benchmark script will also start a job to convert 10k docs from olmocr-mix to check performance
|
2025-06-12 20:27:50 +00:00 |
|
Jake Poznanski
|
3d72f3457b
|
Fixing prepare_olmocrmix
|
2025-06-12 20:15:35 +00:00 |
|
Jake Poznanski
|
af7aaef605
|
Run marker script
|
2025-06-12 20:07:17 +00:00 |
|
Jake Poznanski
|
cbc4580b72
|
Fixing #240
|
2025-06-12 17:21:21 +00:00 |
|
Jake Poznanski
|
c93ac4a95d
|
Cleaned up loader
|
2025-06-12 03:27:39 +00:00 |
|
Jake Poznanski
|
60338810bc
|
Cleaning up dataloader
|
2025-06-12 03:17:24 +00:00 |
|
Jake Poznanski
|
cfe9aa102b
|
Ok, dataloader from start to finish is running, now to write a trainer
|
2025-06-11 23:30:02 +00:00 |
|
Jake Poznanski
|
105d5907d6
|
Dataloader progress
|
2025-06-11 22:35:35 +00:00 |
|
Jake Poznanski
|
9f50bda6bf
|
More refactoring
|
2025-06-11 22:05:56 +00:00 |
|
Jake Poznanski
|
6a360fae06
|
Cleanup
|
2025-06-11 21:55:07 +00:00 |
|
Jake Poznanski
|
d17bef8b4b
|
Working on a more pipeliney thing
|
2025-06-11 21:51:24 +00:00 |
|
Jake Poznanski
|
d0df380ae9
|
Cleaning data loader
|
2025-06-11 21:41:18 +00:00 |
|
Jake Poznanski
|
5bbc1ffff7
|
Parsing and validating front matter
|
2025-06-11 21:27:57 +00:00 |
|
Jake Poznanski
|
aedc295e3f
|
Image params to loader
|
2025-06-11 21:05:23 +00:00 |
|
Jake Poznanski
|
9a390e3d58
|
Validating that we get single pages
|
2025-06-11 18:14:36 +00:00 |
|
Jake Poznanski
|
0689676026
|
Rendering the pdfs in the dataloader
|
2025-06-11 18:11:42 +00:00 |
|
Jake Poznanski
|
352287cc16
|
Starting on dataloader
|
2025-06-11 17:54:19 +00:00 |
|
Jake Poznanski
|
0e17b50583
|
Ok, looks like we have a nice extractor script for the dataset
|
2025-06-11 17:28:00 +00:00 |
|
Jake Poznanski
|
f19f7c1271
|
Almost done extracting
|
2025-06-11 17:17:52 +00:00 |
|
Jake Poznanski
|
f0d8ff7bd3
|
First attempt at new trainer code
|
2025-06-11 16:56:16 +00:00 |
|
aman-17
|
3eda2c04c1
|
updated vllm to 0.9.1
|
2025-06-10 16:14:57 -07:00 |
|
Jake Poznanski
|
a83a0da65f
|
Cleanup of vllm perf branch with @amanr
|
2025-06-10 21:56:05 +00:00 |
|
aman-17
|
316d0af1cd
|
added dtype functionality
|
2025-06-06 16:19:40 -07:00 |
|
aman-17
|
c8a5361d1b
|
fixing packages of 22.04
|
2025-06-06 13:50:12 -07:00 |
|
aman-17
|
c5d075c63a
|
fixed apt_pkg module
|
2025-06-06 13:48:48 -07:00 |
|
aman-17
|
08fd82f323
|
made changes wrt ubuntu 22.04
|
2025-06-06 13:41:10 -07:00 |
|
aman-17
|
6507a657be
|
updated ubuntu to 22.04 for glibc 2.32
|
2025-06-06 13:29:51 -07:00 |
|
Jake Poznanski
|
25dfe0b831
|
Weird glibc error
|
2025-06-06 18:53:52 +00:00 |
|
Jake Poznanski
|
0257444720
|
Ok, cleaner retry pattern for model downloading
|
2025-06-06 18:52:01 +00:00 |
|
Vik Paruchuri
|
267f52bd79
|
Update marker cost
|
2025-06-06 13:53:50 -04:00 |
|
Jake Poznanski
|
9539eab840
|
AWS creds fix
|
2025-06-06 17:45:17 +00:00 |
|
Jake Poznanski
|
e0fda1a77d
|
Passing aws creds to benchmark so we can run custom models stored in s3
|
2025-06-05 17:40:14 +00:00 |
|