225 Commits

Author SHA1 Message Date
Jake Poznanski
12f8a90f1b Copying preprocessed files to local ssd in trainer script 2025-08-04 22:18:38 +00:00
Jake Poznanski
7c098955a9 Trying fix for transformers benchmark 2025-08-04 19:50:05 +00:00
Jake Poznanski
df52cb0e0e Small fixes for transformers test runner 2025-07-25 03:18:24 +00:00
Jake Poznanski
cf1912dec4 Some transformer bench ideas 2025-07-24 21:20:15 +00:00
Jake Poznanski
16145a4b32 Need accelerate 2025-07-16 18:51:37 +00:00
Jake Poznanski
31c834dcdd Constants 2025-07-16 02:15:17 +00:00
Jake Poznanski
5ea4e8a6e2 Compare vllm script 2025-07-15 22:55:49 +00:00
Jake Poznanski
24608956a0 Working on comparing to vllm 2025-07-15 22:21:54 +00:00
Jake Poznanski
e6c98236b6 Adding more pipeline retry stats, compress code fixed 2025-07-15 21:41:10 +00:00
Jake Poznanski
4dbbf91e1c Compression script 2025-07-15 21:26:15 +00:00
Jake Poznanski
1092213c5f Merge branch 'jakep/new_traininer_nojson_newprompt' into jakep/new_trainer 2025-07-15 17:44:55 +00:00
Jake Poznanski
43ae28dde4 Prepare checkpoint works for older models too 2025-07-14 21:30:32 +00:00
Jake Poznanski
f014c2aaf9 Need to reserve all 8 gpus for reliable performance benchmark, even if you only use 1 2025-07-14 21:02:14 +00:00
Jake Poznanski
65d0edcaae Adding guided decoding option 2025-07-10 15:13:26 +00:00
Jake Poznanski
a7e2f719bf Start a preemptible one at least once 2025-07-02 19:26:30 +00:00
Jake Poznanski
79a7818517 New trainer launch script for beaker 2025-07-01 01:43:38 +00:00
Jake Poznanski
dcf026a63c Better script 2025-07-01 01:40:55 +00:00
Jake Poznanski
9f0f912101 Ugh 2025-06-30 23:37:35 +00:00
Jake Poznanski
1d007d1bf2 Perhaps fixing default config 2025-06-30 22:58:21 +00:00
Jake Poznanski
e7020c7f50 More configs 2025-06-30 22:49:42 +00:00
Jake Poznanski
5c2d69a3d7 Some cleanup stuff 2025-06-30 21:24:35 +00:00
Jake Poznanski
656dbef833 Frontier configs 2025-06-30 17:43:30 +00:00
Jake Poznanski
e2f2d36e4f More typos 2025-06-30 17:41:19 +00:00
Jake Poznanski
55a737ca6b script 2025-06-30 17:32:01 +00:00
Jake Poznanski
ba49fd53d9 frontier train script let's see what happens 2025-06-30 17:30:17 +00:00
Jake Poznanski
b96454b786 Merge branch 'main' into jakep/new_trainer 2025-06-27 16:21:58 +00:00
Jake Poznanski
1451dd1395 weka 2025-06-27 02:57:26 +00:00
Jake Poznanski
dee3730231 Gantry stuff 2025-06-26 18:34:53 +00:00
Jake Poznanski
0d7836b111 Basic atttempt to run trainer script 2025-06-25 23:22:59 +00:00
Jake Poznanski
d7e5037192 New trainer launch script cleanups 2025-06-25 23:05:32 +00:00
Jake Poznanski
ec5c5b6444 Updating pareto plots 2025-06-17 21:41:23 +00:00
Jake Poznanski
6c51829ae6 Some helper scripts 2025-06-17 21:21:50 +00:00
Jake Poznanski
1295e171bb Merge branch 'main' of https://github.com/allenai/olmocr 2025-06-12 22:35:09 +00:00
Jake Poznanski
37090e2801 Go back to workers 1 in marker test script 2025-06-12 22:35:08 +00:00
Jake Poznanski
af02f15f24 Merge pull request #236 from VikParuchuri/main: Fix marker benchmarks 2025-06-12 15:24:17 -07:00
Jake Poznanski
3da6e2d587 Pareto plot update, keep cost the same for now 2025-06-12 22:23:41 +00:00
Jake Poznanski
fcd8bbec92 Install aws cli 2025-06-12 21:38:28 +00:00
Jake Poznanski
fc06797bec aws cli 2025-06-12 21:29:39 +00:00
Jake Poznanski
59e0a1ccb0 Marker wants newer torchvision 2025-06-12 21:23:53 +00:00
Jake Poznanski
0f3b45c1a3 Add time 2025-06-12 21:19:17 +00:00
Jake Poznanski
4bfcfce767 Actually install the right thing 2025-06-12 21:18:58 +00:00
Jake Poznanski
f8dfd85765 Script 2025-06-12 21:13:31 +00:00
Jake Poznanski
044874a634 Adding marker benchmark 2025-06-12 21:12:58 +00:00
Jake Poznanski
9787d007b9 Pulling in bigger benchmark script from vllm branch to main 2025-06-12 21:02:46 +00:00
Jake Poznanski
43c94fea58 Bencharmk update 2025-06-12 20:47:58 +00:00
Jake Poznanski
b1e064f8a6 Run benchmark script will also start a job to convert 10k docs from olmocr-mix to check performance 2025-06-12 20:27:50 +00:00
Jake Poznanski
0689676026 Rendering the pdfs in the dataloader 2025-06-11 18:11:42 +00:00
Jake Poznanski
f0d8ff7bd3 First attempt at new trainer code 2025-06-11 16:56:16 +00:00
Jake Poznanski
25dfe0b831 Weird glibc error 2025-06-06 18:53:52 +00:00
Vik Paruchuri
267f52bd79 Update marker cost 2025-06-06 13:53:50 -04:00