olmocr

mirror of https://github.com/allenai/olmocr.git synced 2025-06-27 04:00:02 +00:00

Author	SHA1	Message	Date
aman-17	0130a970c2	fixed style	2025-02-25 08:57:02 -08:00
Jake Poznanski	25ec87b66d	CI	2025-02-14 20:46:55 +00:00
Jake Poznanski	c05e01532c	Hopefully CI runs now	2025-02-14 20:42:19 +00:00
Jake Poznanski	dcaca8aa90	Black formatting	2025-01-29 15:30:39 -08:00
Jake Poznanski	4a1762d455	isort	2025-01-29 15:25:10 -08:00
Jake Poznanski	b2894d0280	Massive refactor from pdelfin to olmocr	2025-01-27 18:30:41 +00:00
Jake Poznanski	6a4a55f9e0	Hopefully working molmo HF trainer config	2024-10-30 14:00:27 -07:00
Jake Poznanski	bede854cd5	Startng to write molmo formatters	2024-10-30 13:24:11 -07:00
Jake Poznanski	ffe470bf0e	Fix	2024-10-23 22:55:50 +00:00
Jake Poznanski	180dde03c5	dataprep sampling tests	2024-10-23 22:53:05 +00:00
Jake Poznanski	2826bcad18	Yay all unit tests pass cleanly now too	2024-10-17 17:05:55 +00:00
Jake Poznanski	124aaf5fe0	Hmm, cant repro failing anchor case	2024-10-17 17:00:02 +00:00
Jake Poznanski	2864f907e1	Dataloader fix with nicer tests	2024-10-10 16:58:45 +00:00
Jake Poznanski	b7c80cd17f	Fix up some tests but I don't see why this isn't working	2024-10-10 16:58:40 +00:00
Jake Poznanski	e42cecf96c	Adding anchor code based off of pypdf that visits each text block, hopefully so we can make it output good bboxes	2024-10-01 22:10:58 +00:00
Jake Poznanski	decfd7fbc1	Fixing the refiner input prompt to something simpler that doesn't depend on the training data. Fixing beaker job workspace and bumping priority to high.	2024-09-27 22:54:07 +00:00
Jake Poznanski	4eddb1b45f	Okay, reasonably happy with the dataprep pipeline	2024-09-20 13:04:47 -07:00
Jake Poznanski	a47afe5c8d	Adding test to make sure the traning and inference time tokenization stays identical, currenlty failing	2024-09-20 12:01:05 -07:00

18 Commits