693 Commits

Author SHA1 Message Date
Jake Poznanski
51cfdbd64f Better converter 2025-02-13 22:30:20 +00:00
Jake Poznanski
e369569f99 Update README.md 2025-02-13 13:46:02 -08:00
Jake Poznanski
91eef279b3 Adding some gnarly 1 pager pdfs from kyle 2025-02-11 18:45:42 +00:00
Jake Poznanski
87cb9573d8 First pass at dataset builder script 2025-02-11 18:38:41 +00:00
Jake Poznanski
6ed6f85c42 Generating parquets for hugging face 2025-02-10 23:12:38 +00:00
Jake Poznanski
84c0c71393 Merge branch 'main' of https://github.com/allenai/olmocr 2025-02-10 22:00:42 +00:00
Jake Poznanski
7d67a59c31 Remove unused 2025-02-10 22:00:40 +00:00
Jake Poznanski
6471f28ec8 Random git ignores, remove unused code 2025-02-10 22:00:35 +00:00
Jake Poznanski
f04d1207a5 Merge branch 'main' of https://github.com/allenai/olmocr into main 2025-02-10 12:40:29 -08:00
Jake Poznanski
e73ff9d7a1 Updating to new model name on HF 2025-02-10 12:39:49 -08:00
Jake Poznanski
e627842b77 Merge pull request #28 from allenai/amanr/code_documentation: Resolved Git checks and updated readme 2025-02-10 11:53:54 -08:00
aman-17
f57c6f3f7b restored modeling_molmo.py file 2025-02-10 11:07:35 -08:00
aman-17
4bff92053b updated changelog 2025-02-07 16:34:53 -08:00
aman-17
b6e5dab306 fixed lint check 2025-02-07 16:29:27 -08:00
aman-17
a036133fdd resolved all the mypy, black and isort issues and updated readme 2025-02-07 16:05:00 -08:00
Jake Poznanski
9bf3d35cdb Comment fix 2025-01-30 16:02:08 -08:00
Jake Poznanski
2ab7cb280c Removing pymupdf 2025-01-30 15:51:54 -08:00
Jake Poznanski
ddeea92591 More dev dependecies 2025-01-30 15:38:29 -08:00
Jake Poznanski
72f4b9a590 Project setup 2025-01-30 15:33:04 -08:00
Jake Poznanski
cdd830235f Shortened some sample docs 2025-01-30 15:28:31 -08:00
Jake Poznanski
10094ffc19 Even newer mypy crashes still 2025-01-30 14:32:08 -08:00
Jake Poznanski
c74d47a553 Pipeline fixes 2025-01-30 22:30:39 +00:00
Jake Poznanski
04844b3f87 More beaker and docker fixes 2025-01-30 22:14:57 +00:00
Jake Poznanski
9df86da271 Beaker fixes 2025-01-30 21:44:22 +00:00
Jake Poznanski
cf6673cecf Pipeline fixes 2025-01-30 13:42:42 -08:00
Jake Poznanski
7fbbb572ae Remove mypy for now 2025-01-30 13:37:01 -08:00
Jake Poznanski
d36e556f19 Hopefully fixes build 2025-01-30 13:11:37 -08:00
Jake Poznanski
c69e0d6762 More cleanup, removing dead adv anchor code 2025-01-30 12:58:11 -08:00
Jake Poznanski
d4d711d12a Nicer glob handing for pipeline.py 2025-01-30 12:48:10 -08:00
Jake Poznanski
84477b50f4 More formatting 2025-01-30 10:54:21 -08:00
Jake Poznanski
e3d04ee79f Merge branch 'main' of https://github.com/allenai/olmocr into main 2025-01-30 10:53:40 -08:00
Jake Poznanski
c37e545d25 running isort again 2025-01-30 10:53:35 -08:00
Jake Poznanski
358a24f6cb Update README.md 2025-01-30 10:33:54 -08:00
Jake Poznanski
c58e13392b Update README.md 2025-01-30 10:28:57 -08:00
Jake Poznanski
2c2953329e Fixing most ruff errors 2025-01-29 15:57:26 -08:00
Jake Poznanski
56903774b7 Ruff 2025-01-29 15:47:57 -08:00
Jake Poznanski
fb402297ce Isort and black update 2025-01-29 15:42:34 -08:00
Jake Poznanski
cdb10a951b Python 3.11 2025-01-29 15:33:11 -08:00
Jake Poznanski
dcaca8aa90 Black formatting 2025-01-29 15:30:39 -08:00
Jake Poznanski
4a1762d455 isort 2025-01-29 15:25:10 -08:00
Jake Poznanski
0628d3161f Some unit test cleanup 2025-01-29 15:15:10 -08:00
Jake Poznanski
7d2403da52 More infos 2025-01-29 14:25:15 -08:00
Jake Poznanski
8dd006d806 Merge branch 'main' of https://github.com/allenai/olmocr into main 2025-01-29 14:12:41 -08:00
Jake Poznanski
04615d7f0a More logging on sglang server 2025-01-29 14:12:39 -08:00
Jake Poznanski
d9f5b7245f Merge branch 'main' of https://github.com/allenai/olmocr 2025-01-29 22:03:39 +00:00
Jake Poznanski
962126e987 Typo 2025-01-29 22:03:37 +00:00
Jake Poznanski
c7e56e7bff Update README.md 2025-01-29 14:01:18 -08:00
Jake Poznanski
6369b1f10c Merge branch 'main' of https://github.com/allenai/olmocr 2025-01-29 21:49:19 +00:00
Jake Poznanski
17a5dfe0d0 Add gpu message 2025-01-29 21:48:56 +00:00
Jake Poznanski
7e4fb68869 Update README.md 2025-01-29 13:37:13 -08:00