114 Commits

Author SHA1 Message Date
Jake Poznanski
f8fd234093 Idea to improve retry performance 2025-05-28 18:27:40 +00:00
Jake Poznanski
63aee2c1e5 Code cleanup, version bump, remove unused permutation test 2025-05-16 21:25:32 +00:00
Jake Poznanski
1854ae1269 A bit more work on tagging 2025-05-09 19:31:07 +00:00
Jake Poznanski
03db04cb7e Fixing handling of new lines in some test cases 2025-05-08 17:21:06 +00:00
Jake Poznanski
8f46b6e966 Running more tests in CI 2025-04-17 14:26:06 -07:00
Jake Poznanski
1d0c560455 Upping version to fix issue with work queue and delimited paths 2025-04-15 18:50:13 +00:00
Jake Poznanski
79e2677319 Hmm, these should be passing! 2025-03-14 02:52:13 +00:00
Jake Poznanski
f5d92bdb14 Trying to get new CI to work 2025-03-14 02:43:55 +00:00
Chris Wilhelm
c585415797 for now, only process one pdf in the ci script 2025-03-13 15:48:47 -07:00
Chris Wilhelm
9b958e65f1 moves what happens where around a bit and updates readme 2025-03-13 15:31:55 -07:00
Chris Wilhelm
29b9054749 basic docker image and test 2025-03-13 15:31:55 -07:00
aman-17
0130a970c2 fixed style 2025-02-25 08:57:02 -08:00
Jake Poznanski
58bdfa512b CI 2025-02-14 20:51:04 +00:00
Jake Poznanski
25ec87b66d CI 2025-02-14 20:46:55 +00:00
Jake Poznanski
c05e01532c Hopefully CI runs now 2025-02-14 20:42:19 +00:00
Jake Poznanski
91eef279b3 Adding some gnarly 1 pager pdfs from kyle 2025-02-11 18:45:42 +00:00
aman-17
a036133fdd resolved all the mypy, black and isort issues and updated readme 2025-02-07 16:05:00 -08:00
Jake Poznanski
9bf3d35cdb Comment fix 2025-01-30 16:02:08 -08:00
Jake Poznanski
2ab7cb280c Removing pymupdf 2025-01-30 15:51:54 -08:00
Jake Poznanski
72f4b9a590 Project setup 2025-01-30 15:33:04 -08:00
Jake Poznanski
cdd830235f Shortened some sample docs 2025-01-30 15:28:31 -08:00
Jake Poznanski
10094ffc19 Even newer mypy crashes still 2025-01-30 14:32:08 -08:00
Jake Poznanski
fb402297ce Isort and black update 2025-01-29 15:42:34 -08:00
Jake Poznanski
dcaca8aa90 Black formatting 2025-01-29 15:30:39 -08:00
Jake Poznanski
4a1762d455 isort 2025-01-29 15:25:10 -08:00
Jake Poznanski
0628d3161f Some unit test cleanup 2025-01-29 15:15:10 -08:00
Jake Poznanski
b28aad61bb More test docs 2025-01-27 21:11:23 +00:00
Jake Poznanski
96ae2dd49b Refactoring 2025-01-27 20:45:28 +00:00
Jake Poznanski
c6062677aa Cleaning up some unused code 2025-01-27 18:48:15 +00:00
Jake Poznanski
b2894d0280 Massive refactor from pdelfin to olmocr 2025-01-27 18:30:41 +00:00
Jake Poznanski
01469af463 Doing some debugging 2025-01-23 10:58:43 -08:00
Jake Poznanski
72d2fa2fd4 Reviewing molmo training 2025-01-22 15:23:08 -08:00
Jake Poznanski
0d1fc08081 Small fixes 2025-01-10 19:38:42 +00:00
Jake Poznanski
5692a76350 Ok, direct easy test for diffs now 2024-12-04 13:27:51 -08:00
Jake Poznanski
48f3ab82bd Working on some random tests 2024-12-04 13:20:10 -08:00
Jake Poznanski
917cdeccba Some more tests 2024-12-03 15:32:53 -08:00
Jake Poznanski
9b9d04c8e9 aaa 2024-11-26 08:38:25 -08:00
Jake Poznanski
386374bd72 More prints 2024-11-25 16:08:24 -08:00
Jake Poznanski
04d6123037 Doing some experiments 2024-11-25 15:36:04 -08:00
Jake Poznanski
51614efc83 More log probs investigation 2024-11-25 11:24:21 -08:00
Jake Poznanski
28d52602e9 More test code 2024-11-25 11:00:03 -08:00
Jake Poznanski
606e81bfea Not happy here with this test 2024-11-25 10:32:18 -08:00
Jake Poznanski
d7838372e8 Full test 2024-11-25 10:25:55 -08:00
Jake Poznanski
2e4f7d7827 Working on HF test for comparison 2024-11-25 10:12:29 -08:00
Jake Poznanski
5e3080db28 Sglang based unit test 2024-11-25 09:48:05 -08:00
Jake Poznanski
60f24ad2d6 tests 2024-11-25 09:39:55 -08:00
Jake Poznanski
5289092076 Starting on sglang test 2024-11-25 09:34:59 -08:00
Jake Poznanski
ba8eba245b Unit tests fixes 2024-11-25 09:13:13 -08:00
Jake Poznanski
c9e1a4c540 More tests 2024-11-20 19:37:00 +00:00
Jake Poznanski
8793fc7d99 Adding more retries, and it was able to process more complicated books 2024-11-18 14:25:32 -08:00