127 Commits

Author SHA1 Message Date
Jake Poznanski
386374bd72 More prints 2024-11-25 16:08:24 -08:00
Jake Poznanski
04d6123037 Doing some experiments 2024-11-25 15:36:04 -08:00
Jake Poznanski
51614efc83 More log probs investigation 2024-11-25 11:24:21 -08:00
Jake Poznanski
28d52602e9 More test code 2024-11-25 11:00:03 -08:00
Jake Poznanski
606e81bfea Not happy here with this test 2024-11-25 10:32:18 -08:00
Jake Poznanski
d7838372e8 Full test 2024-11-25 10:25:55 -08:00
Jake Poznanski
2e4f7d7827 Working on HF test for comparison 2024-11-25 10:12:29 -08:00
Jake Poznanski
5e3080db28 Sglang based unit test 2024-11-25 09:48:05 -08:00
Jake Poznanski
60f24ad2d6 tests 2024-11-25 09:39:55 -08:00
Jake Poznanski
5289092076 Starting on sglang test 2024-11-25 09:34:59 -08:00
Jake Poznanski
ba8eba245b Unit tests fixes 2024-11-25 09:13:13 -08:00
Jake Poznanski
c9e1a4c540 More tests 2024-11-20 19:37:00 +00:00
Jake Poznanski
8793fc7d99 Adding more retries, and it was able to process more complicated books 2024-11-18 14:25:32 -08:00
Jake Poznanski
e499413089 Better work queue 2024-11-18 11:04:51 -08:00
Jake Poznanski
04429b2862 Basic work queue from claude 2024-11-18 10:07:03 -08:00
Jake Poznanski
fcabb8e55a Handling more error cases 2024-11-18 09:12:04 -08:00
Jake Poznanski
96984fcd77 Fix a reliability issue 2024-11-18 09:03:24 -08:00
Jake Poznanski
6a4a55f9e0 Hopefully working molmo HF trainer config 2024-10-30 14:00:27 -07:00
Jake Poznanski
bede854cd5 Starting to write molmo formatters 2024-10-30 13:24:11 -07:00
Jake Poznanski
85e0e2a61b Fixing issues with pdf parsing 2024-10-30 16:26:02 +00:00
Jake Poznanski
08d51b7183 Adding some rotation retry control 2024-10-28 20:16:06 +00:00
Jake Poznanski
ffe470bf0e Fix 2024-10-23 22:55:50 +00:00
Jake Poznanski
180dde03c5 dataprep sampling tests 2024-10-23 22:53:05 +00:00
Jake Poznanski
999f64dd46 Adding empty anchor support 2024-10-23 22:17:20 +00:00
Jake Poznanski
a1a4798ce7 Some crazy idea I had to simplify futures and memory limits 2024-10-23 21:51:37 +00:00
Jake Poznanski
302eee3da5 Yay matches between birr and hf 2024-10-21 16:58:30 +00:00
Jake Poznanski
9d35d3ca8f Birr tokenization test 2024-10-18 23:02:37 +00:00
Jake Poznanski
7dbcbc154b Birr tests that don't do anything but help me understand the universe 2024-10-18 22:39:17 +00:00
Jake Poznanski
dd4f9670b5 Filter refactor 2024-10-17 22:36:38 +00:00
Jake Poznanski
7d4cff53b5 Nice test for picking proper page in birrpipeline 2024-10-17 20:26:02 +00:00
Jake Poznanski
2826bcad18 Yay all unit tests pass cleanly now too 2024-10-17 17:05:55 +00:00
Jake Poznanski
124aaf5fe0 Hmm, can't repro failing anchor case 2024-10-17 17:00:02 +00:00
Jake Poznanski
202d81cece Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-10-16 11:38:33 -07:00
Jake Poznanski
e2552b2f28 Adding test case 2024-10-16 11:38:31 -07:00
Jake Poznanski
3c1b7de293 Refactoring of train dataloaders 2024-10-16 18:26:25 +00:00
Jake Poznanski
23d129fd2c Organizing around a new style of dataloader 2024-10-16 18:06:27 +00:00
Jake Poznanski
a2546e0b04 more stuff 2024-10-16 17:06:03 +00:00
Jake Poznanski
96682b2ecb Refactoring 2024-10-16 16:18:27 +00:00
Jake Poznanski
2cd863ddce Dolma viewer improvements 2024-10-16 16:05:44 +00:00
Jake Poznanski
6d53683001 More stats hopefully running faster 2024-10-14 21:37:14 +00:00
Jake Poznanski
7b161533e2 Code to do local inference on fine tuned models for testing 2024-10-14 08:38:18 -07:00
Jake Poznanski
2864f907e1 Dataloader fix with nicer tests 2024-10-10 16:58:45 +00:00
Jake Poznanski
b7c80cd17f Fix up some tests but I don't see why this isn't working 2024-10-10 16:58:40 +00:00
Jake Poznanski
a90feda42f bugfixes 2024-10-09 20:20:06 +00:00
Jake Poznanski
4bf6e7a430 Refactoring 2024-10-09 18:11:18 +00:00
Jake Poznanski
dc6440d068 Cleaning up anchor text to deal with abnormally long lines 2024-10-09 16:29:20 +00:00
Jake Poznanski
230c8a9f9a Trying new run that will rewrite the prompts as it goes 2024-10-08 22:10:18 +00:00
Jake Poznanski
97291b3f6a Anchor is fixed to sample text elements better 2024-10-08 21:51:43 +00:00
Jake Poznanski
c8a4d14c57 Adding image merging to pdf report/hint/anchor 2024-10-08 21:23:21 +00:00
Jake Poznanski
ebd40f9084 Hopefully fixing dataloader for now 2024-10-07 12:59:27 -07:00