582 Commits

Author | SHA1 | Message | Date
Jake Poznanski | a7cd7467c3 | mathjax | 2024-10-16 16:45:07 +00:00
Jake Poznanski | baa82a4a9a | Fixing links, rendering tables | 2024-10-16 16:37:08 +00:00
Jake Poznanski | 19e56ec7ce | dolma viewer runs much faster now | 2024-10-16 16:21:25 +00:00
Jake Poznanski | 96682b2ecb | Refactoring | 2024-10-16 16:18:27 +00:00
Jake Poznanski | 2cd863ddce | Dolma viewer improvements | 2024-10-16 16:05:44 +00:00
Jake Poznanski | 35558dbddc | Make the prompt hint randomly select lines | 2024-10-16 16:05:07 +00:00
Jake Poznanski | 9eb252f8f6 | Better tracking of completion_errors | 2024-10-15 22:43:31 +00:00
Jake Poznanski | 4ef14ec813 | More stats | 2024-10-15 22:26:31 +00:00
Jake Poznanski | 4a280e55df | Nicer dolma viewer | 2024-10-15 21:03:28 +00:00
Jake Poznanski | 42cf6a639f | Dolma viewer | 2024-10-15 18:37:31 +00:00
Jake Poznanski | b8cd414022 | tiny fix | 2024-10-15 16:54:19 +00:00
Jake Poznanski | a7fae0e659 | fix | 2024-10-15 16:36:54 +00:00
Jake Poznanski | 4669eb7134 | Adjusting workflow so I can do s2 pdfs | 2024-10-15 16:22:55 +00:00
Jake Poznanski | 6d61ae4aa8 | Some pipeline cleanup stuff | 2024-10-15 16:02:08 +00:00
Jake Poznanski | fc8fcfaeba | Fixing dataloader hopefully | 2024-10-15 15:13:25 +00:00
Jake Poznanski | 6d53683001 | More stats hopefully running faster | 2024-10-14 21:37:14 +00:00
Jake Poznanski | 350061906e | Adding nicer output stats | 2024-10-14 20:48:33 +00:00
Jake Poznanski | 194af5ff52 | Robustness | 2024-10-14 20:31:37 +00:00
Jake Poznanski | 1ed9e4c947 | Runs to the end now | 2024-10-14 20:28:54 +00:00
Jake Poznanski | 879b974af2 | More and more fixes | 2024-10-14 20:06:07 +00:00
Jake Poznanski | 77a850d7ef | Tracking rounds of inference better | 2024-10-14 18:42:50 +00:00
Jake Poznanski | af992bd603 | More refactoring | 2024-10-14 18:23:22 +00:00
Jake Poznanski | cd8e28e459 | Pipeline working hopefully soon | 2024-10-14 18:19:17 +00:00
Jake Poznanski | f2f578cca9 | More pipeline code | 2024-10-14 17:23:09 +00:00
Jake Poznanski | 39333f2c96 | New pipeline stuff | 2024-10-14 17:09:11 +00:00
Jake Poznanski | 4d6eaf654d | Merge branch 'main' of https://github.com/allenai/pdelfin | 2024-10-14 16:30:51 +00:00
Jake Poznanski | 89d4ee2145 | Pipeline work | 2024-10-14 16:30:49 +00:00
Jake Poznanski | 7b161533e2 | Code to do local inference on fine tuned models for testing | 2024-10-14 08:38:18 -07:00
Jake Poznanski | 5a7377af30 | Refactoring | 2024-10-11 22:57:49 +00:00
Jake Poznanski | 4fd6066600 | gpt cleanup | 2024-10-11 22:41:09 +00:00
Jake Poznanski | a45f86e4a4 | More cleanup | 2024-10-11 22:37:32 +00:00
Jake Poznanski | 53fdb6108c | More pipeline code | 2024-10-11 21:50:09 +00:00
Jake Poznanski | 10b7a58d28 | fix | 2024-10-11 20:22:58 +00:00
Jake Poznanski | f477a68621 | dbmanager | 2024-10-11 16:24:29 +00:00
Jake Poznanski | 2dccc4be3b | Oops removing print | 2024-10-11 16:23:14 +00:00
Jake Poznanski | aea3f7f1fe | Fix for anchor generation on pdfs with no text elements | 2024-10-11 15:01:01 +00:00
Jake Poznanski | af03358c47 | assemble | 2024-10-10 22:36:09 +00:00
Jake Poznanski | 312847acac | Ok, finally working nicely to build the page index | 2024-10-10 22:30:09 +00:00
Jake Poznanski | 312ee8d953 | pipeline script | 2024-10-10 22:13:43 +00:00
Jake Poznanski | 49b5b233c3 | Working on new pipeline script | 2024-10-10 22:10:26 +00:00
Jake Poznanski | a8b50ae8fa | Preloading the datasets directly | 2024-10-10 19:57:51 +00:00
Jake Poznanski | 85f2dc6d26 | Fixes | 2024-10-10 18:52:42 +00:00
Jake Poznanski | 2864f907e1 | Dataloader fix with nicer tests | 2024-10-10 16:58:45 +00:00
Jake Poznanski | b7c80cd17f | Fix up some tests but I don't see why this isn't working | 2024-10-10 16:58:40 +00:00
Jake Poznanski | 3245990216 | Faster eval script | 2024-10-10 15:22:33 +00:00
Jake Poznanski | 931f48c3d1 | Allow eval script to support one more type of jsonls, runpipeline multiglobs, other fixes | 2024-10-09 23:39:13 +00:00
Jake Poznanski | c6bdf69d8f | First stab at document assembly | 2024-10-09 22:19:16 +00:00
Jake Poznanski | 847064f46f | Taking notes, starting on document assembly | 2024-10-09 22:14:28 +00:00
Jake Poznanski | 8e5809da71 | runpipeline | 2024-10-09 20:29:59 +00:00
Jake Poznanski | a90feda42f | bugfixes | 2024-10-09 20:20:06 +00:00