26 Commits

Author SHA1 Message Date
Jake Poznanski
062abff25c Adding some skip logic 2024-10-27 21:17:48 +00:00
Jake Poznanski
8e6d0c65d6 switching to orjson, some better json error handling 2024-10-25 22:10:54 +00:00
Jake Poznanski
48a3affec3 Reindexing 2024-10-25 20:32:51 +00:00
Jake Poznanski
f8c5aac5a0 Some cleanup 2024-10-23 21:51:54 +00:00
Jake Poznanski
38dc5a2a0f Refactored to have a more efficient batchwriter, and also not allow too many running futures 2024-10-23 16:28:46 +00:00
Jake Poznanski
0a5c5068b4 index 2024-10-22 16:03:06 +00:00
Jake Poznanski
7c7867626f Fix pipeline bug with indexing 2024-10-22 15:47:11 +00:00
Jake Poznanski
77f0b9fa84 help text 2024-10-18 22:39:25 +00:00
Jake Poznanski
492a3f6bef Adding parameters for target image and anchor text sizes 2024-10-18 21:47:30 +00:00
Jake Poznanski
1c8602c0ff Removing rotation invalid ones to see what happens 2024-10-17 22:41:44 +00:00
Jake Poznanski
a4d76206ff Choosing proper page 2024-10-17 20:18:06 +00:00
Jake Poznanski
23d129fd2c Organizing around a new style of dataloader 2024-10-16 18:06:27 +00:00
Jake Poznanski
96682b2ecb Refactoring 2024-10-16 16:18:27 +00:00
Jake Poznanski
9eb252f8f6 Better tracking of completion_errors 2024-10-15 22:43:31 +00:00
Jake Poznanski
4ef14ec813 More stats 2024-10-15 22:26:31 +00:00
Jake Poznanski
b8cd414022 tiny fix 2024-10-15 16:54:19 +00:00
Jake Poznanski
a7fae0e659 fix 2024-10-15 16:36:54 +00:00
Jake Poznanski
4669eb7134 Adjusting workflow so I can do s2 pdfs 2024-10-15 16:22:55 +00:00
Jake Poznanski
6d61ae4aa8 Some pipeline cleanup stuff 2024-10-15 16:02:08 +00:00
Jake Poznanski
6d53683001 More stats hopefully running faster 2024-10-14 21:37:14 +00:00
Jake Poznanski
350061906e Adding nicer output stats 2024-10-14 20:48:33 +00:00
Jake Poznanski
194af5ff52 Robustness 2024-10-14 20:31:37 +00:00
Jake Poznanski
1ed9e4c947 Runs to the end now 2024-10-14 20:28:54 +00:00
Jake Poznanski
879b974af2 More and more fixes 2024-10-14 20:06:07 +00:00
Jake Poznanski
77a850d7ef Tracking rounds of inference better 2024-10-14 18:42:50 +00:00
Jake Poznanski
af992bd603 More refactoring 2024-10-14 18:23:22 +00:00