457 Commits

Author SHA1 Message Date
Jake Poznanski
c93fc36f72 Missing import 2024-11-26 22:22:36 +00:00
Jake Poznanski
dd17185cfd More things to try 2024-11-23 21:49:33 +00:00
Jake Poznanski
46fe4acc0b Trying fixes for live lock 2024-11-23 21:41:49 +00:00
Jake Poznanski
41accfe867 Error out if you see a broken process pool, might need a better check for this 2024-11-22 22:07:43 +00:00
Jake Poznanski
a95487e44c Adding check for possible sglang livelock 2024-11-22 21:50:45 +00:00
Jake Poznanski
cff97990bf Moving to official sglang release 2024-11-22 19:37:31 +00:00
Jake Poznanski
f8dcdf625a Better catching of httpx errors and retrying them 2024-11-21 23:35:42 +00:00
Jake Poznanski
d6a00135a7 Faster init by caching pdf filter 2024-11-21 23:23:11 +00:00
Jake Poznanski
a91befc4ad Fix for fallback stuff 2024-11-21 11:08:42 -08:00
Jake Poznanski
8c858a9d15 New version 2024-11-21 10:49:31 -08:00
Jake Poznanski
66fff4f44b Merge branch 'main' of https://github.com/allenai/pdelfin 2024-11-21 18:39:22 +00:00
Jake Poznanski
212d391933 More conservative filtering 2024-11-21 18:39:21 +00:00
Jake Poznanski
b8b786e003 Applying pdf filter 2024-11-21 10:20:58 -08:00
Jake Poznanski
cb800d6e2c Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-21 08:58:30 -08:00
Jake Poznanski
7dd20460a3 New version 2024-11-21 08:58:28 -08:00
Jake Poznanski
219cc7eca8 Merge branch 'main' of https://github.com/allenai/pdelfin 2024-11-21 16:56:20 +00:00
Jake Poznanski
98e40143dd Adding mass filtering script 2024-11-21 16:56:19 +00:00
Jake Poznanski
af8ce518ac Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-21 08:45:19 -08:00
Jake Poznanski
9112d81bd1 No keep alive connection to try to resolve sglang livelock 2024-11-21 08:45:17 -08:00
Jake Poznanski
2443c22fde Projected output tokens 2024-11-20 23:57:10 +00:00
Jake Poznanski
09319a64ea new version 2024-11-20 22:58:06 +00:00
Jake Poznanski
53a510479b Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-20 14:45:24 -08:00
Jake Poznanski
67d11ec0e6 TODOs and client fix 2024-11-20 14:45:12 -08:00
Jake Poznanski
092480573b Baseline repeat detect 2024-11-20 19:58:20 +00:00
Jake Poznanski
c9e1a4c540 More tests 2024-11-20 19:37:00 +00:00
Jake Poznanski
3153aea260 Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-20 10:42:39 -08:00
Jake Poznanski
9b8d58b59e Better stats and metadata 2024-11-20 10:42:26 -08:00
Jake Poznanski
878a21b48d Update README.md 2024-11-20 08:55:57 -08:00
Jake Poznanski
5704bb89ad Update README.md 2024-11-20 08:54:30 -08:00
Jake Poznanski
273a8b0d0a Logging fallback pages 2024-11-19 15:11:02 -08:00
Jake Poznanski
b0acfa870e Adding support for fallback pages 2024-11-19 14:59:20 -08:00
Jake Poznanski
204a4a8e5b Better stats 2024-11-19 13:41:32 -08:00
Jake Poznanski
3ef4609bdd Fixing args 2024-11-19 11:48:45 -08:00
Jake Poznanski
27d23525b7 Claude recommends httpx instead of aiohttp, seeing if that will help with straggler timeouts 2024-11-19 10:41:58 -08:00
Jake Poznanski
4469f4b2ce Version patch 2024-11-18 19:55:26 -08:00
Jake Poznanski
9e2e09bd06 More fixes 2024-11-18 15:04:50 -08:00
Jake Poznanski
8793fc7d99 Adding more retries, and it was able to process more complicated books 2024-11-18 14:25:32 -08:00
Jake Poznanski
2f55a3ddb7 fix 2024-11-18 13:58:25 -08:00
Jake Poznanski
d4d47369cb more gcs 2024-11-18 13:20:28 -08:00
Jake Poznanski
e48d4bef00 Fix 2024-11-18 13:16:19 -08:00
Jake Poznanski
8c3b5753c9 Gcs support better 2024-11-18 13:07:27 -08:00
Jake Poznanski
9381bf862a docs 2024-11-18 12:44:34 -08:00
Jake Poznanski
f287f2451c Fixing a few stats things 2024-11-18 11:50:22 -08:00
Jake Poznanski
e499413089 Better work queue 2024-11-18 11:04:51 -08:00
Jake Poznanski
04429b2862 Basic work queue from claude 2024-11-18 10:07:03 -08:00
Jake Poznanski
995b1d15fc Fixes, mocking out queue into separate file 2024-11-18 09:55:45 -08:00
Jake Poznanski
fcabb8e55a Handling more error cases 2024-11-18 09:12:04 -08:00
Jake Poznanski
96984fcd77 Fix a reliability issue 2024-11-18 09:03:24 -08:00
Jake Poznanski
0af29f1f44 Adding page rotation 2024-11-18 08:29:32 -08:00
Jake Poznanski
e2303f28af Running on l40s, fixing queue 2024-11-18 08:25:36 -08:00