582 Commits

Author SHA1 Message Date
Jake Poznanski
84c53c27d2 Merge branch 'main' of https://github.com/allenai/pdelfin 2024-12-04 17:56:47 +00:00
Jake Poznanski
e9c3c21731 Skipping files which are not found 2024-12-04 17:56:45 +00:00
Jake Poznanski
7bec33a67a Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-12-03 15:32:55 -08:00
Jake Poznanski
917cdeccba Some more tests 2024-12-03 15:32:53 -08:00
Jake Poznanski
3e33ce1cde Ignores 2024-12-03 18:50:31 +00:00
Jake Poznanski
37cdb9e146 Merge branch 'main' of https://github.com/allenai/pdelfin 2024-12-03 18:49:41 +00:00
Jake Poznanski
1eda300e01 Dolma viewer niceties 2024-12-03 18:49:16 +00:00
Jake Poznanski
535181d8e0 Moving to manual HTTP Post, have had success with 10k page files now 2024-12-03 10:48:52 -08:00
Jake Poznanski
fe04db828e Better error handling 2024-12-02 23:56:45 +00:00
Jake Poznanski
35502bc6f2 Limit the number of retries on the server process 2024-12-02 23:46:46 +00:00
Jake Poznanski
b3ca86a182 More robust to errors when reading logs which had caused freezes 2024-12-02 13:59:42 -08:00
Jake Poznanski
d4f3cfff4d More reliable weka 2024-11-27 19:11:20 +00:00
Jake Poznanski
687210572d Merge branch 'main' of https://github.com/allenai/pdelfin 2024-11-26 22:22:38 +00:00
Jake Poznanski
c93fc36f72 Missing import 2024-11-26 22:22:36 +00:00
Jake Poznanski
9b9d04c8e9 aaa 2024-11-26 08:38:25 -08:00
Jake Poznanski
386374bd72 More prints 2024-11-25 16:08:24 -08:00
Jake Poznanski
04d6123037 Doing some experiments 2024-11-25 15:36:04 -08:00
Jake Poznanski
51614efc83 More log probs investigation 2024-11-25 11:24:21 -08:00
Jake Poznanski
28d52602e9 More test code 2024-11-25 11:00:03 -08:00
Jake Poznanski
606e81bfea Not happy here with this test 2024-11-25 10:32:18 -08:00
Jake Poznanski
d7838372e8 Full test 2024-11-25 10:25:55 -08:00
Jake Poznanski
2e4f7d7827 Working on HF test for comparison 2024-11-25 10:12:29 -08:00
Jake Poznanski
5e3080db28 Sglang based unit test 2024-11-25 09:48:05 -08:00
Jake Poznanski
60f24ad2d6 tests 2024-11-25 09:39:55 -08:00
Jake Poznanski
5289092076 Starting on sglang test 2024-11-25 09:34:59 -08:00
Jake Poznanski
ba8eba245b Unit tests fixes 2024-11-25 09:13:13 -08:00
Jake Poznanski
dd17185cfd More things to try 2024-11-23 21:49:33 +00:00
Jake Poznanski
46fe4acc0b Trying fixes for live lock 2024-11-23 21:41:49 +00:00
Jake Poznanski
41accfe867 Error out if you see a broken process pool, might need a better check for this 2024-11-22 22:07:43 +00:00
Jake Poznanski
a95487e44c Adding check for possible sglang livelock 2024-11-22 21:50:45 +00:00
Jake Poznanski
cff97990bf Moving to official sglang release 2024-11-22 19:37:31 +00:00
Jake Poznanski
f8dcdf625a Better catching of httpx errors and retrying them 2024-11-21 23:35:42 +00:00
Jake Poznanski
d6a00135a7 Faster init by caching pdf filter 2024-11-21 23:23:11 +00:00
Jake Poznanski
a91befc4ad Fix for fallback stuff 2024-11-21 11:08:42 -08:00
Jake Poznanski
8c858a9d15 New version 2024-11-21 10:49:31 -08:00
Jake Poznanski
66fff4f44b Merge branch 'main' of https://github.com/allenai/pdelfin 2024-11-21 18:39:22 +00:00
Jake Poznanski
212d391933 More conservative filtering 2024-11-21 18:39:21 +00:00
Jake Poznanski
b8b786e003 Applying pdf filter 2024-11-21 10:20:58 -08:00
Jake Poznanski
cb800d6e2c Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-21 08:58:30 -08:00
Jake Poznanski
7dd20460a3 New version 2024-11-21 08:58:28 -08:00
Jake Poznanski
219cc7eca8 Merge branch 'main' of https://github.com/allenai/pdelfin 2024-11-21 16:56:20 +00:00
Jake Poznanski
98e40143dd Adding mass filtering script 2024-11-21 16:56:19 +00:00
Jake Poznanski
af8ce518ac Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-21 08:45:19 -08:00
Jake Poznanski
9112d81bd1 No keep alive connection to try to resolve sglang livelock 2024-11-21 08:45:17 -08:00
Jake Poznanski
2443c22fde Projected output tokens 2024-11-20 23:57:10 +00:00
Jake Poznanski
09319a64ea new version 2024-11-20 22:58:06 +00:00
Jake Poznanski
53a510479b Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-11-20 14:45:24 -08:00
Jake Poznanski
67d11ec0e6 TODOs and client fix 2024-11-20 14:45:12 -08:00
Jake Poznanski
092480573b Baseline repeat detect 2024-11-20 19:58:20 +00:00
Jake Poznanski
c9e1a4c540 More tests 2024-11-20 19:37:00 +00:00