307 Commits

Author SHA1 Message Date
Jake Poznanski
d45b34fdd5 Trust remote code 2024-10-30 21:22:39 +00:00
Jake Poznanski
cda0ad7984 Config typo 2024-10-30 21:18:48 +00:00
Jake Poznanski
cf3b377bb9 train script 2024-10-30 14:05:02 -07:00
Jake Poznanski
8f001bf74c Config updates 2024-10-30 14:02:57 -07:00
Jake Poznanski
6a4a55f9e0 Hopefully working molmo HF trainer config 2024-10-30 14:00:27 -07:00
Jake Poznanski
bede854cd5 Starting to write molmo formatters 2024-10-30 13:24:11 -07:00
Jake Poznanski
e65747e591 Some better logging 2024-10-30 11:22:52 -07:00
Jake Poznanski
a0e0917102 Merge branch 'main' of https://github.com/allenai/pdelfin into main 2024-10-30 10:42:56 -07:00
Jake Poznanski
43aa4f2508 Proper selection of LORA weights 2024-10-30 10:42:53 -07:00
Jake Poznanski
c652c7e396 Merge branch 'main' of https://github.com/allenai/pdelfin 2024-10-30 16:26:03 +00:00
Jake Poznanski
85e0e2a61b Fixing issues with pdf parsing 2024-10-30 16:26:02 +00:00
Jake Poznanski
bcb47946e5 Starting on molmo changes 2024-10-30 08:39:48 -07:00
Jake Poznanski
232c445a23 Pipeline stability fixes hopefully and logging 2024-10-29 20:15:34 +00:00
Jake Poznanski
ce2e4baa87 Applying rotation corrections 2024-10-28 20:32:23 +00:00
Jake Poznanski
08d51b7183 Adding some rotation retry control 2024-10-28 20:16:06 +00:00
Jake Poznanski
7678f31aa9 Fixing some reliability issues with the pipeline script 2024-10-28 16:49:00 +00:00
Jake Poznanski
45269fa6a5 Switching to logging vs prints 2024-10-28 15:29:46 +00:00
Jake Poznanski
a3e7654190 Update all docs at once 2024-10-28 15:06:29 +00:00
Jake Poznanski
062abff25c Adding some skip logic 2024-10-27 21:17:48 +00:00
Jake Poznanski
8e6d0c65d6 Switching to orjson, some better json error handling 2024-10-25 22:10:54 +00:00
Jake Poznanski
48a3affec3 Reindexing 2024-10-25 20:32:51 +00:00
Jake Poznanski
f13d0a5741 List configs to list 2024-10-24 03:07:32 +00:00
Jake Poznanski
ffe470bf0e Fix 2024-10-23 22:55:50 +00:00
Jake Poznanski
180dde03c5 dataprep sampling tests 2024-10-23 22:53:05 +00:00
Jake Poznanski
64041bd6d7 Allow sampling different anchor text lens 2024-10-23 15:37:23 -07:00
Jake Poznanski
6a22900b8a Allow for sampling anchor and other params 2024-10-23 22:26:12 +00:00
Jake Poznanski
999f64dd46 Adding empty anchor support 2024-10-23 22:17:20 +00:00
Jake Poznanski
f8c5aac5a0 Some cleanup 2024-10-23 21:51:54 +00:00
Jake Poznanski
a1a4798ce7 Some crazy idea I had to simplify futures and memory limits 2024-10-23 21:51:37 +00:00
Jake Poznanski
f6ac591fe9 vllm benchmarker 2024-10-23 18:14:50 +00:00
Jake Poznanski
4047258277 Fixing one old bug to make update_static atomic 2024-10-23 17:51:22 +00:00
Jake Poznanski
38dc5a2a0f Refactored to have a more efficient batchwriter, and also not allow too many running futures 2024-10-23 16:28:46 +00:00
Jake Poznanski
d99096e9a2 Adding vllm profile script for reference 2024-10-22 20:00:34 +00:00
Jake Poznanski
0a5c5068b4 index 2024-10-22 16:03:06 +00:00
Jake Poznanski
7c7867626f Fix pipeline bug with indexing 2024-10-22 15:47:11 +00:00
Jake Poznanski
31becaf7e4 S2orc dataset extractor 2024-10-21 21:28:44 +00:00
Jake Poznanski
302eee3da5 Yay matches between birr and hf 2024-10-21 16:58:30 +00:00
Jake Poznanski
f44dbd15ef Small fixes 2024-10-21 16:45:06 +00:00
Jake Poznanski
a4822718ea train more steps 2024-10-19 14:12:44 +00:00
Jake Poznanski
c9ac48bd9d Try to save at the last second only 2024-10-19 02:07:57 +00:00
Jake Poznanski
9d35d3ca8f Birr tokenization test 2024-10-18 23:02:37 +00:00
Jake Poznanski
77f0b9fa84 help text 2024-10-18 22:39:25 +00:00
Jake Poznanski
7dbcbc154b Birr tests that don't do anything but help me understand the universe 2024-10-18 22:39:17 +00:00
Jake Poznanski
492a3f6bef Adding parameters for target image and anchor text sizes 2024-10-18 21:47:30 +00:00
Jake Poznanski
1c8602c0ff Removing rotation invalid ones to see what happens 2024-10-17 22:41:44 +00:00
Jake Poznanski
dd4f9670b5 Filter refactor 2024-10-17 22:36:38 +00:00
Jake Poznanski
3ecbeae6dc Trying save to s3 but with threaded saver 2024-10-17 21:39:01 +00:00
Jake Poznanski
5ba78edc39 Fix 2024-10-17 20:57:12 +00:00
Jake Poznanski
89fcff233a Fixing saving bug again 2024-10-17 20:37:28 +00:00
Jake Poznanski
7d4cff53b5 Nice test for picking proper page in birrpipeline 2024-10-17 20:26:02 +00:00