Jake Poznanski
|
d45b34fdd5
|
Trust remote code
|
2024-10-30 21:22:39 +00:00 |
|
Jake Poznanski
|
cda0ad7984
|
Config typo
|
2024-10-30 21:18:48 +00:00 |
|
Jake Poznanski
|
cf3b377bb9
|
train script
|
2024-10-30 14:05:02 -07:00 |
|
Jake Poznanski
|
8f001bf74c
|
Config updates
|
2024-10-30 14:02:57 -07:00 |
|
Jake Poznanski
|
6a4a55f9e0
|
Hopefully working molmo HF trainer config
|
2024-10-30 14:00:27 -07:00 |
|
Jake Poznanski
|
bede854cd5
|
Starting to write molmo formatters
|
2024-10-30 13:24:11 -07:00 |
|
Jake Poznanski
|
e65747e591
|
Some better logging
|
2024-10-30 11:22:52 -07:00 |
|
Jake Poznanski
|
a0e0917102
|
Merge branch 'main' of https://github.com/allenai/pdelfin into main
|
2024-10-30 10:42:56 -07:00 |
|
Jake Poznanski
|
43aa4f2508
|
Proper selection of LoRA weights
|
2024-10-30 10:42:53 -07:00 |
|
Jake Poznanski
|
c652c7e396
|
Merge branch 'main' of https://github.com/allenai/pdelfin
|
2024-10-30 16:26:03 +00:00 |
|
Jake Poznanski
|
85e0e2a61b
|
Fixing issues with pdf parsing
|
2024-10-30 16:26:02 +00:00 |
|
Jake Poznanski
|
bcb47946e5
|
Starting on molmo changes
|
2024-10-30 08:39:48 -07:00 |
|
Jake Poznanski
|
232c445a23
|
Pipeline stability fixes hopefully and logging
|
2024-10-29 20:15:34 +00:00 |
|
Jake Poznanski
|
ce2e4baa87
|
Applying rotation corrections
|
2024-10-28 20:32:23 +00:00 |
|
Jake Poznanski
|
08d51b7183
|
Adding some rotation retry control
|
2024-10-28 20:16:06 +00:00 |
|
Jake Poznanski
|
7678f31aa9
|
Fixing some reliability issues with the pipeline script
|
2024-10-28 16:49:00 +00:00 |
|
Jake Poznanski
|
45269fa6a5
|
Switching to logging vs prints
|
2024-10-28 15:29:46 +00:00 |
|
Jake Poznanski
|
a3e7654190
|
Update all docs at once
|
2024-10-28 15:06:29 +00:00 |
|
Jake Poznanski
|
062abff25c
|
Adding some skip logic
|
2024-10-27 21:17:48 +00:00 |
|
Jake Poznanski
|
8e6d0c65d6
|
Switching to orjson, some better json error handling
|
2024-10-25 22:10:54 +00:00 |
|
Jake Poznanski
|
48a3affec3
|
Reindexing
|
2024-10-25 20:32:51 +00:00 |
|
Jake Poznanski
|
f13d0a5741
|
List configs to list
|
2024-10-24 03:07:32 +00:00 |
|
Jake Poznanski
|
ffe470bf0e
|
Fix
|
2024-10-23 22:55:50 +00:00 |
|
Jake Poznanski
|
180dde03c5
|
dataprep sampling tests
|
2024-10-23 22:53:05 +00:00 |
|
Jake Poznanski
|
64041bd6d7
|
Allow sampling different anchor text lens
|
2024-10-23 15:37:23 -07:00 |
|
Jake Poznanski
|
6a22900b8a
|
Allow for sampling anchor and other params
|
2024-10-23 22:26:12 +00:00 |
|
Jake Poznanski
|
999f64dd46
|
Adding empty anchor support
|
2024-10-23 22:17:20 +00:00 |
|
Jake Poznanski
|
f8c5aac5a0
|
Some cleanup
|
2024-10-23 21:51:54 +00:00 |
|
Jake Poznanski
|
a1a4798ce7
|
Some crazy idea I had to simplify futures and memory limits
|
2024-10-23 21:51:37 +00:00 |
|
Jake Poznanski
|
f6ac591fe9
|
vllm benchmarker
|
2024-10-23 18:14:50 +00:00 |
|
Jake Poznanski
|
4047258277
|
Fixing one old bug to make update_static atomic
|
2024-10-23 17:51:22 +00:00 |
|
Jake Poznanski
|
38dc5a2a0f
|
Refactored to have a more efficient batchwriter, and also not allow too many running futures
|
2024-10-23 16:28:46 +00:00 |
|
Jake Poznanski
|
d99096e9a2
|
Adding vllm profile script for reference
|
2024-10-22 20:00:34 +00:00 |
|
Jake Poznanski
|
0a5c5068b4
|
index
|
2024-10-22 16:03:06 +00:00 |
|
Jake Poznanski
|
7c7867626f
|
Fix pipeline bug with indexing
|
2024-10-22 15:47:11 +00:00 |
|
Jake Poznanski
|
31becaf7e4
|
S2orc dataset extractor
|
2024-10-21 21:28:44 +00:00 |
|
Jake Poznanski
|
302eee3da5
|
Yay matches between birr and hf
|
2024-10-21 16:58:30 +00:00 |
|
Jake Poznanski
|
f44dbd15ef
|
Small fixes
|
2024-10-21 16:45:06 +00:00 |
|
Jake Poznanski
|
a4822718ea
|
train more steps
|
2024-10-19 14:12:44 +00:00 |
|
Jake Poznanski
|
c9ac48bd9d
|
Try to save at the last second only
|
2024-10-19 02:07:57 +00:00 |
|
Jake Poznanski
|
9d35d3ca8f
|
Birr tokenization test
|
2024-10-18 23:02:37 +00:00 |
|
Jake Poznanski
|
77f0b9fa84
|
help text
|
2024-10-18 22:39:25 +00:00 |
|
Jake Poznanski
|
7dbcbc154b
|
Birr tests that don't do anything but help me understand the universe
|
2024-10-18 22:39:17 +00:00 |
|
Jake Poznanski
|
492a3f6bef
|
Adding parameters for target image and anchor text sizes
|
2024-10-18 21:47:30 +00:00 |
|
Jake Poznanski
|
1c8602c0ff
|
Removing rotation invalid ones to see what happens
|
2024-10-17 22:41:44 +00:00 |
|
Jake Poznanski
|
dd4f9670b5
|
Filter refactor
|
2024-10-17 22:36:38 +00:00 |
|
Jake Poznanski
|
3ecbeae6dc
|
Trying save to s3 but with threaded saver
|
2024-10-17 21:39:01 +00:00 |
|
Jake Poznanski
|
5ba78edc39
|
Fix
|
2024-10-17 20:57:12 +00:00 |
|
Jake Poznanski
|
89fcff233a
|
Fixing saving bug again
|
2024-10-17 20:37:28 +00:00 |
|
Jake Poznanski
|
7d4cff53b5
|
Nice test for picking proper page in birrpipeline
|
2024-10-17 20:26:02 +00:00 |
|