Jake Poznanski | b2894d0280 | Massive refactor from pdelfin to olmocr | 2025-01-27 18:30:41 +00:00
Jake Poznanski | 3c1b7de293 | Refactoring of train dataloaders | 2024-10-16 18:26:25 +00:00
Jake Poznanski | 23d129fd2c | Organizing around a new style of dataloader | 2024-10-16 18:06:27 +00:00
Jake Poznanski | 96682b2ecb | Refactoring | 2024-10-16 16:18:27 +00:00
Jake Poznanski | 4bf6e7a430 | Refactoring | 2024-10-09 18:11:18 +00:00
Jake Poznanski | 230c8a9f9a | Trying new run that will rewrite the prompts as it goes | 2024-10-08 22:10:18 +00:00
Jake Poznanski | ebd40f9084 | Hopefully fixing dataloader for now | 2024-10-07 12:59:27 -07:00
Jake Poznanski | d8e459c9f3 | Weird issue with surrogate pairs in json | 2024-10-07 09:04:13 -07:00
Jake Poznanski | 98020cabbb | Allow loading files locally | 2024-10-07 07:49:16 -07:00
Jake Poznanski | 1686790ac8 | Checking filtering logic | 2024-10-02 22:45:40 +00:00
Jake Poznanski | decfd7fbc1 | Fixing the refiner input prompt to something simpler that doesn't depend on the training data. Fixing beaker job workspace and bumping priority to high. | 2024-09-27 22:54:07 +00:00
Jake Poznanski | 22b765e6be | Going back to non iterable dataset, so shuffling works better, applying a light filter | 2024-09-27 15:48:56 +00:00
Jake Poznanski | c00e40d1c4 | More fixes | 2024-09-26 23:10:07 +00:00
Jake Poznanski | d098a87ed2 | Column name fix | 2024-09-26 22:29:19 +00:00
Jake Poznanski | 61dd7bb61f | Fix for map in iterable mode | 2024-09-26 20:44:47 +00:00
Jake Poznanski | cf1aa0176e | Proper use of iterable_dataset | 2024-09-26 19:55:54 +00:00
Jake Poznanski | 9cbc128553 | Sampling some sequence lengths | 2024-09-25 09:05:11 -07:00
Jake Poznanski | bab32aa9b3 | Formatting | 2024-09-18 22:52:42 +00:00
Jake Poznanski | f4d18cb287 | Dataloader capable of loading 38k rows reasonably fast | 2024-09-18 22:48:38 +00:00
Jake Poznanski | d22b311340 | Starting to write dataloader for visual lm data | 2024-09-18 21:42:09 +00:00