yujunjun/olmocr
mirror of https://github.com/allenai/olmocr.git synced 2025-07-23 09:02:16 +00:00
157 Commits 48 Branches 16 Tags

11 Commits

Author | SHA1 | Message | Date
Jake Poznanski | dc26541da2 | Starting code to build parquets... | 2024-10-07 20:59:43 +00:00
Jake Poznanski | d8e459c9f3 | Weird issue with surrogate pairs in json | 2024-10-07 09:04:13 -07:00
Jake Poznanski | 98020cabbb | Allow loading files locally | 2024-10-07 07:49:16 -07:00
Jake Poznanski | decfd7fbc1 | Fixing the refiner input prompt to something simpler that doesn't depend on the training data. Fixing beaker job workspace and bumping priority to high. | 2024-09-27 22:54:07 +00:00
Jake Poznanski | 22b765e6be | Going back to non iterable dataset, so shuffling works better, applying a light filter | 2024-09-27 15:48:56 +00:00
Jake Poznanski | 86813fe210 | Filtering off the weird tail ends of the distribution to make training smoother | 2024-09-25 09:49:03 -07:00
Jake Poznanski | 5916239cd8 | typos | 2024-09-23 09:43:36 -07:00
Jake Poznanski | ea3af0143c | Loading dataset from config now | 2024-09-23 09:40:24 -07:00
Jake Poznanski | bab32aa9b3 | Formatting | 2024-09-18 22:52:42 +00:00
Jake Poznanski | f4d18cb287 | Dataloader capable of loading 38k rows reasonably fast | 2024-09-18 22:48:38 +00:00
Jake Poznanski | d22b311340 | Starting to write dataloader for visual lm data | 2024-09-18 21:42:09 +00:00