yujunjun/olmocr
Mirror of https://github.com/allenai/olmocr.git, synced 2025-08-22 15:52:43 +00:00
307 commits, 58 branches, 24 tags
Commit Graph

60 Commits

Author | SHA1 | Message | Date
Jake Poznanski | a47afe5c8d | Adding test to make sure the training and inference time tokenization stays identical, currently failing | 2024-09-20 12:01:05 -07:00
Jake Poznanski | bab32aa9b3 | Formatting | 2024-09-18 22:52:42 +00:00
Jake Poznanski | f4d18cb287 | Dataloader capable of loading 38k rows reasonably fast | 2024-09-18 22:48:38 +00:00
Jake Poznanski | d22b311340 | Starting to write dataloader for visual lm data | 2024-09-18 21:42:09 +00:00
Jake Poznanski | af2126df99 | 450tok/sec/core with smollm that appears to work well | 2024-09-17 19:59:02 +00:00
Jake Poznanski | 2f71cb9232 | Using SmolLM, seems a lot better and is able to pass some tests | 2024-09-17 18:47:27 +00:00
Jake Poznanski | 57e80aacd2 | Testing coherence with distilgpt2, but it doesn't work great | 2024-09-17 16:58:45 +00:00
Jake Poznanski | 01bc0b2f10 | Moving a whole bunch of code over, still broken | 2024-09-17 16:26:55 +00:00
Jake Poznanski | a534a0180d | Moving pdf filter code over with tests | 2024-09-17 15:16:58 +00:00
Jake Poznanski | 68b2c0e8d6 | Initial commit | 2024-09-17 07:53:43 -07:00