yujunjun/olmocr
mirror of https://github.com/allenai/olmocr.git synced 2025-10-31 18:15:44 +00:00
olmocr/olmocr
Jake Poznanski a651cf0ca6 Adding guided regex decoder
2025-07-01 17:44:02 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| bench | addressed Jake's comment for pagenumbers with \d+ | 2025-06-23 23:29:10 +00:00 |
| data | Rendering the pdfs in the dataloader | 2025-06-11 18:11:42 +00:00 |
| filter | fixed lint check | 2025-02-07 16:29:27 -08:00 |
| prompts | Checking that anchor text works for each pdf page when initializing dataloader | 2025-06-30 16:29:33 +00:00 |
| train | Better prepare checkpoint script | 2025-07-01 16:44:19 +00:00 |
| viewer | Black formatting | 2025-01-29 15:30:39 -08:00 |
| __init__.py | Massive refactor from pdelfin to olmocr | 2025-01-27 18:30:41 +00:00 |
| check.py | Probably need at least 20GB GPU ram to have a good time with olmocr | 2025-03-03 15:54:47 -08:00 |
| datatypes.py | Massive refactor from pdelfin to olmocr | 2025-01-27 18:30:41 +00:00 |
| image_utils.py | Unused import | 2025-03-31 13:30:20 -07:00 |
| metrics.py | Lints | 2025-06-17 15:58:16 +00:00 |
| pipeline.py | Adding guided regex decoder | 2025-07-01 17:44:02 +00:00 |
| py.typed | Massive refactor from pdelfin to olmocr | 2025-01-27 18:30:41 +00:00 |
| repeatdetect.py | Lints | 2025-03-13 22:26:53 +00:00 |
| s3_utils.py | resolved all the mypy, black and isort issues and updated readme | 2025-02-07 16:05:00 -08:00 |
| version.py | Version bump | 2025-06-23 21:54:06 +00:00 |
| work_queue.py | Upping version to fix issue with work queue and delimited paths | 2025-04-15 18:50:13 +00:00 |