1501 Commits

Author SHA1 Message Date
Jake Poznanski
3c22cf3430 Lints 2025-03-18 19:01:03 +00:00
Jake Poznanski
da05b4ca4f Merge branch 'main' of https://github.com/allenai/olmocr 2025-03-18 18:57:51 +00:00
Jake Poznanski
d620722a0e Review app is much nicer now 2025-03-18 18:57:50 +00:00
Jake Poznanski
5ec96476c9 Keyboard shorcuts 2025-03-18 18:46:41 +00:00
Jake Poznanski
9df5102d34 review document 2025-03-18 18:36:18 +00:00
Jake Poznanski
7f921f436a review app 2025-03-18 18:17:59 +00:00
Jake Poznanski
89b628d0bb Slighty better 2025-03-18 17:57:45 +00:00
Jake Poznanski
9344107994 pdf viewr 2025-03-18 17:43:07 +00:00
Jake Poznanski
4939e41154 Flask based review app first attempt 2025-03-18 16:53:36 +00:00
Jake Poznanski
f514f39819 Adding raw transformers implementation 2025-03-18 09:07:48 -07:00
Jake Poznanski
93450c326d Table miner 2025-03-18 15:33:01 +00:00
aman-17
d34a3576a2 removed mine_diffs_candidates.jsonl 2025-03-17 16:36:05 -07:00
aman-17
e1a2074703 removed pp_doc_layout script 2025-03-17 14:24:55 -07:00
aman-17
1297c82447 added few examples for headers and footers 2025-03-17 14:22:34 -07:00
aman-17
f6ea131596 Merge remote-tracking branch 'origin/main' into amanr/pp-doc-layout (Merged amanr/pp-doc with main) 2025-03-17 13:55:55 -07:00
Jake Poznanski
b472845f33 Table miners 2025-03-17 20:50:21 +00:00
aman-17
b5bd179128 Merge remote-tracking branch 'origin/main' into amanr/pp-doc-layout (merge from main) 2025-03-17 12:35:46 -07:00
aman-17
8f356a18d4 added pp_doc 2025-03-17 12:34:45 -07:00
Jake Poznanski
aee030c42b Fixing sample dataset, outputting some reports for debugging. Math is good enough for now 2025-03-17 10:59:02 -07:00
Jake Poznanski
dd725636a3 Bump version to v0.1.60 for release v0.1.60 2025-03-17 08:59:18 -07:00
Jake Poznanski
baa00825b0 Don't go down too low in temp 2025-03-17 08:48:19 -07:00
Jake Poznanski
f2951f3f78 Lints 2025-03-17 08:47:57 -07:00
Jake Poznanski
1e42e5ea9a Faster and nicer equation cache 2025-03-17 08:47:06 -07:00
Jake Poznanski
1f8cc59b22 Pipeline scales temperature automatically, increases performance ~2% 2025-03-14 22:27:51 -07:00
Jake Poznanski
4768ac4be5 Merge branch 'main' of https://github.com/allenai/olmocr 2025-03-14 22:32:39 +00:00
Jake Poznanski
0968bd17ce Mine headers footers 2025-03-14 22:32:38 +00:00
Jake Poznanski
7b4026233c Benchmark script supports rel paths 2025-03-14 13:22:12 -07:00
Jake Poznanski
1270ca336a lints 2025-03-14 17:53:43 +00:00
Jake Poznanski
d7361c436e Basic convert script 2025-03-14 10:35:46 -07:00
Jake Poznanski
142a9cbd20 Convert script to support broader folder structures 2025-03-14 10:12:21 -07:00
Jake Poznanski
98c4283eef Cap max workers to hopefully improve stability 2025-03-14 10:08:30 -07:00
Jake Poznanski
5f3ef510ab Faster equation cache and checking, cleanup data script 2025-03-14 16:40:16 +00:00
Jake Poznanski
79e2677319 Hmm, these should be passing! 2025-03-14 02:52:13 +00:00
Jake Poznanski
f5d92bdb14 Trying to get new CI to work 2025-03-14 02:43:55 +00:00
Jake Poznanski
1db1b3406b Merge pull request #122 from allenai/gpu-ci (Gpu ci plumbing) 2025-03-13 16:05:38 -07:00
Chris Wilhelm
c585415797 for now, only process one pdf in the ci script 2025-03-13 15:48:47 -07:00
Chris Wilhelm
f787625b45 make sure gpu checks only run if normal checks succeed 2025-03-13 15:32:18 -07:00
Chris Wilhelm
9b958e65f1 moves what happens where around a bit and updates readme 2025-03-13 15:31:55 -07:00
Chris Wilhelm
098b01c006 wire it up into a gh action 2025-03-13 15:31:55 -07:00
Chris Wilhelm
7e8492059c wip 2025-03-13 15:31:55 -07:00
Chris Wilhelm
671e4a7dae move the gpu reqs to setup.py altogether 2025-03-13 15:31:55 -07:00
Chris Wilhelm
6ac29c781e actually try setup.py 2025-03-13 15:31:55 -07:00
Chris Wilhelm
927d7d9117 setup.cfg instead of in pyproject for dep links 2025-03-13 15:31:55 -07:00
Chris Wilhelm
f1524957b1 specify gpu deps in pyproject 2025-03-13 15:31:55 -07:00
Chris Wilhelm
29b9054749 basic docker image and test 2025-03-13 15:31:55 -07:00
Jake Poznanski
9f38a8a602 Lints 2025-03-13 22:29:27 +00:00
Jake Poznanski
5009bb31f1 Lints 2025-03-13 22:26:53 +00:00
Jake Poznanski
acb0df32a8 Fixes 2025-03-13 13:15:45 -07:00
Jake Poznanski
3eec2a855b Mining math 2025-03-13 13:11:01 -07:00
Jake Poznanski
95f03e1e42 More small tests 2025-03-13 12:50:52 -07:00