Mirror of https://github.com/allenai/olmocr.git, synced 2025-10-13 09:12:18 +00:00
Update README.md
parent b3c3a13e03
commit f7529f4e60
@@ -44,6 +44,9 @@ git clone https://github.com/allenai/olmocr.git
 cd olmocr
 
 pip install -e .[bench]
+
+# Now clone the benchmark data
+git clone https://huggingface.co/datasets/allenai/olmOCR-bench
 ```
 
 Convert your documents
@@ -60,6 +63,17 @@ Now run the benchmark
 python -m olmocr.bench.benchmark --dir ./olmOCR-bench/bench_data
 ```
 
+## Previewing the benchmark questions
+
+We have an internal data annotation tool that can be used to review the questions in the benchmark, and make edits.
+
+<img width="700" alt="image" src="https://github.com/user-attachments/assets/dd24fd88-a642-4379-b5a1-9911717bf5b1" />
+
+
+```bash
+python -m olmocr.bench.review_app --port 5000 --debug ./olmOCR-bench/bench_data/multi_column.jsonl --force
+```
+
 ## How were the tests made
 
 Several categories of tests have been made so far:
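Review note: the `review_app` command added in this commit takes a `.jsonl` file of benchmark questions. For a quick look at such a file without starting the web app, a minimal sketch along these lines may help. It assumes only that the file is standard JSON Lines (one JSON object per line); the record schema is not shown in this commit, so the sketch reports whatever top-level keys it finds. `summarize_jsonl` is a hypothetical helper, not part of olmocr.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file.

    Hypothetical helper for illustration; it assumes one JSON value per
    non-blank line and makes no assumption about the record schema.
    """
    key_counts = Counter()
    record_count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            record = json.loads(line)
            record_count += 1
            if isinstance(record, dict):
                # Count how many records carry each top-level key.
                key_counts.update(record.keys())
    return record_count, key_counts
```

For example, `summarize_jsonl("./olmOCR-bench/bench_data/multi_column.jsonl")` would return the number of records in that file and how often each top-level key appears, assuming the benchmark data has been cloned to that path.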