Update README.md

Jake Poznanski 2025-04-11 11:23:17 -07:00 committed by GitHub
parent b3c3a13e03
commit f7529f4e60


@ -44,6 +44,9 @@ git clone https://github.com/allenai/olmocr.git
cd olmocr
pip install -e .[bench]
# Now clone the benchmark data
git clone https://huggingface.co/datasets/allenai/olmOCR-bench
```
Convert your documents
@ -60,6 +63,17 @@ Now run the benchmark
python -m olmocr.bench.benchmark --dir ./olmOCR-bench/bench_data
```
## Previewing the benchmark questions
We have an internal data annotation tool that can be used to review and edit the questions in the benchmark.
<img width="700" alt="image" src="https://github.com/user-attachments/assets/dd24fd88-a642-4379-b5a1-9911717bf5b1" />
```bash
python -m olmocr.bench.review_app --port 5000 --debug ./olmOCR-bench/bench_data/multi_column.jsonl --force
```
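If you only want a quick look at a question file without starting the review app, a minimal sketch like the one below may help. It assumes only that each non-empty line of a `.jsonl` file is one JSON record, and it makes no assumption about the field names, which are not described here. `summarize_jsonl` is a hypothetical helper written for this illustration, not part of `olmocr`.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Count records and tally top-level keys in a JSON Lines file.

    Hypothetical helper for illustration; not part of olmocr.
    Assumes one JSON value per non-empty line.
    """
    key_counts = Counter()
    n_records = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            n_records += 1
            # Only dict records have keys to tally
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, key_counts
```

Calling `summarize_jsonl("./olmOCR-bench/bench_data/multi_column.jsonl")` would return the number of records and a `Counter` of the top-level keys found, which is a reasonable first step before opening the file in the review app.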
## How were the tests made
Several categories of tests have been made so far: