diff --git a/olmocr/bench/README.md b/olmocr/bench/README.md
index 5b47d44..f0d65b8 100644
--- a/olmocr/bench/README.md
+++ b/olmocr/bench/README.md
@@ -44,6 +44,9 @@ git clone https://github.com/allenai/olmocr.git
 cd olmocr
 
 pip install -e .[bench]
+
+# Now clone the benchmark data
+git clone https://huggingface.co/datasets/allenai/olmOCR-bench
 ```
 
 Convert your documents
@@ -60,6 +63,17 @@ Now run the benchmark
 python -m olmocr.bench.benchmark --dir ./olmOCR-bench/bench_data
 ```
 
+## Previewing the benchmark questions
+
+We have an internal data annotation tool that can be used to review the questions in the benchmark, and make edits.
+
+image
+
+
+```bash
+python -m olmocr.bench.review_app --port 5000 --debug ./olmOCR-bench/bench_data/multi_column.jsonl --force
+```
+
 ## How were the tests made
 
 Several categories of tests have been made so far: