Mirror of https://github.com/allenai/olmocr.git (synced 2025-10-13 09:12:18 +00:00)
Update README.md
parent b3c3a13e03
commit f7529f4e60
@@ -44,6 +44,9 @@ git clone https://github.com/allenai/olmocr.git
cd olmocr

pip install -e .[bench]

# Now clone the benchmark data
git clone https://huggingface.co/datasets/allenai/olmOCR-bench
```

Convert your documents
@@ -60,6 +63,17 @@ Now run the benchmark
python -m olmocr.bench.benchmark --dir ./olmOCR-bench/bench_data
```

## Previewing the benchmark questions

We have an internal data annotation tool that can be used to review the questions in the benchmark, and make edits.

<img width="700" alt="image" src="https://github.com/user-attachments/assets/dd24fd88-a642-4379-b5a1-9911717bf5b1" />


```bash
python -m olmocr.bench.review_app --port 5000 --debug ./olmOCR-bench/bench_data/multi_column.jsonl --force
```

## How were the tests made

Several categories of tests have been made so far: