From 4b4ba454ba78d9777d2d75c3a3dd487eb2321d50 Mon Sep 17 00:00:00 2001
From: Jake Poznanski <jakep@allenai.org>
Date: Thu, 15 May 2025 16:17:29 -0700
Subject: [PATCH] Update README.md

---
 README.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6ee0095..fc79abe 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ A toolkit for training language models to work with PDF documents in the wild.
 
 Try the online demo: [https://olmocr.allenai.org/](https://olmocr.allenai.org/)
 
-What is included:
+What is included here:
  - A prompting strategy to get really good natural text parsing using ChatGPT 4o - [buildsilver.py](https://github.com/allenai/olmocr/blob/main/olmocr/data/buildsilver.py)
  - An side-by-side eval toolkit for comparing different pipeline versions - [runeval.py](https://github.com/allenai/olmocr/blob/main/olmocr/eval/runeval.py)
  - Basic filtering by language and SEO spam removal - [filter.py](https://github.com/allenai/olmocr/blob/main/olmocr/filter/filter.py)
@@ -35,6 +35,11 @@ What is included:
  - Processing millions of PDFs through a finetuned model using Sglang - [pipeline.py](https://github.com/allenai/olmocr/blob/main/olmocr/pipeline.py)
  - Viewing [Dolma docs](https://github.com/allenai/dolma) created from PDFs - [dolmaviewer.py](https://github.com/allenai/olmocr/blob/main/olmocr/viewer/dolmaviewer.py)
 
+See also:
+
+[**olmOCR-Bench**](https://github.com/allenai/olmocr/tree/main/olmocr/bench):
+A comprehensive benchmark suite of over 1,400 documents for measuring the performance of OCR systems
+
 ### Installation
 
 Requirements: