diff --git a/olmocr/bench/README.md b/olmocr/bench/README.md
index c9fb454..a7b6961 100644
--- a/olmocr/bench/README.md
+++ b/olmocr/bench/README.md
@@ -100,16 +100,17 @@ Several categories of tests have been made so far:
 
 
 ## TODO List for release
- - [ ] Check all tests for duplicates
- - [ ] Make absense tests not case sensitive by default
+ - [X] Check all tests for duplicates
+ - [X] Make absence tests case-insensitive by default
  - [ ] Check that we have URLs for all tests
- - [ ] Write a script to verify that all baseline tests that actually have weird unicodes have exemptions
+ - [X] Write a script to verify that all baseline tests that actually contain unusual Unicode characters have exemptions
  - [X] Review math equations in old_scans_math.jsonl using chat gpt script
  - [X] Add test category of long_texts which are still ~1 standard printed page, but with dense/small text
- - [ ] Review multicolumn_tests, make sure they are correct, clean, and don't have order tests between regions
- - [ ] Run automated check of multicolumn tests for: #1 sub/super scripts #2 max diffs calibrations #3 mixing across different distinct regions of text 
+ - [X] Review multicolumn_tests, make sure they are correct, clean, and don't have order tests between regions
+ - [X] Run automated check of multicolumn tests for: #1 sub/super scripts #2 max diffs calibrations #3 mixing across different distinct regions of text 
  - [X] Remove [] and other special symbols from old_scans
  - [X] Full review of old_scans, somehow, chatgpt or prolific
  - [X] Adjust scoring to weight each test category equally in final score distribution
  - [X] Double check marker inline math outputs
+ - [ ] Remove any PII documents
  - [ ] Run against final set of comparison tools, and check list of all-pass and all-fail tests