Update README.md

Jake Poznanski 2025-05-09 14:48:49 -07:00 committed by GitHub
parent 1854ae1269
commit 225b705eef


@@ -100,16 +100,17 @@ Several categories of tests have been made so far:
 ## TODO List for release
-- [ ] Check all tests for duplicates
-- [ ] Make absense tests not case sensitive by default
+- [X] Check all tests for duplicates
+- [X] Make absense tests not case sensitive by default
 - [ ] Check that we have URLs for all tests
-- [ ] Write a script to verify that all baseline tests that actually have weird unicodes have exemptions
+- [X] Write a script to verify that all baseline tests that actually have weird unicodes have exemptions
 - [X] Review math equations in old_scans_math.jsonl using chat gpt script
 - [X] Add test category of long_texts which are still ~1 standard printed page, but with dense/small text
-- [ ] Review multicolumn_tests, make sure they are correct, clean, and don't have order tests between regions
-- [ ] Run automated check of multicolumn tests for: #1 sub/super scripts #2 max diffs calibrations #3 mixing across different distinct regions of text
+- [X] Review multicolumn_tests, make sure they are correct, clean, and don't have order tests between regions
+- [X] Run automated check of multicolumn tests for: #1 sub/super scripts #2 max diffs calibrations #3 mixing across different distinct regions of text
 - [X] Remove [] and other special symbols from old_scans
 - [X] Full review of old_scans, somehow, chatgpt or prolific
 - [X] Adjust scoring to weight each test category equally in final score distribution
 - [X] Double check marker inline math outputs
+- [ ] Remove any PII documents
 - [ ] Run against final set of comparison tools, and check list of all-pass and all-fail tests