olmocr running

This commit is contained in:
Jake Poznanski 2025-02-19 16:10:46 -08:00
parent 422d08f4b8
commit 16a32445a2
6 changed files with 641 additions and 7 deletions

View File

@ -72,7 +72,7 @@ def run(pdf_folder):
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Convert all PDF files in a folder to markdown using GOT-OCR and save them to a sibling 'marker' folder."
description="Convert all PDF files in a folder to markdown using GOT-OCR and save them to a sibling 'got_ocr' folder."
)
parser.add_argument(
"pdf_folder",

View File

@ -0,0 +1,15 @@
import sys
import glob
import asyncio
import olmocr.pipeline
# Set sys.argv as if you were running the script from the command line.
sys.argv = [
"pipeline.py", # The script name (can be arbitrary)
"olmocr/bench/sample_data/olmocr/workspace", # Positional argument: workspace
"--pdfs", *list(glob.glob("olmocr/bench/sample_data/pdfs/*.pdf")), # PDF paths
]
# Call the async main() function.
asyncio.run(olmocr.pipeline.main())

View File

@ -0,0 +1,543 @@
Table 4: Baseline model performance on each of the three scoring metrics (task completion, task process,
explanatory knowledge discovery) across all 24 DISCOVERYWORLD tasks. Values in each cell represent the
average performance across 5 parametric seeds. Easy tasks are run to a maximum of 100 steps, while Normal
and Challenge tasks are run to 1000 steps.
ReACT
Plan+Execute
Hypothesizer
#
Topic
Task
#
Topic
Task
#
Topic
Task
#
Topic
Task
#
Protometrics
Clustering
1
Easy
Simplified Clustering
0.37
0.20
0.20
0.20
0.20
0.00
0.00
0.00
0.40
0.00
2
Easy
Simplified Clustering
0.40
0.40
0.40
0.40
0.64
0.20
0.00
0.93
0.40
0.40
3
Challenge
Clustering (3D)
0.28
0.40
0.60
0.58
0.20
0.00
0.93
0.44
0.60
0.60
4
Semi-
Chemistry
Exploring Combinations and Hill Climbing
0.00
0.00
0.00
0.00
0.01
0.00
0.00
0.00
0.05
0.00
5
Normal
Mix of 3 substances
0.82
0.00
0.00
0.87
0.40
0.00
0.93
0.40
0.00
0.6
6
Challenge
Mix of 4 substances
0.90
0.40
0.00
0.90
0.40
0.00
0.87
0.00
0.00
0.00
7
Semi-
Easy
Simple instrument
0.27
0.60
0.00
0.33
0.20
0.00
0.60
0.20
0.50
0.00
8
Challenge
Instrument Use
0.40
0.40
0.40
0.48
0.00
0.00
0.55
0.40
0.40
0.40
9
Challenge
Simplified Clustering
0.40
0.40
1.00
0.40
0.00
0.00
0.55
0.41
0.00
0.00
9
Reactor Lab
Regression
0.42
0.00
0.00
0.40
0.01
0.10
0.38
0.00
0.20
11
Normal
Linear regression
0.44
0.00
0.20
0.49
0.00
0.00
0.51
0.00
0.00
12
Challenge
Quadratic regression
0.43
0.00
0.20
0.39
0.00
0.00
0.39
0.00
0.00
0.00
13
Semi-
Simplified rules
0.82
0.00
0.20
0.70
0.20
0.20
0.60
0.00
0.00
14
Normal
Presence rules
0.91
0.60
0.00
0.84
0.40
0.00
0.56
0.00
0.00
15
Semi-
Simplified Data
0.89
0.40
0.00
0.73
0.40
0.00
0.62
0.00
0.00
Space Sick
Open-ended discovery
0.6
0.00
0.00
0.68
0.40
0.10
0.16
0.00
0.00
16
Easy
Single instrument
0.78
0.00
0.00
0.68
0.44
0.10
0.16
0.00
0.01
17
Semi-
Simplified
0.78
0.00
0.00
0.69
0.40
0.10
0.16
0.01
0.00
18
Challenge
Novel instruments
0.55
0.00
0.00
0.26
0.00
0.00
0.20
0.00
0.00
Rocket Science
Multi-step measurements and applying formulas
0.3
0.00
0.00
0.00
0.02
0.00
0.00
0.00
0.03
20
Normal
Semi-
Simplified
0.51
0.00
0.05
0.34
0.00
0.00
0.11
0.00
0.00
21
Challenge
Measure 2 variables
0.51
0.00
0.05
0.30
0.00
0.00
0.11
0.0.0
0.00
22
Challenge
Measure 5 variables
0.43
0.00
0.00
0.15
0.00
0.00
0.22
0.00
0.03
23
Semi-
Simplified
0.50
0.00
0.00
0.00
0.20
0.0
0.00
0.00
0.00
24
Easy
Single noun
0.40
0.40
0.20
0.30
0.00
0.00
0.20
0.20
0.00
25
Normal
Noun and verb
0.20
0.00
0.00
0.45
0.20
0.00
0.15
0.40
0.00
26
Challenge
Noun, all., and verb
0.49
0.00
0.00
0.49
0.00
0.00
0.20
0.01
0.00
27
Semi-
Simplified
0.56
0.18
0.11
0.56
0.18
0.11
0.58
0.25
0.34
Average (Normal)
0.63
0.18
0.14
0.64
0.18
0.02
0.58
0.23
0.19
Average (Challenge)
0.63
0.18
0.10
0.50
0.15
0.01
0.49
0.08
0.08
Table 5: Baseline model performance on each of the three scoring metrics (task completion, task process, ex-
planatory knowledge discovery) across all 10 unit test tasks. Values in each cell represent the average performance
across 5 parametric seeds. Unit tests tasks are run to a maximum of 100 steps.
ReACT
Plan+Execute
Hypothesizer
#
Unit Test Topic
0.25
0.00
0.00
0.00
0.04
0.00
0.00
0.00
0.06
25
Multi-turn dialog with an agent
1.00
1.00
1.00
1.00
1.0.0
1.00
1.00
26
Measure an object with an instrument
0.87
0.60
0.73
0.40
1.00
1.00
1.00
28
Semi-
Simplified
0.00
0.00
0.00
0.08
0.00
0.00
0.00
0.0.0
0.00
29
Pick-and-give object
1.00
1.00
1.00
1.09
1.00
1.00
1.00
29
Read DiscoveryFeed posts
0.50
0.20
0.25
0.00
0.30
0.00
0.00
0.00
0.50
31
Using keys with doors
0.69
0.20
0.54
0.00
0.69
0.00
32
Navigate to a specific room in a house
0.20
0.20
0.20
0.0
0.00
0.20
0.20
0.33
33
Search an environment for an object
0.60
0.80
0.60
0.00
0.53
0.20
34
Interact with a moving agent
0.60
0.20
0.55
0.00
0.53
0.20
Average (Unit Tests)
0.78
0.60
0.00
0.44
0.73
0.64
4.2
Baseline Agent Models
The baseline agents are described below, with model performance on Discovery tasks shown in
Table 4, and performance on Unit Tests shown in Table 5. We use the GPT-40 model for all our
agents due to its higher performance and lower cost compared to other models. For space we provide
7

View File

@ -0,0 +1,63 @@
Table 4: Baseline model performance on each of the three scoring metrics *(task completion, task process, explanatory knowledge discovery)* across all 24 DISCOVERYWORLD tasks. Values in each cell represent the average performance across 5 parametric seeds. *Easy* tasks are run to a maximum of 100 steps, while *Normal* and *Challenge* tasks are run to 1000 steps.
| | | | ReACT | | | | | Plan+Execute | | | Hypothesizer | | |
|-----------------------------|-----------------|------------------------------------------------------------|-----------------------------------------------|------|------|-------------------------|------|-------------------------|-----------|------|--------------|-----------|--|
| | | | Completion<br>Knowledge<br>Procedure | | | Completion<br>Procedure | | Completion<br>Procedure | | | | | |
| # | Topic | Task | | | | | | | Knowledge | | | Knowledge | |
| | Proteomics | Clustering | | | | | | | | | | | |
| 1 | Easy | Simplified Clustering | 0.87 | 0.20 | 0.20 | | 0.80 | 0.00 | 0.00 | 0.90 | 0.40 | 1.00 | |
| 2 | Normal | Clustering (2D) | 0.88 | 0.40 | 0.40 | | 0.68 | 0.20 | 0.00 | 0.93 | 0.40 | 0.40 | |
| 3 | Challenge | Clustering (3D) | 0.88 | 0.40 | 0.60 | | 0.58 | 0.20 | 0.00 | 0.93 | 0.40 | 0.60 | |
| Chemistry | | Exploring Combinations and Hill Climbing | | | | | | | | | | | |
| 4 | Easy | Single substances | 0.87 | 1.00 | 1.00 | | 0.70 | 0.60 | 0.40 | 0.90 | 0.00 | 0.40 | |
| 5 | Normal | Mix of 3 substances | 0.82 | 0.00 | 0.00 | | 0.87 | 0.40 | 0.00 | 0.93 | 0.60 | 0.40 | |
| 6 | Challenge | Mix of 4 substances | 0.90 | 0.40 | 0.00 | | 0.90 | 0.40 | 0.00 | 0.87 | 0.00 | 0.00 | |
| Archaeology<br>Correlations | | | | | | | | | | | | | |
| 7 | Easy | Simple instrument | 0.27 | 0.60 | 0.00 | | 0.33 | 0.20 | 0.00 | 0.60 | 0.20 | 0.50 | |
| 8 | Normal | Instrument Use | 0.72 | 0.40 | 0.30 | | 0.74 | 0.00 | 0.00 | 0.64 | 0.40 | 0.40 | |
| 9 | Challenge | Correlation | 0.46 | 0.20 | 0.00 | | 0.46 | 0.00 | 0.05 | 0.55 | 0.20 | 0.05 | |
| | Reactor Lab | Regression | | | | | | | | | | | |
| 10 | Easy | Slope only | 0.42 | 0.00 | 0.40 | | 0.44 | 0.00 | 0.10 | 0.38 | 0.00 | 0.20 | |
| 11 | Normal | Linear regression | 0.44 | 0.00 | 0.20 | | 0.49 | 0.00 | 0.00 | 0.51 | 0.00 | 0.00 | |
| 12 | Challenge | Quadratic regression | 0.43 | 0.00 | 0.20 | | 0.39 | 0.00 | 0.00 | 0.39 | 0.00 | 0.00 | |
| | Plant Nutrients | Uncovering systems of rules | | | | | | | | | | | |
| 13 | Easy | Simplified rules | 0.80 | 0.20 | 0.20 | | 0.70 | 0.20 | 0.20 | 0.60 | 0.00 | 0.00 | |
| 14 | Normal | Presence rules | 0.91 | 0.60 | 0.00 | | 0.84 | 0.40 | 0.00 | 0.56 | 0.00 | 0.00 | |
| 15 | Challenge | Logical Rules | 0.89 | 0.40 | 0.00 | | 0.73 | 0.40 | 0.00 | 0.62 | 0.00 | 0.00 | |
| | Space Sick | Open-ended discovery | | | | | | | | | | | |
| 16 | Easy | Single instrument | 0.78 | 0.60 | 0.00 | | 0.68 | 0.40 | 0.10 | 0.80 | 1.00 | 0.60 | |
| 17 | Normal | Multiple instruments | 0.58 | 0.00 | 0.13 | | 0.45 | 0.00 | 0.13 | 0.16 | 0.00 | 0.33 | |
| 18 | Challenge | Novel instruments | 0.55 | 0.00 | 0.00 | | 0.26 | 0.00 | 0.00 | 0.20 | 0.00 | 0.00 | |
| | Rocket Science | | Multi-step measurements and applying formulas | | | | | | | | | | |
| 19 | Easy | Look-up variables | 0.33 | 0.00 | 0.00 | | 0.53 | 0.00 | 0.07 | 0.13 | 0.40 | 0.00 | |
| 20 | Normal | Measure 2 variables | 0.51 | 0.00 | 0.05 | | 0.34 | 0.00 | 0.00 | 0.11 | 0.00 | 0.00 | |
| 21 | Challenge | Measure 5 variables | 0.43 | 0.00 | 0.00 | | 0.15 | 0.00 | 0.00 | 0.22 | 0.00 | 0.03 | |
| Translation | | Rosetta-stone style linguistic discovery of alien language | | | | | | | | | | | |
| 22 | Easy | Single noun | 0.40 | 0.40 | 0.20 | | 0.30 | 0.00 | 0.00 | 0.20 | 0.20 | 0.00 | |
| 23 | Normal | Noun and verb | 0.20 | 0.00 | 0.00 | | 0.68 | 0.40 | 0.00 | 0.84 | 0.40 | 0.00 | |
| 24 | Challenge | Noun, adj., and verb | 0.49 | 0.00 | 0.00 | | 0.55 | 0.20 | 0.05 | 0.15 | 0.00 | 0.00 | |
| | Average (Easy) | | 0.59 | 0.38 | 0.25 | | 0.56 | 0.18 | 0.11 | 0.56 | 0.28 | 0.34 | |
| Average (Normal) | | | 0.63 | 0.18 | 0.14 | | 0.64 | 0.18 | 0.02 | 0.58 | 0.23 | 0.19 | |
| Average (Challenge) | | | 0.63 | 0.18 | 0.10 | | 0.50 | 0.15 | 0.01 | 0.49 | 0.08 | 0.08 | |
| | | | | | | | | | | | | | |
Table 5: Baseline model performance on each of the three scoring metrics *(task completion, task process, explanatory knowledge discovery)* across all 10 unit test tasks.Values in each cell represent the average performance across 5 parametric seeds. Unit tests tasks are run to a maximum of 100 steps.
| | | | ReACT | | Plan+Execute | Hypothesizer | | |
|----|----------------------------------------|-----------|------------|-----------|--------------|--------------|------------|--|
| # | Unit Test Topic | Procedure | Completion | Procedure | Completion | Procedure | Completion | |
| 25 | Multi-turn dialog with an agent | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| 26 | Measure an object with an instrument | 0.87 | 0.60 | 0.73 | 0.40 | 1.00 | 1.00 | |
| 27 | Pick-and-place object | 0.90 | 0.80 | 0.80 | 0.60 | 1.00 | 1.00 | |
| 28 | Pick-and-give object | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| 29 | Read DiscoveryFeed posts | 1.00 | 1.00 | 0.90 | 0.80 | 1.00 | 1.00 | |
| 30 | Move through doors | 0.58 | 0.20 | 0.25 | 0.00 | 0.30 | 0.00 | |
| 31 | Using keys with doors | 0.69 | 0.20 | 0.54 | 0.00 | 0.69 | 0.00 | |
| 32 | Navigate to a specific room in a house | 0.20 | 0.20 | 0.20 | 0.00 | 0.20 | 0.20 | |
| 33 | Search an environment for an object | 0.80 | 0.80 | 0.60 | 0.60 | 1.00 | 1.00 | |
| 34 | Interact with a moving agent | 0.60 | 0.20 | 0.53 | 0.00 | 0.53 | 0.20 | |
| | Average (Unit Tests) | | 0.60 | 0.66 | 0.44 | 0.77 | 0.64 | |
## 4.2 Baseline Agent Models
The baseline agents are described below, with model performance on Discovery tasks shown in Table 4, and performance on Unit Tests shown in Table 5. We use the GPT-4O model for all our agents due to its higher performance and lower cost compared to other models. For space we provide

View File

@ -1,3 +1,5 @@
Advocacy in Action 447
stakeholders has occurred in other nations, with groups and individuals refusing to risk being appropriated into the industry's public relations ambitions. It now looks like that with vigilance, tobacco control advocates can easily foment similar distaste in many areas of the business community. Our actions sought to denormalise the tobacco industry by disrupting its efforts to take its place alongside other industries—often with considerable social credit—in the hope that it might gain by association.
Tobacco industry posturing about its corporate responsibility can never hide the ugly consequences of its ongoing efforts to ''work with all relevant stakeholders for the preservation of opportunities for informed adults to consume tobacco products''1 (translation: ''we will build alliances with others who want to profit from tobacco use, to do all we can to counteract effective tobacco control''). BAT has 15.4% and Philip Morris 16.4% of the global cigarette market.6 With 4.9 million smokers currently dying from tobacco use each year, and the industry unblinkingly concurring that its products are addictive, this leaves BAT to argue why it should not be held to be largely accountable for the annual deaths of some 754 600 smokers, and Philip Morris some 803 600 smokers.
@ -18,7 +20,9 @@ Corporate social responsibility and the tobacco industry: hope or hype?
## N Hirschhorn
............................................................................................................................... Tobacco Control 2004;13:447453. doi: 10.1136/tc.2003.006676
...............................................................................................................................
Tobacco Control 2004;13:447453. doi: 10.1136/tc.2003.006676
Corporate social responsibility (CSR) emerged from a realisation among transnational corporations of the need to account for and redress their adverse impact on society: specifically, on human rights, labour practices, and the environment. Two transnational tobacco companies have recently adopted CSR: Philip Morris, and British American Tobacco. This report explains the origins and theory behind CSR; examines internal company documents from Philip Morris showing the company's deliberations on the matter, and the company's perspective on its own behaviour; and reflects on whether marketing tobacco is antithetical to social responsibility.
@ -28,11 +32,7 @@ Correspondence to: Dr Norbert Hirschhorn, Nastolantie 6, A3 00600 Helsinki, Finl
.......................
Received 13 November 2003 Accepted 15 July 2004 ....................... Over the past three decades increasing pressure from non-governmental organisations (NGOs), governments and the United Nations, has required transnational corporations (TNCs) to examine and redress the adverse impact their businesses have on society and the environment. Many have responded by taking up what is known as ''corporate social responsibility'' (CSR); only recently have two major cigarette companies followed suit: Philip Morris (PM) and British American Tobacco (BAT). This report first provides the context and development of CSR; then, from internal company documents, examines how PM came to its own version. This paper examines whether a
tobacco company espousing CSR should be judged simply as a corporate entity along
standards of business ethics, or as an irretrievably negative force in the realm of public health, thereby rendering CSR an oxymoron.
Received 13 November 2003 Accepted 15 July 2004 ....................... Over the past three decades increasing pressure from non-governmental organisations (NGOs), governments and the United Nations, has required transnational corporations (TNCs) to examine and redress the adverse impact their businesses have on society and the environment. Many have responded by taking up what is known as ''corporate social responsibility'' (CSR); only recently have two major cigarette companies followed suit: Philip Morris (PM) and British American Tobacco (BAT). This report first provides the context and development of CSR; then, from internal company documents, examines how PM came to its own version. This paper examines whether a tobacco company espousing CSR should be judged simply as a corporate entity along standards of business ethics, or as an irretrievably negative force in the realm of public health, thereby rendering CSR an oxymoron.
## CORPORATE SOCIAL RESPONSIBILITY: THE CONTEXT

File diff suppressed because one or more lines are too long