Some readmes and instructions

Jake Poznanski 2025-02-19 13:25:31 -08:00
parent 4e0339f965
commit c3d0ce99f2
3 changed files with 115 additions and 3 deletions

@@ -52,5 +52,12 @@ Step 2. Run your extraction on it, point output to folder, ex. olmocr-v2_1/ wher
Step 3. Run the evaluation script
Step 4. Get results, and use tinyhost to view all failing examples
## TODO
### Running existing scripts
```bash
pip install marker-pdf==1.5.4
python olmocr/bench/runners/run_marker.py olmocr/bench/sample_data/pdfs
pip install verovio torchvision
python olmocr/bench/runners/run_gotocr.py olmocr/bench/sample_data/pdfs
```
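
As a rough sketch of where the output lands: the GOT-OCR runner's docstring says its markdown goes into a `got_ocr` folder located alongside the PDF folder. The helper below (`sibling_output_folder` is a hypothetical name, not part of the repo) mirrors that path logic. That the marker runner writes to a sibling `marker` folder in the same way is an assumption.

```python
import os


def sibling_output_folder(pdf_folder: str, tool_name: str) -> str:
    """Return the path of a folder named `tool_name` next to `pdf_folder`.

    Hypothetical helper for illustration; it mirrors the path handling
    described in the runner docstring and does not create anything on disk.
    """
    pdf_folder = os.path.abspath(pdf_folder)   # normalise, drop any trailing slash
    parent_dir = os.path.dirname(pdf_folder)   # folder that contains the PDFs folder
    return os.path.join(parent_dir, tool_name)


if __name__ == "__main__":
    # Assumed layout: results end up in olmocr/bench/sample_data/got_ocr
    print(sibling_output_folder("olmocr/bench/sample_data/pdfs", "got_ocr"))
```

With the commands above, that would mean the sample PDFs in `olmocr/bench/sample_data/pdfs` produce markdown under `olmocr/bench/sample_data/got_ocr`.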

@@ -27,14 +27,14 @@ def run(pdf_folder):
"""
Convert all PDF files in the specified folder to markdown using GOT-OCR.
Each page of a PDF is converted to an image and processed with OCR.
-The markdown files are saved in a folder called "marker" located alongside the pdf_folder.
+The markdown files are saved in a folder called "got_ocr" located alongside the pdf_folder.
:param pdf_folder: Path to the folder containing PDF files.
"""
# Resolve absolute paths and prepare destination folder
pdf_folder = os.path.abspath(pdf_folder)
parent_dir = os.path.dirname(pdf_folder)
-destination_folder = os.path.join(parent_dir, "marker")
+destination_folder = os.path.join(parent_dir, "got_ocr")
os.makedirs(destination_folder, exist_ok=True)
# List all PDF files in the folder

@@ -0,0 +1,105 @@
4.47
www.tobaccocontrol.com
Advocacy in Action
stakeholders has occurred in other nations, with groups and
individuals refusing to risk being appropriated into the
industrys public relations ambitions. It now looks like that
with vigilance, tobacco control advocates can easily foment
similar distaste in many areas of the business community.
Our actions sought to demolise the tobacco industry by
disrupting its efforts to take its place alongside other
publications, including the public and social credit—in the
hope that it might gain by association.
Tobacco industry posturing about its corporate responsi-
bility can never hide the ugly consequences of its ongoing
efforts to “work with all relevant stakeholders for the
environment” and the governments “environmental” and
tobacco products”1 (translation: “we will build alliances with
others who want to profit from tobacco use, to do all we can
to counteract effective tobacco control”). BAT has 15.4% and
Philip Morris 16.4% of the global cigarette market: “With 4.9
million smokers currently dying from tobacco use each year,
and the industry unblinkingly concurring that its products
are addictive, this leaves BAT to argue why it should not
be held to be largely accountable for the annual deaths of
some 754 600 smokers, and Philip Morris some 803 600
smokers.
REFERENCES
1
Bash Arrington Tobacco. Social Report. http://www.bash.com/20400g.
2
Tree B Tobacco.com copyright angers. MPs. The Age (Methanone) 2004, May
17 http://www.bango.com.au/articles/2004/05/16/
3
Hirschhorn. A Report. http://www.bango.com.au/articles/2004/05/16
4
Rishidson N. Corporate social responsibility and the tobacco industry. hope
or hypo? Tobacco Control 2004;13.44753.
5
Buch, Michael, and Michael Scherer. “Healthcare website. http://
www.ethalicorg.com/asia2004/.
6
Chopman S, Shatenstein S. Eterne corporate makeover tobacco companies,
and the industrys public relations ambitions. http://www.bishair.com/2004/
http://petition globalink.org/view.php/roades-entree-entree/.
7
6
Mockay J, Erikson M. The Tobacco Effects. Green: World Health
Organization, 2002.
INDUSTRY WATCH
Corporate social responsibility and the tobacco industry:
hope or hype?
N Hirschhorn
Corporate social responsibility (CSR) emerged from a
realisation among transnational corporations of the need
to account for and redress their adverse impact on society:
specifically, on human rights, labour practices, and the
environment. Two transnational tobacco companies have
recently adopted CSR: Philip Morris, and British American
Tobacco. This report explains the origins and theory
behind CSR; examines internal company documents from
Philip Morris showing the companys deliberations on the
matter, and the companys perspective on its own
behaviour; and reflects on whether marketing tobacco is
antithetical to social responsibility.
Correspondence to:
Dr Norbert Hirschhorn,
Nostalonte 6, A3 00600
Helsinki, Finland, 000000.
Received
13 November 2003
Accepted 15 July 2004
Tobacco Control 2004;13.447453. doi: 10.1136/rlc.2003.006676
tobacco company espousing CSR should be
judged simply as a corporate entity along
standards of business ethics, or as an irretrie-
vably negative force in the realm of public health,
thereby rendering CSR an oxymoron.
CORPORATE SOCIAL RESPONSIBILITY:
THE CONTEXT
The term “corporate social responsibility” is in
vogue at the moment bounds a concept it is vague
and means different groups to different people.4
The report from CSR trace its American roots
to the 19th century when large industries
engaged in philanthropy and established great
public institutions, a form of “noblesse object”.
But the notion that corporations should be
noted to the extent to the two society because of
their impact on society, and the environment
from the civil rights, peace, and environmental
movements of the last half century.2 The
unprecedented expansion of power and influ-
ence of TNCs over the past three decades has
accelerated global trade and development, but
also environmental damage and abuses of
the world.
Abbreviations: ASH, Action on Smoking and Health,
and Health, and Health, and Health, and Health
Environmentally Responsible Economies; CSR, corporate
social responsibility, DJSI, Dow Jones Sustainability Index;
CACA, Global Corporate Affairs Council; GRI, Global
Health, and Health, and Health, and Health, and
NGOs, non-governmental organisations; PM, Philip
Morris; TNCs, transnational corporations; UNEP, United
Nations Environment Program