mirror of https://github.com/allenai/olmocr.git (synced 2025-09-26 08:54:01 +00:00)
Some readmes and instructions
parent 4e0339f965
commit c3d0ce99f2
@@ -52,5 +52,12 @@ Step 2. Run your extraction on it, point output to folder, ex. olmocr-v2_1/ wher
Step 3. Run the evaluation script
Step 4. Get results, and use tinyhost to view all failing examples

## TODO
### Running existing scripts

```bash
pip install marker-pdf==1.5.4
python olmocr/bench/runners/run_marker.py olmocr/bench/sample_data/pdfs

pip install verovio torchvision
python olmocr/bench/runners/run_gotocr.py olmocr/bench/sample_data/pdfs
```

@@ -27,14 +27,14 @@ def run(pdf_folder):
     """
     Convert all PDF files in the specified folder to markdown using GOT-OCR.
     Each page of a PDF is converted to an image and processed with OCR.
-    The markdown files are saved in a folder called "marker" located alongside the pdf_folder.
+    The markdown files are saved in a folder called "got_ocr" located alongside the pdf_folder.

     :param pdf_folder: Path to the folder containing PDF files.
     """
     # Resolve absolute paths and prepare destination folder
     pdf_folder = os.path.abspath(pdf_folder)
     parent_dir = os.path.dirname(pdf_folder)
-    destination_folder = os.path.join(parent_dir, "marker")
+    destination_folder = os.path.join(parent_dir, "got_ocr")
     os.makedirs(destination_folder, exist_ok=True)

     # List all PDF files in the folder
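The path handling in the hunk above can be tried in isolation. The following is a minimal sketch, not the runner's actual code: `resolve_destination` and `list_pdfs` are hypothetical helper names introduced here for illustration, and the case-insensitive `.pdf` filter is an assumption, since the hunk cuts off before the listing logic.

```python
import os
import tempfile


def resolve_destination(pdf_folder, output_name="got_ocr"):
    """Create and return a folder named output_name alongside pdf_folder.

    Hypothetical helper: mirrors the abspath/dirname/join/makedirs steps
    shown in the hunk, with the folder name made a parameter.
    """
    pdf_folder = os.path.abspath(pdf_folder)  # also drops any trailing separator
    parent_dir = os.path.dirname(pdf_folder)
    destination_folder = os.path.join(parent_dir, output_name)
    os.makedirs(destination_folder, exist_ok=True)
    return destination_folder


def list_pdfs(pdf_folder):
    """Return PDF file names in pdf_folder, sorted for a stable order.

    Assumption: PDFs are matched by a case-insensitive ".pdf" suffix.
    """
    return sorted(
        name for name in os.listdir(pdf_folder) if name.lower().endswith(".pdf")
    )


if __name__ == "__main__":
    # Exercise the helpers in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        pdfs = os.path.join(root, "pdfs")
        os.makedirs(pdfs)
        open(os.path.join(pdfs, "a.pdf"), "w").close()
        print(resolve_destination(pdfs))
        print(list_pdfs(pdfs))
```

Passing the folder name as a parameter would also make the copy-paste slip fixed by this commit ("marker" left in the GOT-OCR runner) harder to repeat, since each runner would state its output name once at the call site.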
105
olmocr/bench/sample_data/got_ocr/multi_column_miss.md
Normal file
@@ -0,0 +1,105 @@
4.47
www.tobaccocontrol.com
Advocacy in Action
stakeholders has occurred in other nations, with groups and
individuals refusing to risk being appropriated into the
industry’s public relations ambitions. It now looks like that
with vigilance, tobacco control advocates can easily foment
similar distaste in many areas of the business community.
Our actions sought to demolise the tobacco industry by
disrupting its efforts to take its place alongside other
publications, including the public and social credit—in the
hope that it might gain by association.
Tobacco industry posturing about its corporate responsi-
bility can never hide the ugly consequences of its ongoing
efforts to “work with all relevant stakeholders for the
environment” and the government’s “environmental” and
tobacco products”1 (translation: “we will build alliances with
others who want to profit from tobacco use, to do all we can
to counteract effective tobacco control”). BAT has 15.4% and
Philip Morris 16.4% of the global cigarette market: “With 4.9
million smokers currently dying from tobacco use each year,
and the industry unblinkingly concurring that its products
are addictive, this leaves BAT to argue why it should not
be held to be largely accountable for the annual deaths of
some 754 600 smokers, and Philip Morris some 803 600
smokers.
REFERENCES
1
Bash Arrington Tobacco. Social Report. http://www.bash.com/20400g.
2
Tree B Tobacco.com copyright angers. MPs. The Age (Methanone) 2004, May
17 http://www.bango.com.au/articles/2004/05/16/
3
Hirschhorn. A Report. http://www.bango.com.au/articles/2004/05/16
4
Rishidson N. Corporate social responsibility and the tobacco industry. hope
or hypo? Tobacco Control 2004;13.447–53.
5
Buch, Michael, and Michael Scherer. “Healthcare website. http://
www.ethalicorg.com/asia2004/.
6
Chopman S, Shatenstein S. Eterne corporate makeover tobacco companies,
and the industry’s public relations ambitions. http://www.bishair.com/2004/
http://petition globalink.org/view.php/roades-entree-entree/.
7
6
Mockay J, Erikson M. The Tobacco Effects. Green: World Health
Organization, 2002.
INDUSTRY WATCH
Corporate social responsibility and the tobacco industry:
hope or hype?
N Hirschhorn
Corporate social responsibility (CSR) emerged from a
realisation among transnational corporations of the need
to account for and redress their adverse impact on society:
specifically, on human rights, labour practices, and the
environment. Two transnational tobacco companies have
recently adopted CSR: Philip Morris, and British American
Tobacco. This report explains the origins and theory
behind CSR; examines internal company documents from
Philip Morris showing the company’s deliberations on the
matter, and the company’s perspective on its own
behaviour; and reflects on whether marketing tobacco is
antithetical to social responsibility.
Correspondence to:
Dr Norbert Hirschhorn,
Nostalonte 6, A3 00600
Helsinki, Finland, 000000.
Received
13 November 2003
Accepted 15 July 2004
Tobacco Control 2004;13.447–453. doi: 10.1136/rlc.2003.006676
tobacco company espousing CSR should be
judged simply as a corporate entity along
standards of business ethics, or as an irretrie-
vably negative force in the realm of public health,
thereby rendering CSR an oxymoron.
CORPORATE SOCIAL RESPONSIBILITY:
THE CONTEXT
The term “corporate social responsibility” is in
vogue at the moment bounds a concept it is vague
and means different groups to different people.4
The report from CSR trace its American roots
to the 19th century when large industries
engaged in philanthropy and established great
public institutions, a form of “noblesse object”.
But the notion that corporations should be
noted to the extent to the two society because of
their impact on society, and the environment
from the civil rights, peace, and environmental
movements of the last half century.2 The
unprecedented expansion of power and influ-
ence of TNCs over the past three decades has
accelerated global trade and development, but
also environmental damage and abuses of
the world.
Abbreviations: ASH, Action on Smoking and Health,
and Health, and Health, and Health, and Health
Environmentally Responsible Economies; CSR, corporate
social responsibility, DJSI, Dow Jones Sustainability Index;
CACA, Global Corporate Affairs Council; GRI, Global
Health, and Health, and Health, and Health, and
NGOs, non-governmental organisations; PM, Philip
Morris; TNCs, transnational corporations; UNEP, United
Nations Environment Program