mirror of https://github.com/allenai/olmocr.git (synced 2025-09-27 09:27:55 +00:00)
Some readmes and instructions
parent 4e0339f965
commit c3d0ce99f2
@@ -52,5 +52,12 @@ Step 2. Run your extraction on it, point output to folder, ex. olmocr-v2_1/ wher
 Step 3. Run the evaluation script
 Step 4. Get results, and use tinyhost to view all failing examples
 
-## TODO
+### Running existing scripts
 
+```bash
+pip install marker-pdf==1.5.4
+python olmocr/bench/runners/run_marker.py olmocr/bench/sample_data/pdfs
+
+pip install verovio torchvision
+python olmocr/bench/runners/run_gotocr.py olmocr/bench/sample_data/pdfs
+```
@@ -27,14 +27,14 @@ def run(pdf_folder):
     """
     Convert all PDF files in the specified folder to markdown using GOT-OCR.
     Each page of a PDF is converted to an image and processed with OCR.
-    The markdown files are saved in a folder called "marker" located alongside the pdf_folder.
+    The markdown files are saved in a folder called "got_ocr" located alongside the pdf_folder.
 
     :param pdf_folder: Path to the folder containing PDF files.
     """
     # Resolve absolute paths and prepare destination folder
     pdf_folder = os.path.abspath(pdf_folder)
     parent_dir = os.path.dirname(pdf_folder)
-    destination_folder = os.path.join(parent_dir, "marker")
+    destination_folder = os.path.join(parent_dir, "got_ocr")
     os.makedirs(destination_folder, exist_ok=True)
 
     # List all PDF files in the folder
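The path handling changed in this hunk can be sketched in isolation. This is a minimal illustration, not the runner's actual code: the helper name `sibling_output_folder` and its `name` parameter are hypothetical, and unlike `run()` it only computes the path without creating the folder.

```python
import os


def sibling_output_folder(pdf_folder: str, name: str = "got_ocr") -> str:
    """Return the path of a folder called `name` located alongside pdf_folder.

    Hypothetical helper for illustration; it mirrors the three path calls
    shown in the hunk (abspath, dirname, join) and has no side effects.
    """
    # abspath also normalizes the path, so a trailing slash is dropped
    pdf_folder = os.path.abspath(pdf_folder)
    # The parent of the PDF folder is where the output folder lives
    parent_dir = os.path.dirname(pdf_folder)
    return os.path.join(parent_dir, name)


if __name__ == "__main__":
    # Resolves relative to the current working directory
    print(sibling_output_folder("olmocr/bench/sample_data/pdfs"))
```

With the README's sample input `olmocr/bench/sample_data/pdfs`, the result ends in `olmocr/bench/sample_data/got_ocr`, which agrees with the location of the `multi_column_miss.md` file added in this commit. Before the fix, the same computation used `"marker"`, so GOT-OCR output would have landed in the Marker runner's folder.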
105
olmocr/bench/sample_data/got_ocr/multi_column_miss.md
Normal file
@@ -0,0 +1,105 @@
+4.47
+www.tobaccocontrol.com
+Advocacy in Action
+stakeholders has occurred in other nations, with groups and
+individuals refusing to risk being appropriated into the
+industry’s public relations ambitions. It now looks like that
+with vigilance, tobacco control advocates can easily foment
+similar distaste in many areas of the business community.
+Our actions sought to demolise the tobacco industry by
+disrupting its efforts to take its place alongside other
+publications, including the public and social credit—in the
+hope that it might gain by association.
+Tobacco industry posturing about its corporate responsi-
+bility can never hide the ugly consequences of its ongoing
+efforts to “work with all relevant stakeholders for the
+environment” and the government’s “environmental” and
+tobacco products”1 (translation: “we will build alliances with
+others who want to profit from tobacco use, to do all we can
+to counteract effective tobacco control”). BAT has 15.4% and
+Philip Morris 16.4% of the global cigarette market: “With 4.9
+million smokers currently dying from tobacco use each year,
+and the industry unblinkingly concurring that its products
+are addictive, this leaves BAT to argue why it should not
+be held to be largely accountable for the annual deaths of
+some 754 600 smokers, and Philip Morris some 803 600
+smokers.
+REFERENCES
+1
+Bash Arrington Tobacco. Social Report. http://www.bash.com/20400g.
+2
+Tree B Tobacco.com copyright angers. MPs. The Age (Methanone) 2004, May
+17 http://www.bango.com.au/articles/2004/05/16/
+3
+Hirschhorn. A Report. http://www.bango.com.au/articles/2004/05/16
+4
+Rishidson N. Corporate social responsibility and the tobacco industry. hope
+or hypo? Tobacco Control 2004;13.447–53.
+5
+Buch, Michael, and Michael Scherer. “Healthcare website. http://
+www.ethalicorg.com/asia2004/.
+6
+Chopman S, Shatenstein S. Eterne corporate makeover tobacco companies,
+and the industry’s public relations ambitions. http://www.bishair.com/2004/
+http://petition globalink.org/view.php/roades-entree-entree/.
+7
+6
+Mockay J, Erikson M. The Tobacco Effects. Green: World Health
+Organization, 2002.
+INDUSTRY WATCH
+Corporate social responsibility and the tobacco industry:
+hope or hype?
+N Hirschhorn
+Corporate social responsibility (CSR) emerged from a
+realisation among transnational corporations of the need
+to account for and redress their adverse impact on society:
+specifically, on human rights, labour practices, and the
+environment. Two transnational tobacco companies have
+recently adopted CSR: Philip Morris, and British American
+Tobacco. This report explains the origins and theory
+behind CSR; examines internal company documents from
+Philip Morris showing the company’s deliberations on the
+matter, and the company’s perspective on its own
+behaviour; and reflects on whether marketing tobacco is
+antithetical to social responsibility.
+Correspondence to:
+Dr Norbert Hirschhorn,
+Nostalonte 6, A3 00600
+Helsinki, Finland, 000000.
+Received
+13 November 2003
+Accepted 15 July 2004
+Tobacco Control 2004;13.447–453. doi: 10.1136/rlc.2003.006676
+tobacco company espousing CSR should be
+judged simply as a corporate entity along
+standards of business ethics, or as an irretrie-
+vably negative force in the realm of public health,
+thereby rendering CSR an oxymoron.
+CORPORATE SOCIAL RESPONSIBILITY:
+THE CONTEXT
+The term “corporate social responsibility” is in
+vogue at the moment bounds a concept it is vague
+and means different groups to different people.4
+The report from CSR trace its American roots
+to the 19th century when large industries
+engaged in philanthropy and established great
+public institutions, a form of “noblesse object”.
+But the notion that corporations should be
+noted to the extent to the two society because of
+their impact on society, and the environment
+from the civil rights, peace, and environmental
+movements of the last half century.2 The
+unprecedented expansion of power and influ-
+ence of TNCs over the past three decades has
+accelerated global trade and development, but
+also environmental damage and abuses of
+the world.
+Abbreviations: ASH, Action on Smoking and Health,
+and Health, and Health, and Health, and Health
+Environmentally Responsible Economies; CSR, corporate
+social responsibility, DJSI, Dow Jones Sustainability Index;
+CACA, Global Corporate Affairs Council; GRI, Global
+Health, and Health, and Health, and Health, and
+NGOs, non-governmental organisations; PM, Philip
+Morris; TNCs, transnational corporations; UNEP, United
+Nations Environment Program