diff --git a/CHANGELOG.md b/CHANGELOG.md index 591fc2a3c..66ef3016b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,5 @@ +## 0.10.30-dev0 + ## 0.10.29 ### Enhancements diff --git a/example-docs/test_evaluate_files/gold_standard_cct/Bank Good Credit Loan.pptx.txt b/example-docs/test_evaluate_files/gold_standard_cct/Bank Good Credit Loan.pptx.txt new file mode 100644 index 000000000..7c2759db8 --- /dev/null +++ b/example-docs/test_evaluate_files/gold_standard_cct/Bank Good Credit Loan.pptx.txt @@ -0,0 +1,36 @@ +Bank Good Credit +Accredited with IABACTM +(International Association of Business Analytics Certifications) +IABAC International Association of +Business Analytics Certification +DataMitesTM. All Right Reserved + +Objective & Background +Classify credit card customers as good / bad, based on information from internal and external sources. +Data provided +Demographic: Base file of with credit card history details. Only one record for every customer. +Account: Contians data for various loans availed by the customer. Not related to credit card. Multiple records for every customer. +Enquiries: Enquired made by customers for different loan purposes. Multiple records for every customer. +DataMitesTM. All Right Reserved + +Design +Data to be downloaded using SQL queries. +Required information to be extracted from Account and Enquiry files and converted to one-to-one files. +The columns from the two files should be merged with Demographic file using Left Join with customer no as key column, to create a final file. The final file should contain all the records in demographic and additional columns/features from Account and Enquiry files will get added to Demographic file. +There will be many customers in account and enquiry file who will get left out. This is fine as we anyway dont know their good/bad label for training purpose. +DataMitesTM. All Right Reserved + +Analysis of Data +Show using Excel File +DataMitesTM. 
All Right Reserved + +Explain Coding / outcomes +Show using Jupyter +DataMitesTM. All Right Reserved + + +Thank You +DataMitesTM. All Right Reserved + + + diff --git a/example-docs/test_evaluate_files/gold_standard_cct/Performance-Audit-Discussion.pdf.txt b/example-docs/test_evaluate_files/gold_standard_cct/Performance-Audit-Discussion.pdf.txt new file mode 100644 index 000000000..379cfd378 --- /dev/null +++ b/example-docs/test_evaluate_files/gold_standard_cct/Performance-Audit-Discussion.pdf.txt @@ -0,0 +1,205 @@ +Page 1 +The introductory chapter of Government Auditing Standards (GAGAS)1 +outlines five concepts describing how public officials are to provide +functions and services: effectively, efficiently, economically, ethically, and +equitably. When planning, gathering and assessing evidence, and +reporting audit results, auditors may focus on one or more of these +concepts. The following discussion is intended to assist auditors when +developing audit objectives for performance audits of government +programs and activities.2 +This discussion is designed to help auditors understand +and apply the concepts cited above for performance audits +conducted in accordance with GAGAS. This discussion +does not contain requirements, does not amend GAGAS, +and is not considered interpretive guidance, as defined in +chapter 2 of GAGAS. +Paragraph 1.02: +The concept of accountability for use of public resources +and government authority is key to our nation’s governing +processes. Management and officials entrusted with public +resources are responsible for carrying out public functions +and providing service to the public effectively, efficiently, +economically, ethically, and equitably within the context +of the statutory boundaries of the specific government +program. [Emphasis added.] 
+Paragraph 1.03: +As reflected in applicable laws, regulations, agreements, +and standards, management and officials of government +programs are responsible for providing reliable, useful, and +timely information for transparency and accountability of +these programs and their operations. Legislators, oversight + 1GAO, Government Auditing Standards: 2018 Revision, GAO-21-368G (Washington, +D.C.: April 2021) +2The concepts cited may also be applicable to other GAGAS engagements, based on the +auditors’ judgments. This discussion is limited to considering these concepts in +performance audits. +GAGAS Performance Audits: Discussion of +Concepts to Consider When Auditing Public +Functions and Services +GAGAS Paragraphs +Page 2 +bodies, those charged with governance, and the public +need to know whether (1) management and officials +manage government resources and use their authority +properly and in compliance with laws and regulations; (2) +government programs are achieving their objectives and +desired outcomes; and (3) government services are +provided effectively, efficiently, economically, ethically, +and equitably. [Emphasis added.] +Government administration best serves the collective interest of the public +when it is effective, efficient, economical, ethical, and equitable. Auditors +help inform legislators, oversight bodies, those charged with governance, +and the public about whether public services are being provided +consistent with these concepts. Government auditing can contribute to +accountability and can help improve government administration by +identifying deficiencies and recommending enhancements to achieve +effective, efficient, economical, ethical, and equitable outcomes, when +appropriate within the context of the audit objectives. 
As such, it is +important for auditors to understand the concepts below as they relate to +administering government programs or activities and how they can +assess or address these expectations of government performance in +conducting their performance audits. +The examples that follow the discussion of each concept illustrate the +distinctions between these concepts. In a performance audit, it is +common practice to incorporate more than one of these concepts when +conducting the audit. +The administration of a government program or activity is effective when +it achieves the intended results. A performance audit that focuses on the +effectiveness of a program or activity seeks to establish a cause-andeffect relationship between the operation of the program or activity and +achieving its stated objectives. Achieving the objectives does not +guarantee that the program or activity was effective unless the auditors +can establish that the program or activity caused, or contributed to, the +desired outcome. +Example: In a performance audit examining how effective a +housing voucher program was in achieving its goal of improving +economic outcomes for recipients, auditors may determine +whether receiving housing vouchers led to better subsequent +economic outcomes for recipients than those of similarly situated +individuals who did not receive vouchers. +Discussion +Effective +Page 3 +Example: In a performance audit assessing the effectiveness of +an after-school program targeted at helping students improve their +reading proficiency, auditors may examine the extent to which +participants’ reading levels improved relative to baseline data from +before they joined the program. +The administration of a government program or activity is efficient when +it gets the most value from available resources. 
When a performance +audit focuses on efficiency, auditors examine whether the resources used +to administer a program or activity have been put to optimal or +satisfactory use, or whether the same or similar results could have been +achieved more timely or with fewer resources. +Example: In a performance audit assessing a disaster relief +agency’s mobilization of resources to respond to a disaster, +auditors may assess the disaster relief agency’s timeliness in +providing relief compared to its own previous performance or the +performance of other similarly situated agencies that have +responded to comparable disasters. +Example: In a performance audit assessing a consumer protection +agency’s response to consumer complaints, auditors may assess +whether the agency’s efforts to streamline its processes resulted +in improved timely resolution of complaints. +Example: In a performance audit assessing the time a state needs +to process unemployment benefits targeted at helping those in +need, auditors may assess how long the process takes from +receipt of the unemployment application to the applicant’s receipt +of the benefit, including steps such as verifying required +information. +The administration of a government program or activity is economical +when it minimizes the costs of resources used in performing its functions +while meeting timeliness and quality considerations for those resources. +When auditing economy, auditors primarily focus on the costs of inputs +rather than on the outcomes achieved. +Example: In a performance audit examining an agency’s +international travel expenses, in addition to assessing the design +of internal controls and compliance with expense guidelines, +auditors may test whether, for a sample of trips, bookings of +Efficient +Economical +Page 4 +equivalent airline tickets and hotel rooms could be found at a +lower cost. 
+Example: In a performance audit assessing an agency’s +acquisition practices, auditors may examine whether the agency’s +decisions regarding purchasing, leasing, or reimbursing +employees for the costs of acquiring various supplies or +equipment achieved the lowest cost while meeting applicable +requirements. +The administration of a government program or activity is ethical when it +advances the collective interest of the public rather than private gain and +is conducted with honesty, integrity, and impartiality. Laws and +regulations often specify rules of ethical conduct. Therefore, audits +examining the ethical administration of a program or activity may involve +assessing compliance with such laws and regulations. Fraud in +administering a government program or activity betrays the public trust +and is, by definition, unethical. In addition, auditors may identify instances +of unethical conduct that result in waste and abuse during testing of +internal controls as part of a performance audit. +Example: In a performance audit assessing agency officials’ +compliance with conflict-of-interest requirements, auditors may +compare a sample of financial disclosure reports filed against +requirements in statute or regulation. +Example: In a performance audit assessing potential regulatory +capture related to a particular industry, auditors may assess the +extent to which the regulatory agency has sufficient controls to +reasonably assure its employees’ independence from the entities +subject to the agency’s regulation. +Example: In a performance audit assessing an office’s policies +and procedures for purchase cards, auditors’ testing of the +program’s controls to identify deficiencies may identify fraud, +waste, or abuse in its administration. 
+The administration of a government program or activity is equitable when +it consistently serves members of the public, distributes public services, +and implements public policy in a manner that promotes fairness, justice, +and equality. Auditing whether the administration of a government +program or activity is equitable may include assessing the +Ethical +Equitable +Page 5 +• equality of access to and provision of services; +• procedural fairness and equal treatment of individuals in +government programs and policies; +• causes of disparate outcomes; +• or distributional impacts of public policies, programs, resources, +and services. +Disaggregating data by social groups or communities that share a +particular characteristic (e.g., gender, race, ethnicity, age, or income) +can help illuminate differences. Reporting on such differences, when +appropriate within the context of the audit objectives, can increase +understanding of the effects of policies and programs on issues of +equity. +Example: In a performance audit assessing the granting of +waivers from particular requirements, auditors may use +disaggregated data about waiver recipients to assess whether +different groups or communities were treated fairly and equally in +the process. +Example: In a performance audit assessing a grant program +aimed at expanding internet access, auditors may assess the +extent to which formulas, criteria, or other factors (such as +matching funds or capital requirements) considered in the +distribution of grant funds may be to the specific advantage or +disadvantage of certain groups, regions, or communities, thereby +causing inequities. +Example: In a performance audit assessing scholarship outcomes +in higher education programs, auditors may report on the +distribution of scholarships by race, gender identity, and income to +illuminate potential disparities among scholarship recipients. +These concepts may overlap. 
For example, efficiency may also be a +component of effectiveness. Similarly, when appropriate within the +context of the program and audit objectives, auditors may disaggregate +the results of performance audits that focus on efficiency or effectiveness +Page 6 +issues to illuminate inequities in program administration or in distribution +of public services. +While all of these concepts are important to administering government +programs responsibly, it is up to the professional judgment of the auditors +to determine the specific concepts that are relevant in conducting the +performance audit and reporting the results. Auditors’ professional +judgments are informed by, among other things, the needs of the users of +the audit reports; the nature, context, and objectives of the program or +activity under audit; and the public interest. +To view the current Yellow Book, visit https://www.gao.gov/yellowbook. +For technical assistance, call (202) 512-9535 or email +yellowbook@gao.gov. +For More Information \ No newline at end of file diff --git a/example-docs/test_evaluate_files/gold_standard_cct/currency.csv.txt b/example-docs/test_evaluate_files/gold_standard_cct/currency.csv.txt new file mode 100644 index 000000000..de78bbd27 --- /dev/null +++ b/example-docs/test_evaluate_files/gold_standard_cct/currency.csv.txt @@ -0,0 +1,164 @@ +Code Symbol Name +AED . United Arab Emirates d +AFN Afghan afghani +ALL L Albanian lek +AMD AMD Armenian dram +ANG Netherlands Antillean gu +AOA Kz Angolan kwanza +ARS $ Argentine peso +AUD $ Australian dollar +AWG Afl. Aruban florin +AZN AZN Azerbaijani manat +BAM KM Bosnia and Herzegovina +BBD $ Barbadian dollar +BDT Bangladeshi taka +BGN . Bulgarian lev +BHD .. Bahraini dinar +BIF Fr Burundian franc +BMD $ Bermudian dollar +BND $ Brunei dollar +BOB Bs. Bolivian boliviano +BRL R$ Brazilian real +BSD $ Bahamian dollar +BTC Bitcoin +BTN Nu. 
Bhutanese ngultrum +BWP P Botswana pula +BYR Br Belarusian ruble (old)' +BYN Br Belarusian ruble +BZD $ Belize dollar +CAD $ Canadian dollar +CDF Fr Congolese franc +CHF CHF Swiss franc +CLP $ Chilean peso +CNY Chinese yuan +COP $ Colombian peso +CRC Costa Rican coln +CUC $ Cuban convertible peso') +CUP $ Cuban peso +CVE $ Cape Verdean escudo +CZK K Czech koruna +DJF Fr Djiboutian franc +DKK DKK Danish krone +DOP RD$ Dominican peso +DZD . Algerian dinar +EGP EGP Egyptian pound +ERN Nfk Eritrean nakfa +ETB Br Ethiopian birr +EUR Euro +FJD $ Fijian dollar +FKP Falkland Islands pound') +GBP Pound sterling +GEL Georgian lari +GGP Guernsey pound +GHS Ghana cedi +GIP Gibraltar pound +GMD D Gambian dalasi +GNF Fr Guinean franc +GTQ Q Guatemalan quetzal +GYD $ Guyanese dollar +HKD $ Hong Kong dollar +HNL L Honduran lempira +HRK kn Croatian kuna +HTG G Haitian gourde +HUF Ft Hungarian forint +IDR Rp Indonesian rupiah +ILS Israeli new shekel +IMP Manx pound +INR Indian rupee +IQD . Iraqi dinar +IRR Iranian rial +IRT Iranian toman +ISK kr. Icelandic krna +JEP Jersey pound +JMD $ Jamaican dollar +JOD . Jordanian dinar +JPY Japanese yen +KES KSh Kenyan shilling +KGS Kyrgyzstani som +KHR Cambodian riel +KMF Fr Comorian franc +KPW North Korean won +KRW South Korean won +KWD . Kuwaiti dinar +KYD $ Cayman Islands dollar +KZT Kazakhstani tenge +LAK Lao kip +LBP . Lebanese pound +LKR Sri Lankan rupee +LRD $ Liberian dollar +LSL L Lesotho loti +LYD . Libyan dinar +MAD .. Moroccan dirham +MDL MDL Moldovan leu +MGA Ar Malagasy ariary +MKD Macedonian denar +MMK Ks Burmese kyat +MNT Mongolian tgrg +MOP P Macanese pataca +MRU UM Mauritanian ouguiya +MUR Mauritian rupee +MVR . Maldivian rufiyaa +MWK MK Malawian kwacha +MXN $ Mexican peso +MYR RM Malaysian ringgit +MZN MT Mozambican metical +NAD N$ Namibian dollar +NGN Nigerian naira +NIO C$ Nicaraguan crdoba +NOK kr Norwegian krone +NPR Nepalese rupee +NZD $ New Zealand dollar +OMR .. Omani rial +PAB B/. 
Panamanian balboa +PEN S/ Sol +PGK K Papua New Guinean kina') +PHP Philippine peso +PKR Pakistani rupee +PLN z Polish zoty +PRB . Transnistrian ruble +PYG Paraguayan guaran +QAR . Qatari riyal +RON lei Romanian leu +RSD Serbian dinar +RUB Russian ruble +RWF Fr Rwandan franc +SAR . Saudi riyal +SBD $ Solomon Islands dollar') +SCR Seychellois rupee +SDG .. Sudanese pound +SEK kr Swedish krona +SGD $ Singapore dollar +SHP Saint Helena pound +SLL Le Sierra Leonean leone +SOS Sh Somali shilling +SRD $ Surinamese dollar +SSP South Sudanese pound +STN Db So Tom and Prncipe d +SYP . Syrian pound +SZL L Swazi lilangeni +THB Thai baht +TJS Tajikistani somoni +TMT m Turkmenistan manat +TND . Tunisian dinar +TOP T$ Tongan paanga +TRY Turkish lira +TTD $ Trinidad and Tobago doll +TWD NT$ New Taiwan dollar +TZS Sh Tanzanian shilling +UAH Ukrainian hryvnia +UGX UGX Ugandan shilling +USD $ United States (US) dolla +UYU $ Uruguayan peso +UZS UZS Uzbekistani som +VEF Bs F Venezuelan bolvar +VES Bs.S Bolvar soberano +VND Vietnamese ng +VUV Vt Vanuatu vatu +WST T Samoan tl +XAF CFA Central African CFA fr +XCD $ East Caribbean dollar +XOF CFA West African CFA franc +XPF Fr CFP franc +YER Yemeni rial +ZAR R South African rand +ZMW ZK Zambian kwacha \ No newline at end of file diff --git a/example-docs/test_evaluate_files/unstructured_output/Bank Good Credit Loan.pptx.json b/example-docs/test_evaluate_files/unstructured_output/Bank Good Credit Loan.pptx.json new file mode 100644 index 000000000..1c3771a2b --- /dev/null +++ b/example-docs/test_evaluate_files/unstructured_output/Bank Good Credit Loan.pptx.json @@ -0,0 +1,420 @@ +[ + { + "type": "Title", + "element_id": "0405351ac64213c7b1e40e31aff7d21b", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 1, + "languages": [ + "eng" + ], + 
"page_number": 1 + }, + "text": "Bank Good Credit " + }, + { + "type": "NarrativeText", + "element_id": "214987ebee9fd615365185fb3d692253", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "0405351ac64213c7b1e40e31aff7d21b", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 1 + }, + "text": "Accredited with IABAC\u2122" + }, + { + "type": "Title", + "element_id": "fc3d53b1d173c5c72205914ea331b052", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 1 + }, + "text": "( International Association of Business Analytics Certifications)`" + }, + { + "type": "Title", + "element_id": "b952b3e6d0e34020f1f48b5d9243d0a4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 1 + }, + "text": "\u00a9 DataMites\u2122. 
All Rights Reserved | www.datamites.com" + }, + { + "type": "Title", + "element_id": "2dc308bd8d3a5c745dfacc3bdccd81db", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "Objective & Background" + }, + { + "type": "ListItem", + "element_id": "5a0a7e2a14285297ff3752656cb6df44", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "2dc308bd8d3a5c745dfacc3bdccd81db", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "Classify credit card customers as good / bad, based on information from internal and external sources. " + }, + { + "type": "ListItem", + "element_id": "5eb6ec96e6a3493c1ae56747ae457b7f", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "2dc308bd8d3a5c745dfacc3bdccd81db", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "Data provided" + }, + { + "type": "ListItem", + "element_id": "adec2b6c75369165b1d87dccdfd2dab8", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "5eb6ec96e6a3493c1ae56747ae457b7f", + "category_depth": 2, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "Demographic: Base file of with credit card history details. 
Only one record for every customer." + }, + { + "type": "ListItem", + "element_id": "c26d6dc6982b7f42045f4ffee951f8e0", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "5eb6ec96e6a3493c1ae56747ae457b7f", + "category_depth": 2, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "Account: Contians data for various loans availed by the customer. Not related to credit card. Multiple records for every customer." + }, + { + "type": "ListItem", + "element_id": "be3fc5cb3da83e6c22d1906330ee9f96", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "5eb6ec96e6a3493c1ae56747ae457b7f", + "category_depth": 2, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "Enquiries: Enquired made by customers for different loan purposes. Multiple records for every customer.\t" + }, + { + "type": "Title", + "element_id": "b952b3e6d0e34020f1f48b5d9243d0a4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "2dc308bd8d3a5c745dfacc3bdccd81db", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 2 + }, + "text": "\u00a9 DataMites\u2122. 
All Rights Reserved | www.datamites.com" + }, + { + "type": "Title", + "element_id": "0072e6b934945d5ba08f9729e0084739", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 3 + }, + "text": "Design" + }, + { + "type": "ListItem", + "element_id": "af14c0ecaaa7ac1d2bca5cdfbcc32ec7", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "0072e6b934945d5ba08f9729e0084739", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 3 + }, + "text": "Data to be downloaded using SQL queries." + }, + { + "type": "ListItem", + "element_id": "51d8e67259ab8a11d2fdfc5cb9bcf45e", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "0072e6b934945d5ba08f9729e0084739", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 3 + }, + "text": "Required information to be extracted from Account and Enquiry files and converted to one-to-one files." 
+ }, + { + "type": "ListItem", + "element_id": "6165e3bc219556f6ec397adc7240386b", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "0072e6b934945d5ba08f9729e0084739", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 3 + }, + "text": "The columns from the two files should be merged with Demographic file using Left Join with \u201ccustomer no\u201d as key column, to create a final file. The final file should contain all the records in demographic and additional columns/features from Account and Enquiry files will get added to Demographic file." + }, + { + "type": "ListItem", + "element_id": "31930936fc3bad2175b05e324e9923e4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "0072e6b934945d5ba08f9729e0084739", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 3 + }, + "text": "There will be many customers in account and enquiry file who will get left out. This is fine as we anyway don\u2019t know their good/bad label for training purpose. " + }, + { + "type": "Title", + "element_id": "b952b3e6d0e34020f1f48b5d9243d0a4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "0072e6b934945d5ba08f9729e0084739", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 3 + }, + "text": "\u00a9 DataMites\u2122. 
All Rights Reserved | www.datamites.com" + }, + { + "type": "Title", + "element_id": "ed83647ab77addbea9e4dca5f7d8f216", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 4 + }, + "text": "Analysis of Data" + }, + { + "type": "ListItem", + "element_id": "d936f750c577a228ebabd9ed2cec9a70", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "ed83647ab77addbea9e4dca5f7d8f216", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 4 + }, + "text": "Show using Excel File" + }, + { + "type": "Title", + "element_id": "b952b3e6d0e34020f1f48b5d9243d0a4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "ed83647ab77addbea9e4dca5f7d8f216", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 4 + }, + "text": "\u00a9 DataMites\u2122. 
All Rights Reserved | www.datamites.com" + }, + { + "type": "Title", + "element_id": "7207da66fd1c6771ee1a5705dc41c0c7", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "category_depth": 0, + "languages": [ + "eng" + ], + "page_number": 5 + }, + "text": "Explain Coding / outcomes " + }, + { + "type": "ListItem", + "element_id": "815ef1753a8bcb1ce21d59819bdc6834", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "7207da66fd1c6771ee1a5705dc41c0c7", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 5 + }, + "text": "Show using Jupyter" + }, + { + "type": "Title", + "element_id": "b952b3e6d0e34020f1f48b5d9243d0a4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "7207da66fd1c6771ee1a5705dc41c0c7", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 5 + }, + "text": "\u00a9 DataMites\u2122. 
All Rights Reserved | www.datamites.com" + }, + { + "type": "Title", + "element_id": "2034ce6155036f8a009ef33985209e88", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "7207da66fd1c6771ee1a5705dc41c0c7", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 6 + }, + "text": "Thank You" + }, + { + "type": "Title", + "element_id": "b952b3e6d0e34020f1f48b5d9243d0a4", + "metadata": { + "filename": "Bank Good Credit Loan.pptx", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:14", + "filetype": "application/vnd.openxmlformats-officedocument.presentationml.presentation", + "parent_id": "7207da66fd1c6771ee1a5705dc41c0c7", + "category_depth": 1, + "languages": [ + "eng" + ], + "page_number": 6 + }, + "text": "\u00a9 DataMites\u2122. All Rights Reserved | www.datamites.com" + } +] \ No newline at end of file diff --git a/example-docs/test_evaluate_files/unstructured_output/Performance-Audit-Discussion.pdf.json b/example-docs/test_evaluate_files/unstructured_output/Performance-Audit-Discussion.pdf.json new file mode 100644 index 000000000..db9ff0630 --- /dev/null +++ b/example-docs/test_evaluate_files/unstructured_output/Performance-Audit-Discussion.pdf.json @@ -0,0 +1,2029 @@ +[ + { + "type": "UncategorizedText", + "element_id": "9940112be89a9934ffb629a14cedbf71", + "metadata": { + "coordinates": { + "points": [ + [ + 35.6, + 48.4 + ], + [ + 35.6, + 137.4 + ], + [ + 565.2, + 137.4 + ], + [ + 565.2, + 48.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [] + }, + "text": "GAGAS Performance Audits: Discussion of Concepts to 
Consider When Auditing Public Functions and Services" + }, + { + "type": "NarrativeText", + "element_id": "6c7d2f92f64287dd9ab227ad5c8b54fc", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 161.9 + ], + [ + 216.0, + 264.6 + ], + [ + 574.9, + 264.6 + ], + [ + 574.9, + 161.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [ + { + "text": "2", + "url": null, + "start_index": 508 + } + ] + }, + "text": "The introductory chapter of Government Auditing Standards (GAGAS)1 outlines five concepts describing how public officials are to provide functions and services: effectively, efficiently, economically, ethically, and equitably. When planning, gathering and assessing evidence, and reporting audit results, auditors may focus on one or more of these concepts. The following discussion is intended to assist auditors when developing audit objectives for performance audits of government programs and activities.2" + }, + { + "type": "NarrativeText", + "element_id": "10f685b28da548c01c7f3a85dd535132", + "metadata": { + "coordinates": { + "points": [ + [ + 256.6, + 292.1 + ], + [ + 256.6, + 368.1 + ], + [ + 545.3, + 368.1 + ], + [ + 545.3, + 292.1 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [] + }, + "text": "This discussion is designed to help auditors understand and apply the concepts cited above for performance audits conducted in accordance with GAGAS. This discussion does not contain requirements, does not amend GAGAS, and is not considered interpretive guidance, as defined in chapter 2 of GAGAS." 
+ }, + { + "type": "Title", + "element_id": "31cafd4342fd0dc7525b762847501fbd", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 411.7 + ], + [ + 36.7, + 428.6 + ], + [ + 194.5, + 428.6 + ], + [ + 194.5, + 411.7 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [] + }, + "text": "GAGAS Paragraphs" + }, + { + "type": "Title", + "element_id": "18e424f8255309e06c2c04f9b8d0674e", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 408.6 + ], + [ + 216.0, + 419.6 + ], + [ + 301.0, + 419.6 + ], + [ + 301.0, + 408.6 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [] + }, + "text": "Paragraph 1.02:" + }, + { + "type": "NarrativeText", + "element_id": "a19eb590c59cfeda471b2b7c3c288a25", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 433.6 + ], + [ + 252.0, + 535.6 + ], + [ + 539.9, + 535.6 + ], + [ + 539.9, + 433.6 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "18e424f8255309e06c2c04f9b8d0674e", + "page_number": 1, + "links": [] + }, + "text": "The concept of accountability for use of public resources and government authority is key to our nation\u2019s governing processes. 
Management and officials entrusted with public resources are responsible for carrying out public functions and providing service to the public effectively, efficiently, economically, ethically, and equitably within the context of the statutory boundaries of the specific government program. [Emphasis added.]" + }, + { + "type": "Title", + "element_id": "55147246e47e38b64025fed24cd49f4a", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 549.6 + ], + [ + 216.0, + 560.6 + ], + [ + 301.0, + 560.6 + ], + [ + 301.0, + 549.6 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [] + }, + "text": "Paragraph 1.03:" + }, + { + "type": "NarrativeText", + "element_id": "57db9a7ec532415b5fad899aaf54628a", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 574.6 + ], + [ + 252.0, + 637.6 + ], + [ + 541.8, + 637.6 + ], + [ + 541.8, + 574.6 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "55147246e47e38b64025fed24cd49f4a", + "page_number": 1, + "links": [] + }, + "text": "As reflected in applicable laws, regulations, agreements, and standards, management and officials of government programs are responsible for providing reliable, useful, and timely information for transparency and accountability of these programs and their operations. 
Legislators, oversight" + }, + { + "type": "Title", + "element_id": "c8873928e000c11a2838ad53f807cf85", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 654.8 + ], + [ + 216.0, + 675.0 + ], + [ + 556.0, + 675.0 + ], + [ + 556.0, + 654.8 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [ + { + "text": "GAO - 21 - 368G", + "url": "https://www.gao.gov/yellowbook", + "start_index": 52 + } + ] + }, + "text": "1GAO, Government Auditing Standards: 2018 Revision, GAO-21-368G (Washington, D.C.: April 2021)" + }, + { + "type": "NarrativeText", + "element_id": "3326b7777d707fd7a57e708a7ff8c9a0", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 682.7 + ], + [ + 216.0, + 713.0 + ], + [ + 573.5, + 713.0 + ], + [ + 573.5, + 682.7 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "c8873928e000c11a2838ad53f807cf85", + "page_number": 1, + "links": [] + }, + "text": "2The concepts cited may also be applicable to other GAGAS engagements, based on the auditors\u2019 judgments. This discussion is limited to considering these concepts in performance audits." 
+ }, + { + "type": "Title", + "element_id": "e8b8355262d0af49d5243e0225b02368", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 742.9 + ], + [ + 216.0, + 750.9 + ], + [ + 244.0, + 750.9 + ], + [ + 244.0, + 742.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 1, + "links": [] + }, + "text": "Page 1" + }, + { + "type": "Title", + "element_id": "5eb6cf647d2c5d14a044e2d103dcef4c", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 281.5 + ], + [ + 36.7, + 298.4 + ], + [ + 123.6, + 298.4 + ], + [ + 123.6, + 281.5 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 2, + "links": [] + }, + "text": "Discussion" + }, + { + "type": "Title", + "element_id": "a4f3df623c154d7203f57f6ce3aa50d5", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 520.8 + ], + [ + 36.7, + 534.8 + ], + [ + 94.1, + 534.8 + ], + [ + 94.1, + 520.8 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 2, + "links": [] + }, + "text": "Effective" + }, + { + "type": "NarrativeText", + "element_id": "ee7c4e99de43f24c964bce2254f02c6a", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 162.3 + ], + [ + 252.0, + 264.8 + ], + [ + 540.5, + 264.8 + ], + [ + 540.5, + 162.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": 
"tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "a4f3df623c154d7203f57f6ce3aa50d5", + "page_number": 2, + "links": [] + }, + "text": "bodies, those charged with governance, and the public need to know whether (1) management and officials manage government resources and use their authority properly and in compliance with laws and regulations; (2) government programs are achieving their objectives and desired outcomes; and (3) government services are provided effectively, efficiently, economically, ethically, and equitably. [Emphasis added.]" + }, + { + "type": "NarrativeText", + "element_id": "98a43b7a853f2a36d3cb083be616f7f9", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 278.3 + ], + [ + 216.0, + 445.3 + ], + [ + 578.5, + 445.3 + ], + [ + 578.5, + 278.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "a4f3df623c154d7203f57f6ce3aa50d5", + "page_number": 2, + "links": [] + }, + "text": "Government administration best serves the collective interest of the public when it is effective, efficient, economical, ethical, and equitable. Auditors help inform legislators, oversight bodies, those charged with governance, and the public about whether public services are being provided consistent with these concepts. Government auditing can contribute to accountability and can help improve government administration by identifying deficiencies and recommending enhancements to achieve effective, efficient, economical, ethical, and equitable outcomes, when appropriate within the context of the audit objectives. 
As such, it is important for auditors to understand the concepts below as they relate to administering government programs or activities and how they can assess or address these expectations of government performance in conducting their performance audits." + }, + { + "type": "NarrativeText", + "element_id": "be89724efafbbc8127b90ee50f7acfb1", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 459.3 + ], + [ + 216.0, + 509.3 + ], + [ + 565.7, + 509.3 + ], + [ + 565.7, + 459.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "a4f3df623c154d7203f57f6ce3aa50d5", + "page_number": 2, + "links": [] + }, + "text": "The examples that follow the discussion of each concept illustrate the distinctions between these concepts. In a performance audit, it is common practice to incorporate more than one of these concepts when conducting the audit." + }, + { + "type": "NarrativeText", + "element_id": "18681facfeaa31539d5f9e13badd907e", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 523.3 + ], + [ + 216.0, + 625.3 + ], + [ + 573.7, + 625.3 + ], + [ + 573.7, + 523.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "a4f3df623c154d7203f57f6ce3aa50d5", + "page_number": 2, + "links": [] + }, + "text": "The administration of a government program or activity is effective when it achieves the intended results. A performance audit that focuses on the effectiveness of a program or activity seeks to establish a cause-and- effect relationship between the operation of the program or activity and achieving its stated objectives. 
Achieving the objectives does not guarantee that the program or activity was effective unless the auditors can establish that the program or activity caused, or contributed to, the desired outcome." + }, + { + "type": "NarrativeText", + "element_id": "9f06cf8dd4c97ad5020e7e1a061dad7b", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 639.3 + ], + [ + 252.0, + 715.4 + ], + [ + 569.9, + 715.4 + ], + [ + 569.9, + 639.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "a4f3df623c154d7203f57f6ce3aa50d5", + "page_number": 2, + "links": [] + }, + "text": "Example: In a performance audit examining how effective a housing voucher program was in achieving its goal of improving economic outcomes for recipients, auditors may determine whether receiving housing vouchers led to better subsequent economic outcomes for recipients than those of similarly situated individuals who did not receive vouchers." 
+ }, + { + "type": "Title", + "element_id": "794f7062cf3f56f2c7d70702bd3d13e1", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 742.9 + ], + [ + 216.0, + 750.9 + ], + [ + 244.0, + 750.9 + ], + [ + 244.0, + 742.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 2, + "links": [] + }, + "text": "Page 2" + }, + { + "type": "Title", + "element_id": "9bd2448fa5cf6d622bffb860a3e28f91", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 236.9 + ], + [ + 36.7, + 250.8 + ], + [ + 90.2, + 250.8 + ], + [ + 90.2, + 236.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 3, + "links": [] + }, + "text": "Efficient" + }, + { + "type": "Title", + "element_id": "9bdb2a16884b7f663fb433076e0da16c", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 570.8 + ], + [ + 36.7, + 584.8 + ], + [ + 113.0, + 584.8 + ], + [ + 113.0, + 570.8 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 3, + "links": [] + }, + "text": "Economical" + }, + { + "type": "NarrativeText", + "element_id": "875be099212a06e627ad505160a6fcfa", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 162.3 + ], + [ + 252.0, + 225.3 + ], + [ + 577.2, + 225.3 + ], + [ + 577.2, + 162.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": 
"tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "Example: In a performance audit assessing the effectiveness of an after-school program targeted at helping students improve their reading proficiency, auditors may examine the extent to which participants\u2019 reading levels improved relative to baseline data from before they joined the program." + }, + { + "type": "NarrativeText", + "element_id": "bcdb4285734fa823729c56644f979284", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 239.3 + ], + [ + 216.0, + 315.3 + ], + [ + 576.7, + 315.3 + ], + [ + 576.7, + 239.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "The administration of a government program or activity is efficient when it gets the most value from available resources. When a performance audit focuses on efficiency, auditors examine whether the resources used to administer a program or activity have been put to optimal or satisfactory use, or whether the same or similar results could have been achieved more timely or with fewer resources." 
+ }, + { + "type": "NarrativeText", + "element_id": "ec40806263667a67e898c5ffaec59516", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 329.4 + ], + [ + 252.0, + 405.3 + ], + [ + 568.6, + 405.3 + ], + [ + 568.6, + 329.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "Example: In a performance audit assessing a disaster relief agency\u2019s mobilization of resources to respond to a disaster, auditors may assess the disaster relief agency\u2019s timeliness in providing relief compared to its own previous performance or the performance of other similarly situated agencies that have responded to comparable disasters." + }, + { + "type": "NarrativeText", + "element_id": "e18280f4a42970bd6903d21b9eb5b155", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 419.4 + ], + [ + 252.0, + 469.4 + ], + [ + 577.8, + 469.4 + ], + [ + 577.8, + 419.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "Example: In a performance audit assessing a consumer protection agency\u2019s response to consumer complaints, auditors may assess whether the agency\u2019s efforts to streamline its processes resulted in improved timely resolution of complaints." 
+ }, + { + "type": "NarrativeText", + "element_id": "f42e15abb222545aa842032bdb221a93", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 483.3 + ], + [ + 252.0, + 559.4 + ], + [ + 577.2, + 559.4 + ], + [ + 577.2, + 483.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "Example: In a performance audit assessing the time a state needs to process unemployment benefits targeted at helping those in need, auditors may assess how long the process takes from receipt of the unemployment application to the applicant\u2019s receipt of the benefit, including steps such as verifying required information." + }, + { + "type": "NarrativeText", + "element_id": "67e14703352de7c2d01b65b9d38aa79c", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 573.3 + ], + [ + 216.0, + 636.3 + ], + [ + 571.8, + 636.3 + ], + [ + 571.8, + 573.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "The administration of a government program or activity is economical when it minimizes the costs of resources used in performing its functions while meeting timeliness and quality considerations for those resources. When auditing economy, auditors primarily focus on the costs of inputs rather than on the outcomes achieved." 
+ }, + { + "type": "NarrativeText", + "element_id": "0570a941ed2cc8770fef88241b5168ce", + "metadata": { + "coordinates": { + "points": [ + [ + 252.1, + 650.4 + ], + [ + 252.1, + 700.4 + ], + [ + 569.4, + 700.4 + ], + [ + 569.4, + 650.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "9bdb2a16884b7f663fb433076e0da16c", + "page_number": 3, + "links": [] + }, + "text": "Example: In a performance audit examining an agency\u2019s international travel expenses, in addition to assessing the design of internal controls and compliance with expense guidelines, auditors may test whether, for a sample of trips, bookings of" + }, + { + "type": "Title", + "element_id": "b2172e20df3711d87a12d093beb8a9b9", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 742.9 + ], + [ + 216.0, + 750.9 + ], + [ + 244.0, + 750.9 + ], + [ + 244.0, + 742.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 3, + "links": [] + }, + "text": "Page 3" + }, + { + "type": "Title", + "element_id": "82c884cb91f6082cbd09816a91a215a1", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 287.9 + ], + [ + 36.7, + 301.8 + ], + [ + 82.6, + 301.8 + ], + [ + 82.6, + 287.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 4, + "links": [] + }, + "text": "Ethical" + }, + { + "type": "Title", + "element_id": "cbfa75893f5d570acffb728e22430c72", + "metadata": { + 
"coordinates": { + "points": [ + [ + 36.7, + 634.8 + ], + [ + 36.7, + 648.8 + ], + [ + 99.0, + 648.8 + ], + [ + 99.0, + 634.8 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 4, + "links": [] + }, + "text": "Equitable" + }, + { + "type": "NarrativeText", + "element_id": "58162fa7967e72c03492429bcaa142e7", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 162.3 + ], + [ + 252.0, + 186.3 + ], + [ + 550.4, + 186.3 + ], + [ + 550.4, + 162.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "equivalent airline tickets and hotel rooms could be found at a lower cost." + }, + { + "type": "NarrativeText", + "element_id": "dc4a96af25f1668528e4e5b9c96eb530", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 200.4 + ], + [ + 252.0, + 276.3 + ], + [ + 574.6, + 276.3 + ], + [ + 574.6, + 200.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "Example: In a performance audit assessing an agency\u2019s acquisition practices, auditors may examine whether the agency\u2019s decisions regarding purchasing, leasing, or reimbursing employees for the costs of acquiring various supplies or equipment achieved the lowest cost while meeting applicable requirements." 
+ }, + { + "type": "NarrativeText", + "element_id": "a52482c1f5da0709cf17d08aaf3594f4", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 290.3 + ], + [ + 216.0, + 418.3 + ], + [ + 576.8, + 418.3 + ], + [ + 576.8, + 290.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "The administration of a government program or activity is ethical when it advances the collective interest of the public rather than private gain and is conducted with honesty, integrity, and impartiality. Laws and regulations often specify rules of ethical conduct. Therefore, audits examining the ethical administration of a program or activity may involve assessing compliance with such laws and regulations. Fraud in administering a government program or activity betrays the public trust and is, by definition, unethical. In addition, auditors may identify instances of unethical conduct that result in waste and abuse during testing of internal controls as part of a performance audit." 
+ }, + { + "type": "NarrativeText", + "element_id": "02ea2415a3674a14ebcf99b61489f37d", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 432.3 + ], + [ + 252.0, + 482.4 + ], + [ + 559.5, + 482.4 + ], + [ + 559.5, + 432.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "Example: In a performance audit assessing agency officials\u2019 compliance with conflict-of-interest requirements, auditors may compare a sample of financial disclosure reports filed against requirements in statute or regulation." + }, + { + "type": "NarrativeText", + "element_id": "7cc5accf1fa38e5192636acf290ba038", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 496.4 + ], + [ + 252.0, + 559.4 + ], + [ + 571.1, + 559.4 + ], + [ + 571.1, + 496.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "Example: In a performance audit assessing potential regulatory capture related to a particular industry, auditors may assess the extent to which the regulatory agency has sufficient controls to reasonably assure its employees\u2019 independence from the entities subject to the agency\u2019s regulation." 
+ }, + { + "type": "NarrativeText", + "element_id": "16c9b22e0ab59124f454d356f765d4fa", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 573.3 + ], + [ + 252.0, + 623.4 + ], + [ + 558.2, + 623.4 + ], + [ + 558.2, + 573.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "Example: In a performance audit assessing an office\u2019s policies and procedures for purchase cards, auditors\u2019 testing of the program\u2019s controls to identify deficiencies may identify fraud, waste, or abuse in its administration." + }, + { + "type": "NarrativeText", + "element_id": "827bf8c4821c56f1c176277130d21dd7", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 637.3 + ], + [ + 216.0, + 700.3 + ], + [ + 577.3, + 700.3 + ], + [ + 577.3, + 637.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "cbfa75893f5d570acffb728e22430c72", + "page_number": 4, + "links": [] + }, + "text": "The administration of a government program or activity is equitable when it consistently serves members of the public, distributes public services, and implements public policy in a manner that promotes fairness, justice, and equality. 
Auditing whether the administration of a government program or activity is equitable may include assessing the" + }, + { + "type": "Title", + "element_id": "be315c294e89da2fc16a03db7ee60db5", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 742.9 + ], + [ + 216.0, + 750.9 + ], + [ + 244.0, + 750.9 + ], + [ + 244.0, + 742.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 4, + "links": [] + }, + "text": "Page 4" + }, + { + "type": "ListItem", + "element_id": "ad0f094c2573237ba04438481feca1c8", + "metadata": { + "coordinates": { + "points": [ + [ + 234.0, + 161.5 + ], + [ + 234.0, + 173.7 + ], + [ + 484.9, + 173.7 + ], + [ + 484.9, + 161.5 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "equality of access to and provision of services;" + }, + { + "type": "ListItem", + "element_id": "8a615420b0ad8685926f80a74cc91dd8", + "metadata": { + "coordinates": { + "points": [ + [ + 234.0, + 186.9 + ], + [ + 234.0, + 212.1 + ], + [ + 527.8, + 212.1 + ], + [ + 527.8, + 186.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "procedural fairness and equal treatment of individuals in government programs and policies;" + }, + { + "type": "ListItem", + "element_id": 
"4f5c4366f627be783d08da03e2cef2ea", + "metadata": { + "coordinates": { + "points": [ + [ + 234.0, + 225.3 + ], + [ + 234.0, + 237.4 + ], + [ + 407.3, + 237.4 + ], + [ + 407.3, + 225.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "causes of disparate outcomes;" + }, + { + "type": "ListItem", + "element_id": "adfd518e16099bbd6b69d5da447da492", + "metadata": { + "coordinates": { + "points": [ + [ + 234.0, + 250.7 + ], + [ + 234.0, + 275.8 + ], + [ + 564.5, + 275.8 + ], + [ + 564.5, + 250.7 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "or distributional impacts of public policies, programs, resources, and services." + }, + { + "type": "NarrativeText", + "element_id": "61721ca98549c8ed54f90d522f5a93be", + "metadata": { + "coordinates": { + "points": [ + [ + 234.0, + 289.8 + ], + [ + 234.0, + 365.8 + ], + [ + 572.1, + 365.8 + ], + [ + 572.1, + 289.8 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "Disaggregating data by social groups or communities that share a particular characteristic (e.g., gender, race, ethnicity, age, or income) can help illuminate differences. 
Reporting on such differences, when appropriate within the context of the audit objectives, can increase understanding of the effects of policies and programs on issues of equity." + }, + { + "type": "NarrativeText", + "element_id": "855e8bbd89ede4f7a2089ed7904deb38", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 379.9 + ], + [ + 252.0, + 442.9 + ], + [ + 570.7, + 442.9 + ], + [ + 570.7, + 379.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "Example: In a performance audit assessing the granting of waivers from particular requirements, auditors may use disaggregated data about waiver recipients to assess whether different groups or communities were treated fairly and equally in the process." + }, + { + "type": "NarrativeText", + "element_id": "4e37cdde662ee651a634d6a39a19d624", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 456.9 + ], + [ + 252.0, + 545.9 + ], + [ + 571.2, + 545.9 + ], + [ + 571.2, + 456.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "Example: In a performance audit assessing a grant program aimed at expanding internet access, auditors may assess the extent to which formulas, criteria, or other factors (such as matching funds or capital requirements) considered in the distribution of grant funds may be to the specific advantage or disadvantage of certain groups, regions, or communities, thereby causing inequities." 
+ }, + { + "type": "NarrativeText", + "element_id": "618734ec961ab105b926a7fb4d39214d", + "metadata": { + "coordinates": { + "points": [ + [ + 252.0, + 559.9 + ], + [ + 252.0, + 609.9 + ], + [ + 577.1, + 609.9 + ], + [ + 577.1, + 559.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "Example: In a performance audit assessing scholarship outcomes in higher education programs, auditors may report on the distribution of scholarships by race, gender identity, and income to illuminate potential disparities among scholarship recipients." + }, + { + "type": "NarrativeText", + "element_id": "234e7ada767cf237379e4b835b80bd6a", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 648.9 + ], + [ + 216.0, + 698.9 + ], + [ + 574.3, + 698.9 + ], + [ + 574.3, + 648.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "be315c294e89da2fc16a03db7ee60db5", + "page_number": 5, + "links": [] + }, + "text": "These concepts may overlap. For example, efficiency may also be a component of effectiveness. 
Similarly, when appropriate within the context of the program and audit objectives, auditors may disaggregate the results of performance audits that focus on efficiency or effectiveness" + }, + { + "type": "Title", + "element_id": "438ff2afcf15e4b4536dc2514d74c66b", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 742.9 + ], + [ + 216.0, + 750.9 + ], + [ + 244.0, + 750.9 + ], + [ + 244.0, + 742.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 5, + "links": [] + }, + "text": "Page 5" + }, + { + "type": "Title", + "element_id": "365bf8cd5f0b6fab4e21fb8e6567faec", + "metadata": { + "coordinates": { + "points": [ + [ + 36.7, + 331.4 + ], + [ + 36.7, + 348.4 + ], + [ + 200.1, + 348.4 + ], + [ + 200.1, + 331.4 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 6, + "links": [] + }, + "text": "For More Information" + }, + { + "type": "NarrativeText", + "element_id": "525cb5469e5839a9d3e221548b3ed97c", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 162.3 + ], + [ + 216.0, + 186.3 + ], + [ + 569.4, + 186.3 + ], + [ + 569.4, + 162.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "365bf8cd5f0b6fab4e21fb8e6567faec", + "page_number": 6, + "links": [] + }, + "text": "issues to illuminate inequities in program administration or in distribution of public services." 
+ }, + { + "type": "NarrativeText", + "element_id": "d7f5d4302b77323c3bc83943352d8708", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 200.3 + ], + [ + 216.0, + 289.3 + ], + [ + 576.8, + 289.3 + ], + [ + 576.8, + 200.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "365bf8cd5f0b6fab4e21fb8e6567faec", + "page_number": 6, + "links": [] + }, + "text": "While all of these concepts are important to administering government programs responsibly, it is up to the professional judgment of the auditors to determine the specific concepts that are relevant in conducting the performance audit and reporting the results. Auditors\u2019 professional judgments are informed by, among other things, the needs of the users of the audit reports; the nature, context, and objectives of the program or activity under audit; and the public interest." + }, + { + "type": "NarrativeText", + "element_id": "382d11bff25efbf9e606798da6cf3382", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 328.3 + ], + [ + 216.0, + 339.3 + ], + [ + 563.8, + 339.3 + ], + [ + 563.8, + 328.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "365bf8cd5f0b6fab4e21fb8e6567faec", + "page_number": 6, + "links": [] + }, + "text": "To view the current Yellow Book, visit https://www.gao.gov/yellowbook." 
+ }, + { + "type": "NarrativeText", + "element_id": "1127e23fc1228d69ea4300504271c662", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 353.3 + ], + [ + 216.0, + 377.3 + ], + [ + 480.1, + 377.3 + ], + [ + 480.1, + 353.3 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "parent_id": "365bf8cd5f0b6fab4e21fb8e6567faec", + "page_number": 6, + "links": [] + }, + "text": "For technical assistance, call (202) 512-9535 or email yellowbook@gao.gov." + }, + { + "type": "Title", + "element_id": "b97e7d061ab32cd89c497e7b70c5aeac", + "metadata": { + "coordinates": { + "points": [ + [ + 216.0, + 742.9 + ], + [ + 216.0, + 750.9 + ], + [ + 244.0, + 750.9 + ], + [ + 244.0, + 742.9 + ] + ], + "system": "PixelSpace", + "layout_width": 612.0, + "layout_height": 792.0 + }, + "filename": "Performance-Audit-Discussion.pdf", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:16:47", + "filetype": "application/pdf", + "page_number": 6, + "links": [] + }, + "text": "Page 6" + } +] \ No newline at end of file diff --git a/example-docs/test_evaluate_files/unstructured_output/currency.csv.json b/example-docs/test_evaluate_files/unstructured_output/currency.csv.json new file mode 100644 index 000000000..caa52520a --- /dev/null +++ b/example-docs/test_evaluate_files/unstructured_output/currency.csv.json @@ -0,0 +1,17 @@ +[ + { + "type": "Table", + "element_id": "0f932c1c78cd59aef141af819dfdcf84", + "metadata": { + "filename": "currency.csv", + "file_directory": "tmpdocs", + "last_modified": "2023-11-02T15:17:41", + "filetype": "text/csv", + "languages": [ + "eng" + ], + "text_as_html": "\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
CodeSymbolName
AED\u062f.\u0625United Arab Emirates d
AFN\u060bAfghan afghani
ALLLAlbanian lek
AMDAMDArmenian dram
ANG\u0192Netherlands Antillean gu
AOAKzAngolan kwanza
ARS$Argentine peso
AUD$Australian dollar
AWGAfl.Aruban florin
AZNAZNAzerbaijani manat
BAMKMBosnia and Herzegovina
BBD$Barbadian dollar
BDT\u09f3Bangladeshi taka
BGN\u043b\u0432.Bulgarian lev
BHD.\u062f.\u0628Bahraini dinar
BIFFrBurundian franc
BMD$Bermudian dollar
BND$Brunei dollar
BOBBs.Bolivian boliviano
BRLR$Brazilian real
BSD$Bahamian dollar
BTC\u0e3fBitcoin
BTNNu.Bhutanese ngultrum
BWPPBotswana pula
BYRBrBelarusian ruble (old)'
BYNBrBelarusian ruble
BZD$Belize dollar
CAD$Canadian dollar
CDFFrCongolese franc
CHFCHFSwiss franc
CLP$Chilean peso
CNY\u00a5Chinese yuan
COP$Colombian peso
CRC\u20a1Costa Rican col\u00f3n
CUC$Cuban convertible peso')
CUP$Cuban peso
CVE$Cape Verdean escudo
CZKK\u010dCzech koruna
DJFFrDjiboutian franc
DKKDKKDanish krone
DOPRD$Dominican peso
DZD\u062f.\u062cAlgerian dinar
EGPEGPEgyptian pound
ERNNfkEritrean nakfa
ETBBrEthiopian birr
EUR\u20acEuro
FJD$Fijian dollar
FKP\u00a3Falkland Islands pound')
GBP\u00a3Pound sterling
GEL\u20beGeorgian lari
GGP\u00a3Guernsey pound
GHS\u20b5Ghana cedi
GIP\u00a3Gibraltar pound
GMDDGambian dalasi
GNFFrGuinean franc
GTQQGuatemalan quetzal
GYD$Guyanese dollar
HKD$Hong Kong dollar
HNLLHonduran lempira
HRKknCroatian kuna
HTGGHaitian gourde
HUFFtHungarian forint
IDRRpIndonesian rupiah
ILS\u20aaIsraeli new shekel
IMP\u00a3Manx pound
INR\u20b9Indian rupee
IQD\u0639.\u062fIraqi dinar
IRR\ufdfcIranian rial
IRT\u062a\u0648\u0645\u0627\u0646Iranian toman
ISKkr.Icelandic kr\u00f3na
JEP\u00a3Jersey pound
JMD$Jamaican dollar
JOD\u062f.\u0627Jordanian dinar
JPY\u00a5Japanese yen
KESKShKenyan shilling
KGS\u0441\u043e\u043cKyrgyzstani som
KHR\u17dbCambodian riel
KMFFrComorian franc
KPW\u20a9North Korean won
KRW\u20a9South Korean won
KWD\u062f.\u0643Kuwaiti dinar
KYD$Cayman Islands dollar
KZT\u20b8Kazakhstani tenge
LAK\u20adLao kip
LBP\u0644.\u0644Lebanese pound
LKR\u0dbb\u0dd4Sri Lankan rupee
LRD$Liberian dollar
LSLLLesotho loti
LYD\u0644.\u062fLibyan dinar
MAD\u062f.\u0645.Moroccan dirham
MDLMDLMoldovan leu
MGAArMalagasy ariary
MKD\u0434\u0435\u043dMacedonian denar
MMKKsBurmese kyat
MNT\u20aeMongolian t\u00f6gr\u00f6g
MOPPMacanese pataca
MRUUMMauritanian ouguiya
MUR\u20a8Mauritian rupee
MVR.\u0783Maldivian rufiyaa
MWKMKMalawian kwacha
MXN$Mexican peso
MYRRMMalaysian ringgit
MZNMTMozambican metical
NADN$Namibian dollar
NGN\u20a6Nigerian naira
NIOC$Nicaraguan c\u00f3rdoba
NOKkrNorwegian krone
NPR\u20a8Nepalese rupee
NZD$New Zealand dollar
OMR\u0631.\u0639.Omani rial
PABB/.Panamanian balboa
PENS/Sol
PGKKPapua New Guinean kina')
PHP\u20b1Philippine peso
PKR\u20a8Pakistani rupee
PLNz\u0142Polish z\u0142oty
PRB\u0440.Transnistrian ruble
PYG\u20b2Paraguayan guaran\u00ed
QAR\u0631.\u0642Qatari riyal
RONleiRomanian leu
RSD\u0440\u0441\u0434Serbian dinar
RUB\u20bdRussian ruble
RWFFrRwandan franc
SAR\u0631.\u0633Saudi riyal
SBD$Solomon Islands dollar')
SCR\u20a8Seychellois rupee
SDG\u062c.\u0633.Sudanese pound
SEKkrSwedish krona
SGD$Singapore dollar
SHP\u00a3Saint Helena pound
SLLLeSierra Leonean leone
SOSShSomali shilling
SRD$Surinamese dollar
SSP\u00a3South Sudanese pound
STNDbS\u00e3o Tom\u00e9 and Pr\u00edncipe d
SYP\u0644.\u0633Syrian pound
SZLLSwazi lilangeni
THB\u0e3fThai baht
TJS\u0405\u041cTajikistani somoni
TMTmTurkmenistan manat
TND\u062f.\u062aTunisian dinar
TOPT$Tongan pa\u02bbanga
TRY\u20baTurkish lira
TTD$Trinidad and Tobago doll
TWDNT$New Taiwan dollar
TZSShTanzanian shilling
UAH\u20b4Ukrainian hryvnia
UGXUGXUgandan shilling
USD$United States (US) dolla
UYU$Uruguayan peso
UZSUZSUzbekistani som
VEFBs FVenezuelan bol\u00edvar
VESBs.SBol\u00edvar soberano
VND\u20abVietnamese \u0111\u1ed3ng
VUVVtVanuatu vatu
WSTTSamoan t\u0101l\u0101
XAFCFACentral African CFA fr
XCD$East Caribbean dollar
XOFCFAWest African CFA franc
XPFFrCFP franc
YER\ufdfcYemeni rial
ZARRSouth African rand
ZMWZKZambian kwacha
" + }, + "text": "\n\n\nCode\nSymbol\nName\n\n\nAED\n\u062f.\u0625\nUnited Arab Emirates d\n\n\nAFN\n\u060b\nAfghan afghani\n\n\nALL\nL\nAlbanian lek\n\n\nAMD\nAMD\nArmenian dram\n\n\nANG\n\u0192\nNetherlands Antillean gu\n\n\nAOA\nKz\nAngolan kwanza\n\n\nARS\n$\nArgentine peso\n\n\nAUD\n$\nAustralian dollar\n\n\nAWG\nAfl.\nAruban florin\n\n\nAZN\nAZN\nAzerbaijani manat\n\n\nBAM\nKM\nBosnia and Herzegovina\n\n\nBBD\n$\nBarbadian dollar\n\n\nBDT\n\u09f3\nBangladeshi taka\n\n\nBGN\n\u043b\u0432.\nBulgarian lev\n\n\nBHD\n.\u062f.\u0628\nBahraini dinar\n\n\nBIF\nFr\nBurundian franc\n\n\nBMD\n$\nBermudian dollar\n\n\nBND\n$\nBrunei dollar\n\n\nBOB\nBs.\nBolivian boliviano\n\n\nBRL\nR$\nBrazilian real\n\n\nBSD\n$\nBahamian dollar\n\n\nBTC\n\u0e3f\nBitcoin\n\n\nBTN\nNu.\nBhutanese ngultrum\n\n\nBWP\nP\nBotswana pula\n\n\nBYR\nBr\nBelarusian ruble (old)'\n\n\nBYN\nBr\nBelarusian ruble\n\n\nBZD\n$\nBelize dollar\n\n\nCAD\n$\nCanadian dollar\n\n\nCDF\nFr\nCongolese franc\n\n\nCHF\nCHF\nSwiss franc\n\n\nCLP\n$\nChilean peso\n\n\nCNY\n\u00a5\nChinese yuan\n\n\nCOP\n$\nColombian peso\n\n\nCRC\n\u20a1\nCosta Rican col\u00f3n\n\n\nCUC\n$\nCuban convertible peso')\n\n\nCUP\n$\nCuban peso\n\n\nCVE\n$\nCape Verdean escudo\n\n\nCZK\nK\u010d\nCzech koruna\n\n\nDJF\nFr\nDjiboutian franc\n\n\nDKK\nDKK\nDanish krone\n\n\nDOP\nRD$\nDominican peso\n\n\nDZD\n\u062f.\u062c\nAlgerian dinar\n\n\nEGP\nEGP\nEgyptian pound\n\n\nERN\nNfk\nEritrean nakfa\n\n\nETB\nBr\nEthiopian birr\n\n\nEUR\n\u20ac\nEuro\n\n\nFJD\n$\nFijian dollar\n\n\nFKP\n\u00a3\nFalkland Islands pound')\n\n\nGBP\n\u00a3\nPound sterling\n\n\nGEL\n\u20be\nGeorgian lari\n\n\nGGP\n\u00a3\nGuernsey pound\n\n\nGHS\n\u20b5\nGhana cedi\n\n\nGIP\n\u00a3\nGibraltar pound\n\n\nGMD\nD\nGambian dalasi\n\n\nGNF\nFr\nGuinean franc\n\n\nGTQ\nQ\nGuatemalan quetzal\n\n\nGYD\n$\nGuyanese dollar\n\n\nHKD\n$\nHong Kong dollar\n\n\nHNL\nL\nHonduran lempira\n\n\nHRK\nkn\nCroatian kuna\n\n\nHTG\nG\nHaitian gourde\n\n\nHUF\nFt\nHungarian 
forint\n\n\nIDR\nRp\nIndonesian rupiah\n\n\nILS\n\u20aa\nIsraeli new shekel\n\n\nIMP\n\u00a3\nManx pound\n\n\nINR\n\u20b9\nIndian rupee\n\n\nIQD\n\u0639.\u062f\nIraqi dinar\n\n\nIRR\n\ufdfc\nIranian rial\n\n\nIRT\n\u062a\u0648\u0645\u0627\u0646\nIranian toman\n\n\nISK\nkr.\nIcelandic kr\u00f3na\n\n\nJEP\n\u00a3\nJersey pound\n\n\nJMD\n$\nJamaican dollar\n\n\nJOD\n\u062f.\u0627\nJordanian dinar\n\n\nJPY\n\u00a5\nJapanese yen\n\n\nKES\nKSh\nKenyan shilling\n\n\nKGS\n\u0441\u043e\u043c\nKyrgyzstani som\n\n\nKHR\n\u17db\nCambodian riel\n\n\nKMF\nFr\nComorian franc\n\n\nKPW\n\u20a9\nNorth Korean won\n\n\nKRW\n\u20a9\nSouth Korean won\n\n\nKWD\n\u062f.\u0643\nKuwaiti dinar\n\n\nKYD\n$\nCayman Islands dollar\n\n\nKZT\n\u20b8\nKazakhstani tenge\n\n\nLAK\n\u20ad\nLao kip\n\n\nLBP\n\u0644.\u0644\nLebanese pound\n\n\nLKR\n\u0dbb\u0dd4\nSri Lankan rupee\n\n\nLRD\n$\nLiberian dollar\n\n\nLSL\nL\nLesotho loti\n\n\nLYD\n\u0644.\u062f\nLibyan dinar\n\n\nMAD\n\u062f.\u0645.\nMoroccan dirham\n\n\nMDL\nMDL\nMoldovan leu\n\n\nMGA\nAr\nMalagasy ariary\n\n\nMKD\n\u0434\u0435\u043d\nMacedonian denar\n\n\nMMK\nKs\nBurmese kyat\n\n\nMNT\n\u20ae\nMongolian t\u00f6gr\u00f6g\n\n\nMOP\nP\nMacanese pataca\n\n\nMRU\nUM\nMauritanian ouguiya\n\n\nMUR\n\u20a8\nMauritian rupee\n\n\nMVR\n.\u0783\nMaldivian rufiyaa\n\n\nMWK\nMK\nMalawian kwacha\n\n\nMXN\n$\nMexican peso\n\n\nMYR\nRM\nMalaysian ringgit\n\n\nMZN\nMT\nMozambican metical\n\n\nNAD\nN$\nNamibian dollar\n\n\nNGN\n\u20a6\nNigerian naira\n\n\nNIO\nC$\nNicaraguan c\u00f3rdoba\n\n\nNOK\nkr\nNorwegian krone\n\n\nNPR\n\u20a8\nNepalese rupee\n\n\nNZD\n$\nNew Zealand dollar\n\n\nOMR\n\u0631.\u0639.\nOmani rial\n\n\nPAB\nB/.\nPanamanian balboa\n\n\nPEN\nS/\nSol\n\n\nPGK\nK\nPapua New Guinean kina')\n\n\nPHP\n\u20b1\nPhilippine peso\n\n\nPKR\n\u20a8\nPakistani rupee\n\n\nPLN\nz\u0142\nPolish z\u0142oty\n\n\nPRB\n\u0440.\nTransnistrian ruble\n\n\nPYG\n\u20b2\nParaguayan guaran\u00ed\n\n\nQAR\n\u0631.\u0642\nQatari riyal\n\n\nRON\nlei\nRomanian 
leu\n\n\nRSD\n\u0440\u0441\u0434\nSerbian dinar\n\n\nRUB\n\u20bd\nRussian ruble\n\n\nRWF\nFr\nRwandan franc\n\n\nSAR\n\u0631.\u0633\nSaudi riyal\n\n\nSBD\n$\nSolomon Islands dollar')\n\n\nSCR\n\u20a8\nSeychellois rupee\n\n\nSDG\n\u062c.\u0633.\nSudanese pound\n\n\nSEK\nkr\nSwedish krona\n\n\nSGD\n$\nSingapore dollar\n\n\nSHP\n\u00a3\nSaint Helena pound\n\n\nSLL\nLe\nSierra Leonean leone\n\n\nSOS\nSh\nSomali shilling\n\n\nSRD\n$\nSurinamese dollar\n\n\nSSP\n\u00a3\nSouth Sudanese pound\n\n\nSTN\nDb\nS\u00e3o Tom\u00e9 and Pr\u00edncipe d\n\n\nSYP\n\u0644.\u0633\nSyrian pound\n\n\nSZL\nL\nSwazi lilangeni\n\n\nTHB\n\u0e3f\nThai baht\n\n\nTJS\n\u0405\u041c\nTajikistani somoni\n\n\nTMT\nm\nTurkmenistan manat\n\n\nTND\n\u062f.\u062a\nTunisian dinar\n\n\nTOP\nT$\nTongan pa\u02bbanga\n\n\nTRY\n\u20ba\nTurkish lira\n\n\nTTD\n$\nTrinidad and Tobago doll\n\n\nTWD\nNT$\nNew Taiwan dollar\n\n\nTZS\nSh\nTanzanian shilling\n\n\nUAH\n\u20b4\nUkrainian hryvnia\n\n\nUGX\nUGX\nUgandan shilling\n\n\nUSD\n$\nUnited States (US) dolla\n\n\nUYU\n$\nUruguayan peso\n\n\nUZS\nUZS\nUzbekistani som\n\n\nVEF\nBs F\nVenezuelan bol\u00edvar\n\n\nVES\nBs.S\nBol\u00edvar soberano\n\n\nVND\n\u20ab\nVietnamese \u0111\u1ed3ng\n\n\nVUV\nVt\nVanuatu vatu\n\n\nWST\nT\nSamoan t\u0101l\u0101\n\n\nXAF\nCFA\nCentral African CFA fr\n\n\nXCD\n$\nEast Caribbean dollar\n\n\nXOF\nCFA\nWest African CFA franc\n\n\nXPF\nFr\nCFP franc\n\n\nYER\n\ufdfc\nYemeni rial\n\n\nZAR\nR\nSouth African rand\n\n\nZMW\nZK\nZambian kwacha\n\n\n" + } +] \ No newline at end of file diff --git a/test_unstructured/metrics/test_evaluate.py b/test_unstructured/metrics/test_evaluate.py new file mode 100644 index 000000000..8053d12ba --- /dev/null +++ b/test_unstructured/metrics/test_evaluate.py @@ -0,0 +1,36 @@ +import os +import pathlib + +import pytest + +from unstructured.metrics.evaluate import ( + measure_text_edit_distance, +) + +is_in_docker = os.path.exists("/.dockerenv") + +EXAMPLE_DOCS_DIRECTORY = os.path.join( + 
pathlib.Path(__file__).parent.resolve(), "..", "..", "example-docs" +) +TESTING_FILE_DIR = os.path.join(EXAMPLE_DOCS_DIRECTORY, "test_evaluate_files") + +UNSTRUCTURED_OUTPUT_DIRNAME = "unstructured_output" +GOLD_CCT_DIRNAME = "gold_standard_cct" + + +@pytest.mark.skipif(is_in_docker, reason="Skipping this test in Docker container") +def test_text_extraction_takes_list(): + output_dir = os.path.join(TESTING_FILE_DIR, UNSTRUCTURED_OUTPUT_DIRNAME) + output_list = ["currency.csv.json"] + source_dir = os.path.join(TESTING_FILE_DIR, GOLD_CCT_DIRNAME) + export_dir = os.path.join(TESTING_FILE_DIR, "test_evaluate_results_cct") + measure_text_edit_distance( + output_dir=output_dir, + source_dir=source_dir, + output_list=output_list, + export_dir=export_dir, + ) + # check that only the listed files are included + with open(os.path.join(export_dir, "all-docs-cct.tsv")) as f: + lines = f.read().splitlines() + assert len(lines) == len(output_list) + 1 # includes header diff --git a/test_unstructured_ingest/evaluation-metrics.sh b/test_unstructured_ingest/evaluation-metrics.sh index 2e4ee8f19..7b2eadfd1 100755 --- a/test_unstructured_ingest/evaluation-metrics.sh +++ b/test_unstructured_ingest/evaluation-metrics.sh @@ -12,9 +12,9 @@ mkdir -p "$OUTPUT_DIR" EVAL_NAME="$1" if [ "$EVAL_NAME" == "text-extraction" ]; then - METRIC_STRATEGY="measure-text-edit-distance" + METRIC_STRATEGY="measure-text-edit-distance-command" elif [ "$EVAL_NAME" == "element-type" ]; then - METRIC_STRATEGY="measure-element-type-accuracy" + METRIC_STRATEGY="measure-element-type-accuracy-command" else echo "Wrong metric evaluation strategy given. Expected one of [ text-extraction, element-type ]. Got [ $EVAL_NAME ]." 
exit 1 diff --git a/test_unstructured_ingest/metrics/aggregate-scores-cct.tsv b/test_unstructured_ingest/metrics/aggregate-scores-cct.tsv index 0bba1b63a..971d0c650 100644 --- a/test_unstructured_ingest/metrics/aggregate-scores-cct.tsv +++ b/test_unstructured_ingest/metrics/aggregate-scores-cct.tsv @@ -1,3 +1,3 @@ strategy average sample_sd population_sd count -cct-accuracy 0.798 0.083 0.072 4 -cct-%missing 0.089 0.04 0.035 4 \ No newline at end of file +cct-accuracy 0.735 0.069 0.048 2 +cct-%missing 0.086 0.069 0.049 2 diff --git a/test_unstructured_ingest/metrics/all-docs-cct.tsv b/test_unstructured_ingest/metrics/all-docs-cct.tsv index 3efe86ea5..35124f7f1 100644 --- a/test_unstructured_ingest/metrics/all-docs-cct.tsv +++ b/test_unstructured_ingest/metrics/all-docs-cct.tsv @@ -1,3 +1,3 @@ -filename connector cct-accuracy cct-%missing -example-10k.html local 0.686 0.037 -IRS-form-1987.pdf azure 0.783 0.135 \ No newline at end of file +filename doctype connector cct-accuracy cct-%missing +IRS-form-1987.pdf pdf azure 0.783 0.135 +example-10k.html html local 0.686 0.037 diff --git a/unstructured/__version__.py b/unstructured/__version__.py index 0f5c65089..d29d8730b 100644 --- a/unstructured/__version__.py +++ b/unstructured/__version__.py @@ -1 +1 @@ -__version__ = "0.10.29" # pragma: no cover +__version__ = "0.10.30-dev0" # pragma: no cover diff --git a/unstructured/ingest/evaluate.py b/unstructured/ingest/evaluate.py index ef8f72d17..7b055cf4f 100755 --- a/unstructured/ingest/evaluate.py +++ b/unstructured/ingest/evaluate.py @@ -1,35 +1,10 @@ #! 
/usr/bin/env python3 -import csv -import logging -import os -import statistics -import sys -from typing import Any, List, Optional, Tuple +from typing import List, Optional, Tuple import click -from unstructured.metrics.element_type import ( - calculate_element_type_percent_match, - get_element_type_frequency, -) -from unstructured.metrics.text_extraction import calculate_accuracy, calculate_percent_missing_text -from unstructured.staging.base import elements_from_json, elements_to_text - -logger = logging.getLogger("unstructured.ingest") -handler = logging.StreamHandler() -handler.name = "ingest_log_handler" -formatter = logging.Formatter("%(asctime)s %(processName)-10s %(levelname)-8s %(message)s") -handler.setFormatter(formatter) - -# Only want to add the handler once -if "ingest_log_handler" not in [h.name for h in logger.handlers]: - logger.addHandler(handler) - -logger.setLevel(logging.DEBUG) - - -agg_headers = ["strategy", "average", "sample_sd", "population_sd", "count"] +from unstructured.metrics.evaluate import measure_element_type_accuracy, measure_text_edit_distance @click.group() @@ -39,6 +14,7 @@ def main(): @main.command() @click.option("--output_dir", type=str, help="Directory to structured output.") +@click.option("--source_dir", type=str, help="Directory to source.") @click.option( "--output_list", type=str, @@ -46,7 +22,6 @@ def main(): help="Optional: list of selected structured output file names under the \ directory to be evaluate. If none, all files under directory will be use.", ) -@click.option("--source_dir", type=str, help="Directory to source.") @click.option( "--source_list", type=str, @@ -69,80 +44,22 @@ def main(): help="A tuple of weights to the Levenshtein distance calculation. 
\ See text_extraction.py/calculate_edit_distance for more details.", ) -def measure_text_edit_distance( +def measure_text_edit_distance_command( output_dir: str, - output_list: Optional[List[str]], source_dir: str, + output_list: Optional[List[str]], source_list: Optional[List[str]], export_dir: str, weights: Tuple[int, int, int], -) -> None: - """ - Loops through the list of structured output from all of `output_dir` or selected files from - `output_list`, and compare with gold-standard of the same file name under `source_dir` or - selected files from `source_list`. - - Calculates text accuracy and percent missing. After looped through the whole list, write to tsv. - Also calculates the aggregated accuracy and percent missing. - """ - if not output_list: - output_list = _listdir_recursive(output_dir) - if not source_list: - source_list = _listdir_recursive(source_dir) - - if not output_list: - print("No output files to calculate to edit distances for, exiting") - sys.exit(0) - - rows = [] - accuracy_scores: List[float] = [] - percent_missing_scores: List[float] = [] - - # assumption: output file name convention is name-of-file.doc.json - for doc in output_list: # type: ignore - fn = (doc.split("/")[-1]).split(".json")[0] - doctype = fn.rsplit(".", 1)[-1] - fn_txt = fn + ".txt" - connector = doc.split("/")[0] - - if fn_txt in source_list: # type: ignore - output_cct = elements_to_text(elements_from_json(os.path.join(output_dir, doc))) - source_cct = _read_text(os.path.join(source_dir, fn_txt)) - accuracy = round(calculate_accuracy(output_cct, source_cct, weights), 3) - percent_missing = round(calculate_percent_missing_text(output_cct, source_cct), 3) - - rows.append([fn, doctype, connector, accuracy, percent_missing]) - accuracy_scores.append(accuracy) - percent_missing_scores.append(percent_missing) - - headers = ["filename", "doctype", "connector", "cct-accuracy", "cct-%missing"] - _write_to_file(export_dir, "all-docs-cct.tsv", rows, headers) - - agg_rows = [] - 
agg_rows.append( - [ - "cct-accuracy", - _mean(accuracy_scores), - _stdev(accuracy_scores), - _pstdev(accuracy_scores), - len(accuracy_scores), - ], +): + return measure_text_edit_distance( + output_dir, source_dir, output_list, source_list, export_dir, weights ) - agg_rows.append( - [ - "cct-%missing", - _mean(percent_missing_scores), - _stdev(percent_missing_scores), - _pstdev(percent_missing_scores), - len(percent_missing_scores), - ], - ) - _write_to_file(export_dir, "aggregate-scores-cct.tsv", agg_rows, agg_headers) - _display(agg_rows, agg_headers) @main.command() @click.option("--output_dir", type=str, help="Directory to structured output.") +@click.option("--source_dir", type=str, help="Directory to structured source.") @click.option( "--output_list", type=str, @@ -150,7 +67,6 @@ def measure_text_edit_distance( help="Optional: list of selected structured output file names under the \ directory to be evaluate. If none, all files under directory will be used.", ) -@click.option("--source_dir", type=str, help="Directory to structured source.") @click.option( "--source_list", type=str, @@ -165,132 +81,16 @@ def measure_text_edit_distance( help="Directory to save the output evaluation metrics to. Default to \ your/working/dir/metrics/", ) -def measure_element_type_accuracy( +def measure_element_type_accuracy_command( output_dir: str, - output_list: Optional[List[str]], source_dir: str, + output_list: Optional[List[str]], source_list: Optional[List[str]], export_dir: str, ): - """ - Loops through the list of structured output from all of `output_dir` or selected files from - `output_list`, and compare with gold-standard of the same file name under `source_dir` or - selected files from `source_list`. - - Calculates element type frequency accuracy and percent missing. After looped through the - whole list, write to tsv. Also calculates the aggregated accuracy. 
- """ - if not output_list: - output_list = _listdir_recursive(output_dir) - if not source_list: - source_list = _listdir_recursive(source_dir) - - rows = [] - accuracy_scores: List[float] = [] - - for doc in output_list: # type: ignore - fn = (doc.split("/")[-1]).split(".json")[0] - doctype = fn.rsplit(".", 1)[-1] - connector = doc.split("/")[0] - if doc in source_list: # type: ignore - output = get_element_type_frequency(_read_text(os.path.join(output_dir, doc))) - source = get_element_type_frequency(_read_text(os.path.join(source_dir, doc))) - accuracy = round(calculate_element_type_percent_match(output, source), 3) - rows.append([fn, doctype, connector, accuracy]) - accuracy_scores.append(accuracy) - - headers = ["filename", "doctype", "connector", "element-type-accuracy"] - _write_to_file(export_dir, "all-docs-element-type-frequency.tsv", rows, headers) - - agg_rows = [] - agg_rows.append( - [ - "element-type-accuracy", - _mean(accuracy_scores), - _stdev(accuracy_scores), - _pstdev(accuracy_scores), - len(accuracy_scores), - ], + return measure_element_type_accuracy( + output_dir, source_dir, output_list, source_list, export_dir ) - _write_to_file(export_dir, "aggregate-scores-element-type.tsv", agg_rows, agg_headers) - _display(agg_rows, agg_headers) - - -def _listdir_recursive(dir: str): - listdir = [] - for dirpath, _, filenames in os.walk(dir): - for filename in filenames: - # Remove the starting directory from the path to show the relative path - relative_path = os.path.relpath(dirpath, dir) - if relative_path == ".": - listdir.append(filename) - else: - listdir.append(f"{relative_path}/{filename}") - return listdir - - -def _write_to_file(dir: str, filename: str, rows: List[Any], headers: List[Any], mode: str = "w"): - if mode not in ["w", "a"]: - raise ValueError("Mode not supported. 
Mode must be one of [w, a].") - if dir and not os.path.exists(dir): - os.makedirs(dir) - with open(os.path.join(os.path.join(dir, filename)), mode, newline="") as tsv: - writer = csv.writer(tsv, delimiter="\t") - if mode == "w": - writer.writerow(headers) - writer.writerows(rows) - - -def _display(rows, headers): - col_widths = [ - max(len(headers[i]), max(len(str(row[i])) for row in rows)) for i in range(len(headers)) - ] - click.echo(" ".join(headers[i].ljust(col_widths[i]) for i in range(len(headers)))) - click.echo("-" * sum(col_widths) + "-" * (len(headers) - 1)) - for row in rows: - formatted_row = [] - for item in row: - if isinstance(item, float): - formatted_row.append(f"{item:.3f}") - else: - formatted_row.append(str(item)) - click.echo( - " ".join(formatted_row[i].ljust(col_widths[i]) for i in range(len(formatted_row))), - ) - - -def _mean(scores: List[float], rounding: Optional[int] = 3): - if len(scores) < 1: - return None - elif len(scores) == 1: - mean = scores[0] - else: - mean = statistics.mean(scores) - if not rounding: - return mean - return round(mean, rounding) - - -def _stdev(scores: List[float], rounding: Optional[int] = 3): - if len(scores) <= 1: - return None - if not rounding: - return statistics.stdev(scores) - return round(statistics.stdev(scores), rounding) - - -def _pstdev(scores: List[float], rounding: Optional[int] = 3): - if len(scores) <= 1: - return None - if not rounding: - return statistics.pstdev(scores) - return round(statistics.pstdev(scores), rounding) - - -def _read_text(path): - with open(path, errors="ignore") as f: - text = f.read() - return text if __name__ == "__main__": diff --git a/unstructured/metrics/evaluate.py b/unstructured/metrics/evaluate.py new file mode 100755 index 000000000..f1dd86be0 --- /dev/null +++ b/unstructured/metrics/evaluate.py @@ -0,0 +1,232 @@ +#! 
/usr/bin/env python3 + +import csv +import logging +import os +import statistics +import sys +from typing import Any, List, Optional, Tuple + +import click + +from unstructured.metrics.element_type import ( + calculate_element_type_percent_match, + get_element_type_frequency, +) +from unstructured.metrics.text_extraction import calculate_accuracy, calculate_percent_missing_text +from unstructured.staging.base import elements_from_json, elements_to_text + +logger = logging.getLogger("unstructured.ingest") +handler = logging.StreamHandler() +handler.name = "ingest_log_handler" +formatter = logging.Formatter("%(asctime)s %(processName)-10s %(levelname)-8s %(message)s") +handler.setFormatter(formatter) + +# Only want to add the handler once +if "ingest_log_handler" not in [h.name for h in logger.handlers]: + logger.addHandler(handler) + +logger.setLevel(logging.DEBUG) + + +agg_headers = ["strategy", "average", "sample_sd", "population_sd", "count"] + + +def measure_text_edit_distance( + output_dir: str, + source_dir: str, + output_list: Optional[List[str]] = None, + source_list: Optional[List[str]] = None, + export_dir: str = "metrics", + weights: Tuple[int, int, int] = (2, 1, 1), +) -> None: + """ + Loops through the list of structured output from all of `output_dir` or selected files from + `output_list`, and compare with gold-standard of the same file name under `source_dir` or + selected files from `source_list`. + + Calculates text accuracy and percent missing. After looped through the whole list, write to tsv. + Also calculates the aggregated accuracy and percent missing. 
+ """ + if not output_list: + output_list = _listdir_recursive(output_dir) + if not source_list: + source_list = _listdir_recursive(source_dir) + + if not output_list: + print("No output files to calculate edit distances for, exiting") + sys.exit(0) + + rows = [] + accuracy_scores: List[float] = [] + percent_missing_scores: List[float] = [] + + # assumption: output file name convention is name-of-file.doc.json + for doc in output_list: # type: ignore + fn = (doc.split("/")[-1]).split(".json")[0] + doctype = fn.rsplit(".", 1)[-1] + fn_txt = fn + ".txt" + connector = doc.split("/")[0] + + if fn_txt in source_list: # type: ignore + output_cct = elements_to_text(elements_from_json(os.path.join(output_dir, doc))) + source_cct = _read_text(os.path.join(source_dir, fn_txt)) + accuracy = round(calculate_accuracy(output_cct, source_cct, weights), 3) + percent_missing = round(calculate_percent_missing_text(output_cct, source_cct), 3) + + rows.append([fn, doctype, connector, accuracy, percent_missing]) + accuracy_scores.append(accuracy) + percent_missing_scores.append(percent_missing) + + headers = ["filename", "doctype", "connector", "cct-accuracy", "cct-%missing"] + _write_to_file(export_dir, "all-docs-cct.tsv", rows, headers) + + agg_rows = [] + agg_rows.append( + [ + "cct-accuracy", + _mean(accuracy_scores), + _stdev(accuracy_scores), + _pstdev(accuracy_scores), + len(accuracy_scores), + ], + ) + agg_rows.append( + [ + "cct-%missing", + _mean(percent_missing_scores), + _stdev(percent_missing_scores), + _pstdev(percent_missing_scores), + len(percent_missing_scores), + ], + ) + _write_to_file(export_dir, "aggregate-scores-cct.tsv", agg_rows, agg_headers) + _display(agg_rows, agg_headers) + + +def measure_element_type_accuracy( + output_dir: str, + source_dir: str, + output_list: Optional[List[str]] = None, + source_list: Optional[List[str]] = None, + export_dir: str = "metrics", +): + """ + Loops through the list of structured output from all of `output_dir` or selected 
files from + `output_list`, and compare with gold-standard of the same file name under `source_dir` or + selected files from `source_list`. + + Calculates element type frequency accuracy and percent missing. After looped through the + whole list, write to tsv. Also calculates the aggregated accuracy. + """ + if not output_list: + output_list = _listdir_recursive(output_dir) + if not source_list: + source_list = _listdir_recursive(source_dir) + + rows = [] + accuracy_scores: List[float] = [] + + for doc in output_list: # type: ignore + fn = (doc.split("/")[-1]).split(".json")[0] + doctype = fn.rsplit(".", 1)[-1] + connector = doc.split("/")[0] + if doc in source_list: # type: ignore + output = get_element_type_frequency(_read_text(os.path.join(output_dir, doc))) + source = get_element_type_frequency(_read_text(os.path.join(source_dir, doc))) + accuracy = round(calculate_element_type_percent_match(output, source), 3) + rows.append([fn, doctype, connector, accuracy]) + accuracy_scores.append(accuracy) + + headers = ["filename", "doctype", "connector", "element-type-accuracy"] + _write_to_file(export_dir, "all-docs-element-type-frequency.tsv", rows, headers) + + agg_rows = [] + agg_rows.append( + [ + "element-type-accuracy", + _mean(accuracy_scores), + _stdev(accuracy_scores), + _pstdev(accuracy_scores), + len(accuracy_scores), + ], + ) + _write_to_file(export_dir, "aggregate-scores-element-type.tsv", agg_rows, agg_headers) + _display(agg_rows, agg_headers) + + +def _listdir_recursive(dir: str): + listdir = [] + for dirpath, _, filenames in os.walk(dir): + for filename in filenames: + # Remove the starting directory from the path to show the relative path + relative_path = os.path.relpath(dirpath, dir) + if relative_path == ".": + listdir.append(filename) + else: + listdir.append(f"{relative_path}/{filename}") + return listdir + + +def _display(rows, headers): + col_widths = [ + max(len(headers[i]), max(len(str(row[i])) for row in rows)) for i in range(len(headers)) + 
] + click.echo(" ".join(headers[i].ljust(col_widths[i]) for i in range(len(headers)))) + click.echo("-" * sum(col_widths) + "-" * (len(headers) - 1)) + for row in rows: + formatted_row = [] + for item in row: + if isinstance(item, float): + formatted_row.append(f"{item:.3f}") + else: + formatted_row.append(str(item)) + click.echo( + " ".join(formatted_row[i].ljust(col_widths[i]) for i in range(len(formatted_row))), + ) + + +def _write_to_file(dir: str, filename: str, rows: List[Any], headers: List[Any], mode: str = "w"): + if mode not in ["w", "a"]: + raise ValueError("Mode not supported. Mode must be one of [w, a].") + if dir and not os.path.exists(dir): + os.makedirs(dir) + with open(os.path.join(os.path.join(dir, filename)), mode, newline="") as tsv: + writer = csv.writer(tsv, delimiter="\t") + if mode == "w": + writer.writerow(headers) + writer.writerows(rows) + + +def _mean(scores: List[float], rounding: Optional[int] = 3): + if len(scores) < 1: + return None + elif len(scores) == 1: + mean = scores[0] + else: + mean = statistics.mean(scores) + if not rounding: + return mean + return round(mean, rounding) + + +def _stdev(scores: List[float], rounding: Optional[int] = 3): + if len(scores) <= 1: + return None + if not rounding: + return statistics.stdev(scores) + return round(statistics.stdev(scores), rounding) + + +def _pstdev(scores: List[float], rounding: Optional[int] = 3): + if len(scores) <= 1: + return None + if not rounding: + return statistics.pstdev(scores) + return round(statistics.pstdev(scores), rounding) + + +def _read_text(path): + with open(path, errors="ignore") as f: + text = f.read() + return text
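
Reviewer note: the sketch below is a standalone illustration, not code from this diff. It mirrors the file name convention that `measure_text_edit_distance` assumes (`name-of-file.doc.json`) and the way one row of the aggregate TSV is built from `_mean`, `_stdev`, and `_pstdev`. The names `parse_output_path` and `aggregate_row` are hypothetical helpers introduced only for this example, and the paths in the usage note are made up.

```python
import statistics
from typing import Any, List, Tuple


def parse_output_path(doc: str) -> Tuple[str, str, str]:
    """Split a relative output path into (filename, doctype, connector).

    Hypothetical helper that repeats the string handling used in the
    loops of evaluate.py, under the name-of-file.doc.json convention.
    """
    fn = doc.split("/")[-1].split(".json")[0]  # "s3/a.pdf.json" -> "a.pdf"
    doctype = fn.rsplit(".", 1)[-1]            # "a.pdf" -> "pdf"
    connector = doc.split("/")[0]              # first path component
    return fn, doctype, connector


def aggregate_row(label: str, scores: List[float], rounding: int = 3) -> List[Any]:
    """Build [label, average, sample_sd, population_sd, count].

    As in the diff, the mean is None for an empty list and both
    standard deviations are None when fewer than two scores exist.
    """
    count = len(scores)
    mean = round(statistics.mean(scores), rounding) if count >= 1 else None
    if count <= 1:
        return [label, mean, None, None, count]
    sample_sd = round(statistics.stdev(scores), rounding)
    population_sd = round(statistics.pstdev(scores), rounding)
    return [label, mean, sample_sd, population_sd, count]


if __name__ == "__main__":
    print(parse_output_path("s3/example.pdf.json"))
    print(parse_output_path("example.pdf.json"))
    print(aggregate_row("cct-accuracy", [0.8, 0.9, 1.0]))
```

One behavior worth checking during review: for a file at the top level of `output_dir` (no subdirectory), `doc.split("/")[0]` returns the whole file name, so the `connector` column ends up holding `example.pdf.json` rather than a connector name.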