Klaijan e65a44eabb
feat: update cct eval for text dir (#2299)
This change updates the `measure_text_extraction_accuracy` function
to accept a directory of txt files as well as json. The function also takes an
`output_type` input, which must be either "json" or "txt", and checks whether the files
under the given directory/list contain only the specified file type.
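
The file-type check described above could look roughly like the sketch below. This is not the code from the commit: the helper name `files_match_output_type` and its behavior on an unsupported type are assumptions made for illustration only.

```python
from pathlib import Path

# Hypothetical helper, not the actual implementation in unstructured.
ALLOWED_OUTPUT_TYPES = ("json", "txt")


def files_match_output_type(paths, output_type):
    """Return True if every path ends with the extension for output_type."""
    if output_type not in ALLOWED_OUTPUT_TYPES:
        # Assumption: unsupported types are rejected with a ValueError.
        raise ValueError(f"output_type must be one of {ALLOWED_OUTPUT_TYPES}")
    suffix = "." + output_type
    return all(Path(p).suffix == suffix for p in paths)
```

Called with a list of filenames and `"txt"`, it returns `True` only when all of them end in `.txt`.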

To test this feature, run the following code:

```
PYTHONPATH=. python unstructured/ingest/evaluate.py measure-text-extraction-accuracy-command --output_dir <clean-text-path> --source_dir <cct-label-path> --output_type txt
```
2024-01-05 23:34:53 +00:00

Bank Good Credit
Accredited with IABAC™
(International Association of Business Analytics Certifications)
© DataMites™. All Rights Reserved | www.datamites.com
Objective & Background
Classify credit card customers as good / bad, based on information from internal and external sources.
Data provided
Demographic: Base file with credit card history details. Only one record for every customer.
Account: Contains data for various loans availed by the customer. Not related to credit card. Multiple records for every customer.
Enquiries: Enquiries made by customers for different loan purposes. Multiple records for every customer.
Design
Data to be downloaded using SQL queries.
Required information to be extracted from Account and Enquiry files and converted to one-to-one files (one record per customer).
The columns from the two files should be merged with the Demographic file using a Left Join with “customer no” as the key column, to create a final file. The final file should contain all the records in Demographic, with additional columns/features from the Account and Enquiry files added to it.
Many customers in the Account and Enquiry files will be left out. This is fine, as we don't know their good/bad label for training purposes anyway.
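
The design above can be sketched with SQL run from Python's built-in `sqlite3` module. The table names, column names, sample rows and chosen aggregates (loan count, total loan amount, enquiry count) are all assumptions for illustration; the slides only name the three files and the “customer no” key. Filling missing features with 0 is likewise an assumption.

```python
import sqlite3

# In-memory database with hypothetical tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demographic (customer_no INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE account (customer_no INTEGER, loan_amount REAL);
    CREATE TABLE enquiry (customer_no INTEGER, purpose TEXT);

    INSERT INTO demographic VALUES (1, 'good'), (2, 'bad'), (3, 'good');
    INSERT INTO account VALUES (1, 1000.0), (1, 2500.0), (2, 400.0), (9, 50.0);
    INSERT INTO enquiry VALUES (1, 'auto'), (3, 'home'), (3, 'auto'), (8, 'home');
""")

# Step 1: collapse Account and Enquiry to one row per customer (GROUP BY).
# Step 2: LEFT JOIN both onto Demographic so every demographic record is kept.
query = """
    SELECT d.customer_no,
           d.label,
           COALESCE(a.n_loans, 0)           AS n_loans,
           COALESCE(a.total_loan_amount, 0) AS total_loan_amount,
           COALESCE(e.n_enquiries, 0)       AS n_enquiries
    FROM demographic AS d
    LEFT JOIN (
        SELECT customer_no,
               COUNT(*)         AS n_loans,
               SUM(loan_amount) AS total_loan_amount
        FROM account
        GROUP BY customer_no
    ) AS a ON a.customer_no = d.customer_no
    LEFT JOIN (
        SELECT customer_no, COUNT(*) AS n_enquiries
        FROM enquiry
        GROUP BY customer_no
    ) AS e ON e.customer_no = d.customer_no
    ORDER BY d.customer_no
"""
final_rows = conn.execute(query).fetchall()
for row in final_rows:
    print(row)
```

Customers 8 and 9 appear only in the Enquiry and Account tables, so the left join drops them, while customer 3, who has no loans, is kept with zero-valued account features.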
Analysis of Data
Show using Excel File
Explain Coding / outcomes
Show using Jupyter
Thank You