unstructured

yujunjun/unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-07-06 08:31:46 +00:00

Commit Graph

Author	SHA1	Message	Date

Yao You

911f9983c1

feat: redefine table level acc (#2620)

This PR redefines the `table_level_acc` metric as follows:
- for each predicted table, use the sequence matching ratio as its accuracy
- as a prerequisite for the sequence matching, sort the table cells by
row then column for both the prediction and the ground truth so that
they are ordered the same way
- average the accuracy over all predicted tables
- any prediction without a matching ground truth (a false positive)
decreases the score
- a prediction that splits a ground truth table into smaller tables also
scores low, with perfectly equal splits scoring lowest

With this definition the metric is a value between 0 and 1 per file. It
replaces the existing definition, the ratio of (the number of predicted
tables that match a ground truth table) to (the number of ground truth
tables). The existing metric actually gives higher values for
predictions that split tables, and it can exceed 1. The new definition
prefers predictions that do not split ground truth tables.
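
The definition above can be illustrated with a minimal sketch built on Python's standard `difflib.SequenceMatcher`. This is not the project's implementation: the `(row, col, text)` cell representation, the function names, and the way predictions are paired with ground truth tables are all assumptions made for illustration, and the sketch is not guaranteed to reproduce every property claimed in the commit message.

```python
from difflib import SequenceMatcher
from statistics import mean


def sorted_texts(cells):
    """Sort hypothetical (row, col, text) cells by row, then column, and return the texts."""
    return [text for _, _, text in sorted(cells, key=lambda c: (c[0], c[1]))]


def table_accuracy(predicted, ground_truth):
    """Sequence matching ratio between the ordered cell texts of two tables."""
    matcher = SequenceMatcher(None, sorted_texts(predicted), sorted_texts(ground_truth))
    return matcher.ratio()


def table_level_acc(matches):
    """Average the accuracy over all predicted tables.

    `matches` is a list of (predicted, ground_truth) pairs; ground_truth is
    None for a false positive, which is scored as 0 (an assumption).
    Assumes at least one predicted table: mean() raises on an empty list.
    """
    return mean(
        table_accuracy(pred, gt) if gt is not None else 0.0
        for pred, gt in matches
    )


def old_table_level_acc(num_matched_predictions, num_ground_truth_tables):
    """The replaced definition: matched predictions divided by ground truth tables."""
    return num_matched_predictions / num_ground_truth_tables
```

For example, if one 2x2 ground truth table is predicted as two separate one-row tables that both match it, `old_table_level_acc(2, 1)` gives 2, while in this sketch each half is penalized by the sequence matching ratio and the average stays below 1.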

2024-03-08 17:00:57 +00:00

Pawel Kmiecik

e35306cfc7

fix: table evaluation metrics fix calculations when no tables found in predictions (#2619)

The current way table structure metrics are computed does not cover
cases where no table is found and all stats are empty.

This PR fixes this and adds some hardening tests for the table eval
processor.
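
As a rough illustration of the failure mode, averaging an empty list of per-table stats raises an error unless it is guarded. The sketch below shows one way to guard it; the function name, the dictionary layout, and the choice of `0.0` as the fallback value are assumptions, not what the PR actually does.

```python
from statistics import mean


def aggregate_table_stats(per_table_stats, default=0.0):
    """Average each metric across predicted tables.

    `per_table_stats` maps a metric name to a list of per-table values.
    An empty list (no table found) yields `default` instead of raising.
    """
    return {
        name: (mean(values) if values else default)
        for name, values in per_table_stats.items()
    }
```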

---------

Co-authored-by: Yao You <theyaoyou@gmail.com>

2024-03-07 18:39:19 +00:00

Yao You

42f8cf1997

chore: add metric helper for table structure eval (#1877)

- add a helper to run inference over an image or PDF of a table and
compare the result against a ground truth CSV file
- this metric generates a similarity score between 0 and 1, where 1 is a
perfect match and 0 is no match at all
- add example docs for testing
- NOTE: this metric is only relevant to table structure detection.
Therefore the input should be just the table area in an image/PDF file;
we are not evaluating table element detection in this metric
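
A minimal sketch of the comparison step, assuming the predicted table has already been extracted and serialized as CSV text. The inference step is skipped entirely, and the function names and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions rather than the helper's actual method.

```python
import csv
import io
from difflib import SequenceMatcher


def cells_from_csv(text):
    """Flatten CSV text into a row-major list of stripped cell strings."""
    return [cell.strip() for row in csv.reader(io.StringIO(text)) for cell in row]


def structure_similarity(predicted_csv, ground_truth_csv):
    """Return a score in [0, 1]: 1 is a perfect match, 0 is no match at all."""
    matcher = SequenceMatcher(
        None, cells_from_csv(predicted_csv), cells_from_csv(ground_truth_csv)
    )
    return matcher.ratio()
```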

2023-10-27 13:23:44 -05:00

3 Commits