unstructured/test_unstructured
Yao You 911f9983c1
feat: redefine table level acc (#2620)
This PR redefines the `table_level_acc` metric as follow:
- for each predicted table use sequence matching ratio as its accuracy
- as a prerequisite for the sequence matching we sort the table cells by
row then column for both predicted and ground truth to ensure they are
ordered the same
- average all predicted table accuracy
- any prediction without a matching ground truth (false positive) would
decrease the score
- prediction that splits ground truth into smaller tables would also
have low score with perfectly equal splits having lowest score

This new definition makes the new metric a value between 0 and 1 per
file. This replaces the existing definition where the metric is defined
as (the number of predicted table that has a match to ground truth) to
(the number of ground truth table). This existing metric actually gives
higher values for predictions that splits tables and can be higher than
1. The new definition prefers predictions that do not split ground truth
tables.
2024-03-08 17:00:57 +00:00
..