Amanda Cameron f98d5e65ca
chore: adding max_characters to other element type chunking (#1673)
This PR adds the `max_characters` (hard max) param to chunking of
non-table elements. It also renames the `num_characters` metadata to
`max_characters` to make it clearer which param it refers to.

To test:

```
from unstructured.partition.html import partition_html

filename = "example-docs/example-10k-1p.html"
chunk_elements = partition_html(
    filename,
    chunking_strategy="by_title",
    combine_text_under_n_chars=0,
    new_after_n_chars=50,
    max_characters=100,
)

# Print the text length of every chunk to check it against the hard max.
for chunk in chunk_elements:
    print(len(chunk.text))

# Previously, only the "soft max" (default of 500) was respected for elements other than tables.
# Now every element's text should be no more than 100 characters long.
```
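
As a rough illustration of what a hard max means, here is a minimal sketch.
It is not the splitting logic used by `unstructured`; `split_to_hard_max` is a
hypothetical helper that only shows that no piece may exceed `max_characters`.

```python
from typing import List


def split_to_hard_max(text: str, max_characters: int) -> List[str]:
    """Hypothetical helper: cut text into consecutive pieces of at most max_characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    # Step through the text in windows of max_characters; the last piece may be shorter.
    return [text[i : i + max_characters] for i in range(0, len(text), max_characters)]


pieces = split_to_hard_max("x" * 250, max_characters=100)
lengths = [len(piece) for piece in pieces]
print(lengths)  # [100, 100, 50]
```

A soft max such as `new_after_n_chars` only decides when a new chunk is
started, so on its own it cannot give this guarantee.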

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
2023-10-09 19:42:36 +00:00