Steve Canny 0c7f64ecaa
rfctr(chunking): generalize PreChunkBuilder (#2283)
To implement inter-pre-chunk overlap, we need a context that sees every
pre-chunk both before and after it is accumulated (from elements).

- We need access to the pre-chunk when it is completed so we can extract
the "tail" overlap to be applied to the next chunk.
- We need access to the as-yet-unpopulated pre-chunk so we can add the
prior tail to it as a prefix.

This "visibility" is split between `PreChunkBuilder` and the pre-chunker
itself, which handles `TablePreChunk`s without the builder.

Move `Table` element and TablePreChunk` formation into `PreChunkBuilder`
such that _all_ element types (adding `Table` elements in particular)
pass through it. Then `PreChunkBuilder` becomes the context we require.

The actual overlap harvesting and application will come in a subsequent
commit.
2023-12-18 22:21:34 +00:00
..