2 Commits

Author SHA1 Message Date
Matt Robinson
ad69bdcd4e
build(deps): deltalake bump to 0.18.x (#3197)
### Summary

Closes #3173. Removes the `overwrite_schema` kwarg from the Delta Table
connector and bumps the `deltalake` version. Per [this
PR](https://github.com/delta-io/delta-rs/pull/2554) in the `deltalake`
repo, the `overwrite_schema` kwarg is deprecated as of version `0.18.0`.
Users can specify `schema_mode="merge"` to obtain the same behavior.

- `schema_mode="merge"` is equivalent to `overwrite_schema=False`
- `schema_mode="overwrite"` is equivalent to `overwrite_schema=True`

Also adds an `engine` parameter that you can use to set `"rust"` or
`"pyarrow"` as the engine. `engine` defaults to `"pyarrow"` and
`schema_mode` defaults to `None`, which is consistent with the behavior
in `deltalake` documented
[here](https://delta-io.github.io/delta-rs/api/delta_writer/).

### Testing

The Delta Table ingest tests should pass on this PR.

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
2024-06-13 15:59:34 +00:00
Matt Robinson
612905e311
build: wolfi base image for Dockerfile (#3016)
### Summary

Updates the `Dockerfile` to use the Chainguard `wolfi-base` image to
reduce CVEs. Also adds a step in the docker publish job that scans the
images and checks for CVEs before publishing. The job will fail if there
are high or critical vulnerabilities.

### Testing

Run `make docker-run-dev` and then `python3.11` once you're in. And that
point, you can try:

```python
from unstructured.partition.auto import partition
elements = partition(filename="example-docs/DA-1p.pdf", skip_infer_table_types=["pdf"])
elements
```

Stop the container once you're done.
2024-05-15 22:53:15 +00:00