mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-09-25 16:29:53 +00:00

### Summary

Closes #3173. Removes the `overwrite_schema` kwarg from the Delta Table connector and bumps the `deltalake` version. Per [this PR](https://github.com/delta-io/delta-rs/pull/2554) in the `deltalake` repo, the `overwrite_schema` kwarg is deprecated as of version `0.18.0`. Users can specify `schema_mode="merge"` to obtain the same behavior.

- `schema_mode="merge"` is equivalent to `overwrite_schema=False`
- `schema_mode="overwrite"` is equivalent to `overwrite_schema=True`

Also adds an `engine` parameter that you can use to set `"rust"` or `"pyarrow"` as the engine. `engine` defaults to `"pyarrow"` and `schema_mode` defaults to `None`, which is consistent with the behavior in `deltalake` documented [here](https://delta-io.github.io/delta-rs/api/delta_writer/).

### Testing

The Delta Table ingest tests should pass on this PR.

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
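The mapping between the deprecated `overwrite_schema` kwarg and `schema_mode` stated in the summary can be sketched as a small translation helper. This is only an illustration under stated assumptions: `schema_mode_from_overwrite` and `build_write_kwargs` are hypothetical names and are not part of the connector or of `deltalake`. The defaults (`engine="pyarrow"`, `schema_mode=None`) are taken from the description above.

```python
from typing import Any, Dict, Optional


def schema_mode_from_overwrite(overwrite_schema: Optional[bool]) -> Optional[str]:
    """Hypothetical helper: translate the deprecated flag to a schema_mode value.

    Follows the equivalence stated in the summary:
    overwrite_schema=False -> "merge", overwrite_schema=True -> "overwrite".
    None means the flag was never set, so schema_mode stays at its default (None).
    """
    if overwrite_schema is None:
        return None
    return "overwrite" if overwrite_schema else "merge"


def build_write_kwargs(
    overwrite_schema: Optional[bool] = None,
    engine: str = "pyarrow",
) -> Dict[str, Any]:
    """Hypothetical helper: assemble writer kwargs using the new parameters."""
    # The summary names only these two engine values.
    if engine not in ("rust", "pyarrow"):
        raise ValueError(f"unsupported engine: {engine!r}")
    return {
        "engine": engine,
        "schema_mode": schema_mode_from_overwrite(overwrite_schema),
    }
```

A caller migrating an old configuration could pass its existing `overwrite_schema` value through `build_write_kwargs` and forward the resulting dictionary to whatever write call the connector makes.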
24 lines
464 B
Bash
Executable File
#!/bin/bash

files=(
  "libreoffice-7.6.5-r0.apk"
  "libreoffice-24-24.2.4.1-r0.67f8e014.apk"
  "openjpeg-2.5.0-r0.apk"
  "poppler-23.09.0-r0.apk"
  "leptonica-1.83.0-r0.apk"
  "pandoc-3.1.8-r0.apk"
  "tesseract-5.3.2-r0.apk"
  "nltk_data.tgz"
)

directory="docker-packages"
mkdir -p "${directory}"

for file in "${files[@]}"; do
  echo "Downloading ${file}"
  wget "https://utic-public-cf.s3.amazonaws.com/$file" -P "$directory"
done

echo "Downloads complete."