unstructured/test_unstructured_ingest
Yao You cd8c6a2e09
fix: occasional SIGABRT with deltalake writer on Linux (#1567)
- resolves an issue where the deltalake writer occasionally triggers
SIGABRT on Linux even though the writer finished writing the table properly
- this was first observed in the ingest test
- putting the writer into a separate process mitigates the problem by
forcing the deltalake Rust backend to finish its tasks
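
As a rough illustration of the mitigation described above, the sketch below runs a write in a child process and waits for it to exit. It is not the code from this change: `write_table` is a hypothetical stand-in for the real deltalake write call, and `write_in_subprocess` is a name chosen for this example. The `fork` start method is assumed because the issue is Linux-specific.

```python
import multiprocessing as mp
from pathlib import Path


def write_table(path: str, rows: list[str]) -> None:
    # Hypothetical stand-in for the deltalake write call; it only writes text.
    Path(path).write_text("\n".join(rows))


def write_in_subprocess(path: str, rows: list[str]) -> int:
    # Assumption: "fork" is available (Linux and other Unix systems).
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=write_table, args=(path, rows))
    proc.start()
    # join() blocks until the child exits, so any background work the
    # writer started inside the child is finished or torn down before the
    # parent continues.
    proc.join()
    # 0 means a clean exit; a negative value means the child was killed
    # by that signal number.
    return proc.exitcode
```

Checking the returned exit code in the parent makes an abort in the child visible instead of taking down the whole ingest run.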

## test

To test this, it is best to set up an instance on a Linux system, since the
problem has only been observed on Linux so far. Run:

```bash
PYTHONPATH=. ./unstructured/ingest/main.py delta-table --num-processes 2 --metadata-exclude coordinates,filename,file_directory,metadata.data_source.date_processed,metadata.last_modified,metadata.date_created,metadata.detection_class_prob,metadata.parent_id,metadata.category_depth --table-uri ../tables/delta/ --preserve-downloads --verbose delta-table --write-column json_data --mode overwrite --table-uri file:///tmp/delta
```

Without this fix, we'd occasionally encounter `SIGABRT`.
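
Because the failure is intermittent, a single run may not reproduce it. The helper below is a minimal sketch for repeating a command and counting how many runs were terminated by the abort signal. `count_sigabrt` is a hypothetical name, and the sketch assumes a POSIX system, where `subprocess` reports a signal-terminated child as a negative return code.

```python
import signal
import subprocess


def count_sigabrt(cmd: list[str], runs: int) -> int:
    # Run `cmd` `runs` times and count the runs killed by SIGABRT.
    aborted = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True)
        # On POSIX, returncode is -N when the child died from signal N.
        if result.returncode == -signal.SIGABRT:
            aborted += 1
    return aborted
```

Pass the ingest command above as a list of arguments, and choose a run count large enough to make an occasional failure likely to show up.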

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
2023-09-29 02:41:18 +00:00