feat: Add deprecation warning on import of any ingest code (#3443)

### Description
Any time `unstructured.ingest` is imported, this deprecation warning is
emitted:
```
DeprecationWarning: unstructured.ingest will be removed in a future version
```
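Python's default filters hide `DeprecationWarning` unless it is attributed to code running in `__main__`, so the message may not show up in every context. A minimal sketch of how a caller might record it explicitly. `emit_ingest_warning` is a hypothetical stand-in for the import, not part of the package, and the message text is assumed to match the one added in this commit:

```python
import warnings


def emit_ingest_warning() -> None:
    """Hypothetical stand-in for what importing unstructured.ingest triggers."""
    warnings.warn(
        "unstructured.ingest will be removed in a future version. "
        "Functionality moved to the unstructured-ingest project.",
        DeprecationWarning,
        stacklevel=2,
    )


def capture_deprecations() -> list[str]:
    """Record every DeprecationWarning raised by the stand-in call."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" overrides the default filter that hides DeprecationWarning.
        warnings.simplefilter("always")
        emit_ingest_warning()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


messages = capture_deprecations()
```

Running a script with `python -W default` (or `-W error::DeprecationWarning` in CI) is another way to surface the same warning without code changes.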
Roman Isecke 2024-07-30 11:06:21 -04:00 committed by GitHub
parent 4e61acc1c6
commit 482f093afb
5 changed files with 20 additions and 3 deletions


@@ -1,9 +1,11 @@
## 0.15.1-dev5
## 0.15.1-dev6
### Enhancements
### Features
* **Mark ingest as deprecated** Begin sunset of ingest code in this repo as it's been moved to a dedicated repo.
### Fixes
* **Update `HuggingFaceEmbeddingEncoder` to use `HuggingFaceEmbeddings` from `langchain_huggingface` package instead of the deprecated version from `langchain-community`.** This resolves the deprecation warning and ensures compatibility with future versions of langchain.
@@ -20,7 +22,7 @@
### Enhancements
* **Improve text clearing process in email partitioning.** Updated the email partitioner to remove both `=\n` and `=\r\n` characters during the clearing process. Previously, only `=\n` characters were removed.
* **Improve text clearing process in email partitioning.** Updated the email partitioner to remove both `=\n` and `=\r\n` characters during the clearing process. Previously, only `=\n` characters were removed.
* **Bump unstructured.paddleocr to 2.8.0.1.**
* **Refine HTML parser to accommodate block element nested in phrasing.** HTML parser no longer raises on a block element (e.g. `<p>`, `<div>`) nested inside a phrasing element (e.g. `<strong>` or `<cite>`). Instead it breaks the phrasing run (and therefore element) at the block-item start and begins a new phrasing run after the block-item. This is consistent with how the browser determines element boundaries in this situation.
* **Install rewritten HTML parser to fix 12 existing bugs and provide headroom for refinement and growth.** A rewritten HTML parser resolves a collection of outstanding bugs with HTML partitioning and provides a firm foundation for further elaborating that important partitioner.


@@ -64,6 +64,7 @@ tests_to_ignore=(
'notion.sh'
'dropbox.sh'
'sharepoint.sh'
'databricks-volumes.sh'
)
for test in "${all_tests[@]}"; do


@@ -1 +1 @@
__version__ = "0.15.1-dev5" # pragma: no cover
__version__ = "0.15.1-dev6" # pragma: no cover


@@ -1,3 +1,8 @@
# Ingest
![Project unmaintained](https://img.shields.io/badge/project-unmaintained-red.svg)
Project has been moved to: [Unstructured Ingest](https://github.com/Unstructured-IO/unstructured-ingest)
# Batch Processing Documents [DEPRECATED]
For the latest approach, go to: [v2](./v2)


@@ -1 +1,10 @@
from __future__ import annotations
import warnings
warnings.warn(
"unstructured.ingest will be removed in a future version. "
"Functionality moved to the unstructured-ingest project.",
DeprecationWarning,
stacklevel=2,
)
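
The module-level `warnings.warn` call above runs once, when the module body first executes. A rough sketch of how that behavior could be checked, and how a downstream caller might silence it by message. It writes the same body to a throwaway module named `fake_ingest` (a hypothetical name used only here) instead of importing the real package:

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

# Same body as the hunk above, in a throwaway module. `fake_ingest` is a
# hypothetical stand-in for the package's ingest __init__.py.
MODULE_SOURCE = '''
import warnings

warnings.warn(
    "unstructured.ingest will be removed in a future version. "
    "Functionality moved to the unstructured-ingest project.",
    DeprecationWarning,
    stacklevel=2,
)
'''


def import_and_record(silence: bool) -> list[str]:
    """Import the throwaway module fresh and return recorded deprecations."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_ingest.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        sys.modules.pop("fake_ingest", None)  # force the module body to run
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                if silence:
                    # `message` is a regex matched against the start of the text.
                    warnings.filterwarnings(
                        "ignore",
                        message=r"unstructured\.ingest will be removed",
                        category=DeprecationWarning,
                    )
                importlib.import_module("fake_ingest")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_ingest", None)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


shown = import_and_record(silence=False)
hidden = import_and_record(silence=True)
```

Filtering on the message prefix rather than on a module name keeps the filter independent of which frame `stacklevel=2` attributes the warning to.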