mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-06-27 02:30:08 +00:00
Migrate to modern bs4 interface (#4025)
## PR Summary

This small PR fixes the bs4 deprecation warnings which you can find in the [CI logs](https://github.com/Unstructured-IO/unstructured/actions/runs/15491657572/job/43729960936#step:3:2615):

```python
/app/unstructured/metrics/table/table_extraction.py:53: DeprecationWarning: Call to deprecated method findAll. (Replaced by find_all) -- Deprecated since version 4.0.0.
/app/unstructured/metrics/table/table_extraction.py:57: DeprecationWarning: Call to deprecated method findAll. (Replaced by find_all) -- Deprecated since version 4.0.0.
```

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
parent 6ef2fc1ec6
commit 531490d013
@@ -1,4 +1,4 @@
-## 0.17.11-dev2
+## 0.17.11-dev3
 
 ### Enhancements
 
@@ -8,6 +8,7 @@
 - Fix type error when `result_file_type` is expected to be a `FileType` but is `None`
 - Fix chunking for elements with None text that has AttributeError 'NoneType' object has no attribute 'strip'.
 - Invalid elements IDs are not visible in VLM output. Parent-child hierarchy is now retrieved based on unstructured element ID, instead of id injected into HTML code of element.
+- Fix bs4 deprecation warnings by updating `findAll()` with `find_all()`.
 
 ## 0.17.10
 - Drop Python 3.9 support as it reaches EOL in October 2025
@@ -1 +1 @@
-__version__ = "0.17.11-dev2"  # pragma: no cover
+__version__ = "0.17.11-dev3"  # pragma: no cover
@@ -50,11 +50,11 @@ def html_table_to_deckerd(content: str) -> List[Dict[str, Any]]:
 
     soup = BeautifulSoup(content, "html.parser")
     table = soup.find("table")
-    rows = table.findAll(["tr"])
+    rows = table.find_all(["tr"])
     table_data = []
 
     for i, row in enumerate(rows):
-        cells = row.findAll(["th", "td"])
+        cells = row.find_all(["th", "td"])
         for j, cell_data in enumerate(cells):
             cell = {
                 "y": i,
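For reference, the row and cell traversal touched by this change can be sketched in isolation. This is only an illustration of the `find_all()` pattern, not the body of `html_table_to_deckerd`, which is truncated in the hunk above. The helper name `table_cells` and the `"x"` and `"content"` keys are assumptions made for the example.

```python
from typing import Any, Dict, List

from bs4 import BeautifulSoup


def table_cells(content: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: list each cell of the first <table> with its position."""
    soup = BeautifulSoup(content, "html.parser")
    table = soup.find("table")
    if table is None:  # input contains no table at all
        return []

    cells: List[Dict[str, Any]] = []
    # find_all() is the snake_case name that replaces the deprecated findAll().
    for y, row in enumerate(table.find_all("tr")):
        # Passing a list of tag names matches any of them, in document order.
        for x, cell in enumerate(row.find_all(["th", "td"])):
            # "x" and "content" are illustrative keys, not taken from the diff.
            cells.append({"y": y, "x": x, "content": cell.get_text(strip=True)})
    return cells


html = "<table><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>"
print(table_cells(html))
```

The call sites keep the same arguments as before; per the warning text quoted in the PR summary, `findAll` is simply replaced by `find_all`.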