2 Commits

Author SHA1 Message Date
Pluto
5bb95b5841
Fix parsing table cells (#3904)
This PR:
- Fixes removing HTML tags that exist in <td> cells 
- stripping function was in general problematic to implement in easy and
straightforward way (you can't modify `descendants` in-place). So I
decided instead of patching something in table cell I added stripping
everywhere in the same consistent way. This is why some tests needed
small edits with removing one white-space in each tag. I believe this
won't cause any problems for downstream tasks.

Tested HTML:
```html
<table class="Table">
    <tbody>
        <tr>
            <td colspan="2">
                Some text                                        
            </td>
            <td>
                <input checked="" class="Checkbox" type="checkbox"/>
            </td>
        </tr>
    </tbody>
</table>
```
Before & After
```html
'<table class="Table" id="..."> <tbody> <tr> <td colspan="2">Some text</td><td></td></tr></tbody></table>'
'<table class="Table" id="..."><tbody><tr><td colspan="2">Some text</td><td><input checked="" type="checkbox"/></td></tr></tbody></table>''
```
2025-02-05 15:28:49 +00:00
Pluto
e48d79eca1
image alt support (#3797) 2024-11-26 16:20:23 +00:00