yujunjun/unstructured
Mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-07-29 11:58:51 +00:00
unstructured/example-docs/umlauts-non-utf8.md
6 lines, 24 B, Markdown
chore: switch to charset normalizer (#4060)

Closes [SPI-44](https://linear.app/unstructured/issue/SPI-44/spike-replace-chardet-with-charset-normalizer-if-possible).

Removes `chardet` as a dependency, standardizing on `charset-normalizer`. This involved:

- Changing `chardet` to `charset-normalizer` in our base dependency file
- Updating the code (in only one place) where `chardet` was used
- pip-compiling to update our published dependency tree
- Updating one test... `charset-normalizer` misdiagnosed the encoding of a file used as a test fixture. My guess is that the ~10 characters in the file were not enough for `charset-normalizer` to do a proper inference, so I re-encoded another slightly longer file that's also used for encoding testing, and it got that one.
- Updating an ingest test fixture.
- Updating the ingest test fixture update workflow to also update the expected markdown results (this was a task I missed when adding the markdown ingest tests)

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: qued <qued@users.noreply.github.com>
Co-authored-by: Maksymilian Operlejn <36171422+MaksOpp@users.noreply.github.com>
2025-07-22 14:02:40 -05:00
## k<EFBFBD>nnen
k<EFBFBD>nnen
<EFBFBD>
<EFBFBD>
<EFBFBD>
<EFBFBD>
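
The `<EFBFBD>` markers in the file view are the UTF-8 bytes of U+FFFD, the Unicode replacement character, which a viewer substitutes when it decodes a non-UTF-8 byte as UTF-8. The following is a minimal standard-library sketch of that behavior. It does not use `charset-normalizer`, and the sample byte `0xF6` ("ö" in ISO-8859-1) is an assumption for illustration, since the page does not state which encoding or bytes the fixture actually contains.

```python
# Hypothetical sample: the fixture's real bytes and encoding are not given
# in the source. 0xF6 is "ö" in ISO-8859-1 and is assumed here.
raw = b"k\xf6nnen"

# Strict UTF-8 decoding rejects the lone 0xF6 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes U+FFFD for the invalid byte,
# which is what a web viewer typically displays.
lenient = raw.decode("utf-8", errors="replace")

# U+FFFD itself encodes to the bytes EF BF BD in UTF-8.
replacement_hex = "\ufffd".encode("utf-8").hex().upper()

# Decoding with the assumed legacy encoding recovers the text.
latin1 = raw.decode("iso-8859-1")

# ascii() keeps the output printable on any terminal encoding.
print(strict_ok, ascii(lenient), replacement_hex, ascii(latin1))
```

With only a handful of bytes, several single-byte encodings decode the same input without error, which is consistent with the commit message's guess that a very short file gives a detector too little to work with.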