mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-29 11:58:51 +00:00

Closes [SPI-44](https://linear.app/unstructured/issue/SPI-44/spike-replace-chardet-with-charset-normalizer-if-possible).

Removes `chardet` as a dependency, standardizing on `charset-normalizer`. This involved:

- Changing `chardet` to `charset-normalizer` in our base dependency file
- Updating the code (in only one place) where `chardet` was used
- pip-compiling to update our published dependency tree
- Updating one test. `charset-normalizer` misdiagnosed the encoding of a file used as a test fixture. My guess is that the ~10 characters in the file were not enough for `charset-normalizer` to do a proper inference, so I re-encoded another slightly longer file that's also used for encoding testing, and it got that one.
- Updating an ingest test fixture.
- Updating the ingest test fixture update workflow to also update the expected markdown results (this was a task I missed when adding the markdown ingest tests)

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: qued <qued@users.noreply.github.com>
Co-authored-by: Maksymilian Operlejn <36171422+MaksOpp@users.noreply.github.com>
20 lines
901 B
HTML
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<p>Some text with CP1252-specific characters:</p>
<pre>
Die schöne Frau hat einen Kaffee mit Kuchen gegessen. Sie sagte: "Das war köstlich!" und lächelte dabei. Der Preis betrug 15,50 €.
L'été était très chaud cette année. J'ai acheté un café au lait pour 3,50 €. C'était délicieux ! L'homme a dit : "C'est parfait !"
El niño comió paella con ñoquis. La señora dijo: "¡Qué rico!" y pagó 25,75 €. El restaurante tenía un menú del día.
Kvinnan åt köttbullar med lingonsylt. Hon sa: "Det var fantastiskt!" och betalade 45,90 €. Mannen frågade: "Vill du ha mer?"
O João comprou um café por 2,50 €. Ele disse: "Está ótimo!" e sorriu. A mulher perguntou: "Quer mais alguma coisa?"
De vrouw dronk koffie met koekjes. Ze zei: "Het was heerlijk!" en betaalde 4,25 €. Het kind vroeg: "Mag ik ook wat?"
</pre>
</body>
</html>
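
The fixture's sample sentences include the euro sign, which is one of the characters that separates CP1252 from both Latin-1 and UTF-8. The sketch below uses only the Python standard library to show why a file like this needs encoding detection rather than a blind UTF-8 read. The `SAMPLE` string and the helper names `encodes_as` and `decodes_as` are illustrative assumptions and are not taken from the repository.

```python
# Illustrative sample only; the euro sign is the character of interest.
SAMPLE = "Der Preis betrug 15,50 €."


def encodes_as(text: str, encoding: str) -> bool:
    """Return True if every character in `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# In CP1252 the euro sign is the single byte 0x80.
cp1252_bytes = SAMPLE.encode("cp1252")

print(encodes_as(SAMPLE, "cp1252"))        # euro sign exists in CP1252
print(encodes_as(SAMPLE, "latin-1"))       # Latin-1 has no euro sign
print(decodes_as(cp1252_bytes, "utf-8"))   # a lone 0x80 is invalid UTF-8
print(cp1252_bytes.decode("cp1252") == SAMPLE)  # round trip is lossless
```

Note that decoding the same bytes as Latin-1 would not raise an error, because Latin-1 maps every byte to some code point. It would silently turn the euro sign into a control character, which is the kind of quiet misread that makes a correct encoding guess matter.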