unstructured/example-docs/fake-html-cp1252.html
qued d83df422a6
chore: switch to charset normalizer (#4060)
Closes
[SPI-44](https://linear.app/unstructured/issue/SPI-44/spike-replace-chardet-with-charset-normalizer-if-possible).

Removes `chardet` as a dependency, standardizing on
`charset-normalizer`.

This involved:
- Changing `chardet` to `charset-normalizer` in our base dependency file
- Updating the one place in the code where `chardet` was used
- pip-compiling to update our published dependency tree
- Updating one test: `charset-normalizer` misdiagnosed the encoding of
a file used as a test fixture. My guess is that the ~10 characters in
the file were not enough for `charset-normalizer` to make a reliable
inference, so I re-encoded another, slightly longer file that is also
used for encoding testing, and it detected that one correctly.
- Updating an ingest test fixture.
- Updating the ingest test fixture update workflow to also update the
expected markdown results (this was a task I missed when adding the
markdown ingest tests)
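
The guess about the short fixture can be illustrated with the standard library alone. A handful of CP1252 bytes decodes without error under several single-byte encodings, and sometimes to identical text, so a detector has little evidence to separate them. This is only a sketch, not the code changed in this commit: the helper `candidate_encodings`, the `CANDIDATES` list, and the sample strings are hypothetical.

```python
# Hypothetical illustration; not the code changed in this commit.
CANDIDATES = ["cp1252", "latin-1", "cp1250", "iso8859_15", "utf-8"]


def candidate_encodings(data: bytes) -> list[str]:
    """Return the candidate encodings under which `data` decodes without error."""
    ok = []
    for name in CANDIDATES:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# A very short CP1252 sample: the euro sign is the single byte 0x80.
short_sample = "15,50 €".encode("cp1252")

# A longer sample contains bytes that CP1250 and CP1252 map differently
# (for example 0xE5, which is "å" in CP1252).
longer_sample = "Kvinnan åt köttbullar med lingonsylt.".encode("cp1252")

print(candidate_encodings(short_sample))
print(short_sample.decode("cp1250") == short_sample.decode("cp1252"))
print(longer_sample.decode("cp1250") == longer_sample.decode("cp1252"))
```

For the short sample, CP1250 and CP1252 produce exactly the same text, so no detector could tell them apart from those bytes; the longer sample decodes differently under the two, which gives a detector something to score. In `charset-normalizer` itself, `from_bytes(data).best()` returns the top-ranked candidate, though how it ranks candidates for this fixture is not shown here.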

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: qued <qued@users.noreply.github.com>
Co-authored-by: Maksymilian Operlejn <36171422+MaksOpp@users.noreply.github.com>
2025-07-22 19:02:40 +00:00


<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<p>Some text with CP1252-specific characters:</p>
<pre>
Die schöne Frau hat einen Kaffee mit Kuchen gegessen. Sie sagte: "Das war köstlich!" und lächelte dabei. Der Preis betrug 15,50 €.
L'été était très chaud cette année. J'ai acheté un café au lait pour 3,50 €. C'était délicieux ! L'homme a dit : "C'est parfait !"
El niño comió paella con ñoquis. La señora dijo: "¡Qué rico!" y pagó 25,75 €. El restaurante tenía un menú del día.
Kvinnan åt köttbullar med lingonsylt. Hon sa: "Det var fantastiskt!" och betalade 45,90 €. Mannen frågade: "Vill du ha mer?"
O João comprou um café por 2,50 €. Ele disse: "Está ótimo!" e sorriu. A mulher perguntou: "Quer mais alguma coisa?"
De vrouw dronk koffie met koekjes. Ze zei: "Het was heerlijk!" en betaalde 4,25 €. Het kind vroeg: "Mag ik ook wat?"
</pre>
</body>
</html>