Magnus F 1e2da6df46
fix: ipv4 address regex (#3808)
I noticed the IPv4 regex is wrong: it only captures one- or two-digit
octets (e.g. `n.nn.n.nn`), so addresses with three-digit octets are
missed. Here's a correction and an updated test for it.
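To make the bug concrete, here is a minimal Python sketch. Both patterns
are hypothetical illustrations, not the actual expressions in
`unstructured/nlp/patterns.py`: `BUGGY_IPV4` mimics the one-or-two-digit
behaviour described above, and `FIXED_IPV4` restricts each octet to
0 through 255 without leading zeros.

```python
import re

# Hypothetical pattern reproducing the bug: every octet is limited to
# one or two digits, so "192.168.1.100" cannot match.
BUGGY_IPV4 = re.compile(r"\b(?:\d{1,2}\.){3}\d{1,2}\b")

# Hypothetical corrected octet: 250-255, 200-249, 100-199, or 0-99
# (no leading zeros, so "04" is not accepted as an octet).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
FIXED_IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

if __name__ == "__main__":
    sample = "Received from 10.0.0.25 via 192.168.1.100"
    print(BUGGY_IPV4.findall(sample))  # ['10.0.0.25']
    print(FIXED_IPV4.findall(sample))  # ['10.0.0.25', '192.168.1.100']
```

Ordering the alternatives from the most specific (`25[0-5]`) to the
least specific (`[1-9]?\d`) keeps the pattern from stopping early on a
prefix like `25` when the octet is actually `255`.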

If you wish, I can break the IPv4 test out into its own case so it
doesn't interfere with the existing `EMAIL_META_DATA_INPUT` IPv6
extraction test.

Side note: the comment at `unstructured/nlp/patterns.py#95` includes an
invalid IPv4 address example (its last octet is incorrectly left-padded
with a zero). I left it as is because I'm not sure whether the intention
is to include "non-conventional" IPv4 addresses, such as those with
octal or hexadecimal octets.
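For reference on the "non-conventional" question, Python's standard
`ipaddress` module takes the strict view: on Python 3.9.5 and later it
rejects octets with leading zeros, and it only accepts decimal digits,
so hexadecimal octets are rejected too. A small sketch (the helper name
`is_strict_ipv4` is hypothetical):

```python
import ipaddress


def is_strict_ipv4(text: str) -> bool:
    """Return True if text parses as a dotted-decimal IPv4 address
    under the standard library's strict rules (hypothetical helper)."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "192.168.1.01", "0xC0.168.1.1"):
        print(candidate, is_strict_ipv4(candidate))
```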
2024-12-09 14:19:13 -08:00