mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-07-28 11:19:58 +00:00

* Rewrite crawler tests (very slow) and fix small crawler bug
* Update Documentation & Code Style
* compile the regex only once
* Factor out the html files & add content check to most tests
* Clarify that even starting URLs can be excluded
* Update Documentation & Code Style
* Change signature
* Fix failing test
* Update Documentation & Code Style

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
11 lines
223 B
HTML
<!DOCTYPE html>
<html>
<head>
    <title>Test Home Page for Crawler</title>
</head>
<body>
    <p>home page content</p>
    <a href="page1.html">link to page 1</a>
    <a href="page2.html">link to page 2</a>
</body>
</html>
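
The commit message mentions compiling the regex only once and excluding URLs. The following is a minimal sketch of how a test might check which links this fixture exposes. It uses only the Python standard library and is not Haystack's Crawler API: `LinkCollector`, `collect_links`, and `exclude_pattern` are hypothetical names introduced here for illustration.

```python
import re
from html.parser import HTMLParser

# Inline copy of the fixture page, so the sketch is self-contained.
FIXTURE = """<!DOCTYPE html>
<html>
<head>
    <title>Test Home Page for Crawler</title>
</head>
<body>
    <p>home page content</p>
    <a href="page1.html">link to page 1</a>
    <a href="page2.html">link to page 2</a>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def collect_links(html, exclude_pattern=None):
    """Return hrefs found in `html`, dropping those matching `exclude_pattern`.

    The pattern is compiled once, before the loop, rather than per link.
    """
    compiled = re.compile(exclude_pattern) if exclude_pattern else None
    parser = LinkCollector()
    parser.feed(html)
    return [
        link
        for link in parser.links
        if compiled is None or not compiled.search(link)
    ]


all_links = collect_links(FIXTURE)
filtered_links = collect_links(FIXTURE, exclude_pattern=r"page2")
```

With no pattern, both relative links are returned in document order. Passing a pattern filters the matching ones out, which is the kind of behavior an exclusion test would assert on.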