---
title: Fetchers
id: fetchers-api
description: Fetches content from a list of URLs and returns a list of extracted content streams.
---
# Module link\_content
## LinkContentFetcher
Fetches and extracts content from URLs.
It supports various content types, retries on failures, and automatic user-agent rotation for failed web
requests. Use it as the data-fetching step in your pipelines.
To convert `LinkContentFetcher`'s output into a list of documents, use the `HTMLToDocument`
converter.
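As a minimal sketch of that hand-off, the helper below chains a fetcher and a converter. The name `fetch_documents` is hypothetical, and the sketch assumes the converter's `run` method accepts the streams as `sources` and returns its documents under a `documents` key.

```python
from typing import Any


def fetch_documents(fetcher: Any, converter: Any, urls: list[str]) -> list[Any]:
    """Fetch `urls` and convert the resulting streams into documents."""
    # The fetcher returns a dict; the ByteStream objects are under "streams".
    streams = fetcher.run(urls=urls)["streams"]
    # Assumption: the converter takes the streams as `sources` and returns
    # its documents under the "documents" key.
    return converter.run(sources=streams)["documents"]
```

With Haystack installed, this could be called as `fetch_documents(LinkContentFetcher(), HTMLToDocument(), urls)`, with `HTMLToDocument` imported from `haystack.components.converters`.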
### Usage example
```python
from haystack.components.fetchers.link_content import LinkContentFetcher
fetcher = LinkContentFetcher()
streams = fetcher.run(urls=["https://www.google.com"])["streams"]
assert len(streams) == 1
assert streams[0].meta == {'content_type': 'text/html', 'url': 'https://www.google.com'}
assert streams[0].data
```
For async usage:
```python
import asyncio
from haystack.components.fetchers import LinkContentFetcher
async def fetch_async():
    fetcher = LinkContentFetcher()
    result = await fetcher.run_async(urls=["https://www.google.com"])
    return result["streams"]

streams = asyncio.run(fetch_async())
```
#### LinkContentFetcher.\_\_init\_\_
```python
def __init__(raise_on_failure: bool = True,
             user_agents: Optional[list[str]] = None,
             retry_attempts: int = 2,
             timeout: int = 3,
             http2: bool = False,
             client_kwargs: Optional[dict] = None,
             request_headers: Optional[dict[str, str]] = None)
```
Initializes the component.
**Arguments**:
- `raise_on_failure`: If `True`, raises an exception if it fails to fetch a single URL.
For multiple URLs, it logs errors and returns the content it successfully fetched.
- `user_agents`: [User agents](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent)
for fetching content. If `None`, a default user agent is used.
- `retry_attempts`: The number of times to retry fetching a URL's content.
- `timeout`: Timeout in seconds for the request.
- `http2`: Whether to enable HTTP/2 support for requests. Defaults to `False`.
Requires the `h2` package (install it with `pip install httpx[http2]`).
- `client_kwargs`: Additional keyword arguments to pass to the httpx client.
If `None`, default values are used.
- `request_headers`: Additional HTTP headers to send with each request.
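To illustrate how `user_agents` and `retry_attempts` can interact, here is a minimal sketch of retrying with user-agent rotation. It is not the component's implementation: `fetch_with_rotation` and the injected `send` callable are hypothetical, and the sketch assumes `retry_attempts` counts retries made after the first try.

```python
from typing import Callable


def fetch_with_rotation(
    url: str,
    send: Callable[[str, str], bytes],
    user_agents: list[str],
    retry_attempts: int = 2,
) -> bytes:
    """Call `send(url, user_agent)`, switching user agent after each failure.

    Illustrative only: `send` is a hypothetical stand-in for the HTTP request.
    """
    if retry_attempts < 0 or not user_agents:
        raise ValueError("need retry_attempts >= 0 and at least one user agent")
    for attempt in range(retry_attempts + 1):
        # Cycle through the configured user agents, one per attempt.
        agent = user_agents[attempt % len(user_agents)]
        try:
            return send(url, agent)
        except Exception:  # a real client would catch HTTP errors only
            if attempt == retry_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
    raise AssertionError("unreachable")
```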
#### LinkContentFetcher.\_\_del\_\_
```python
def __del__()
```
Clean up resources when the component is deleted.
Closes both the synchronous and asynchronous HTTP clients to prevent
resource leaks.
#### LinkContentFetcher.run
```python
@component.output_types(streams=list[ByteStream])
def run(urls: list[str])
```
Fetches content from a list of URLs and returns a list of extracted content streams.
Each content stream is a `ByteStream` object containing the extracted content as binary data.
Each `ByteStream` object in the returned list corresponds to the contents of a single URL.
The content type of each stream is stored in the metadata of the `ByteStream` object under
the key `content_type`. The URL of the fetched content is stored under the key `url`.
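Because the content type travels with each stream, downstream code can route streams by it. The sketch below groups streams by their `content_type` metadata. The helper name `group_by_content_type` and the `"unknown"` fallback label are assumptions made for illustration; the function only relies on each stream exposing a `meta` dictionary.

```python
from collections import defaultdict
from typing import Any


def group_by_content_type(streams: list[Any]) -> dict[str, list[Any]]:
    """Group stream objects by the "content_type" entry of their metadata."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for stream in streams:
        # Fall back to a placeholder label if the key is missing
        # (an assumption of this sketch, not documented behavior).
        content_type = stream.meta.get("content_type", "unknown")
        groups[content_type].append(stream)
    return dict(groups)
```

For example, the streams under `"text/html"` could be passed to an HTML converter while other types go to a different one.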
**Arguments**:
- `urls`: A list of URLs to fetch content from.
**Raises**:
- `Exception`: Raised if `urls` contains a single URL, `raise_on_failure` is `True`, and content
retrieval fails. In all other cases, retrieval errors are logged and the successfully retrieved
`ByteStream` objects are returned.
**Returns**:
A dictionary whose `streams` key holds the list of `ByteStream` objects representing the extracted content.
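The error handling described under **Raises** can be sketched in plain Python. This is an illustration of the documented behavior, not the component's source: `fetch_all` and the injected `fetch_one` callable are hypothetical names.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def fetch_all(
    urls: list[str],
    fetch_one: Callable[[str], bytes],
    raise_on_failure: bool = True,
) -> list[bytes]:
    """Mimic the documented error handling with a hypothetical `fetch_one`."""
    if len(urls) == 1 and raise_on_failure:
        # Single URL with raise_on_failure=True: let the exception propagate.
        return [fetch_one(urls[0])]
    results = []
    for url in urls:
        try:
            results.append(fetch_one(url))
        except Exception as error:
            # All other cases: log the error and keep what was fetched.
            logger.warning("Could not fetch %s: %s", url, error)
    return results
```

The practical consequence is that a caller passing several URLs should compare the number of returned streams with the number of URLs requested, since failures do not raise in that case.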
#### LinkContentFetcher.run\_async
```python
@component.output_types(streams=list[ByteStream])
async def run_async(urls: list[str])
```
Asynchronously fetches content from a list of URLs and returns a list of extracted content streams.
This is the asynchronous version of the `run` method with the same parameters and return values.
**Arguments**:
- `urls`: A list of URLs to fetch content from.
**Returns**:
A dictionary whose `streams` key holds the list of `ByteStream` objects representing the extracted content.
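Since `run_async` is a coroutine, it composes with standard `asyncio` tools. As one hedged example, the sketch below wraps a call in an overall deadline using `asyncio.wait_for`. The helper `fetch_with_deadline` and its `deadline_seconds` parameter are hypothetical and separate from the component's own per-request `timeout`; returning an empty list on timeout is a choice made for this sketch.

```python
import asyncio
from typing import Any


async def fetch_with_deadline(
    fetcher: Any, urls: list[str], deadline_seconds: float = 10.0
) -> list[Any]:
    """Await `fetcher.run_async` but give up after `deadline_seconds`."""
    try:
        result = await asyncio.wait_for(
            fetcher.run_async(urls=urls), timeout=deadline_seconds
        )
    except asyncio.TimeoutError:
        # Deadline exceeded: the pending call is cancelled; return nothing.
        return []
    # The streams are stored under the "streams" key of the result dict.
    return result["streams"]
```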