---
title: "Secret Management"
id: secret-management
slug: "/secret-management"
description: "This page emphasizes secret management in Haystack components and introduces the `Secret` type for structured secret handling. It explains the drawbacks of hard-coding secrets in code and suggests using environment variables instead."
---
# Secret Management
This page explains how to manage secrets in Haystack components and introduces the `Secret` type for structured secret handling. It describes the drawbacks of hard-coding secrets in code and recommends using environment variables instead.
Many Haystack components interact with third-party frameworks and service providers such as Azure, Google Vertex AI, and OpenAI. Their libraries often require the user to authenticate themselves to ensure they receive access to the underlying product. The authentication process usually works with a secret value that acts as an opaque identifier to the third-party backend.
This page describes the two main types of secrets: token-based and environment variable-based, and how to handle them when using Haystack.
You can find additional details for the `Secret` class in our [API reference](/reference/utils-api).
<details>
<summary>Example Use Case - Problem Statement</summary>
### Problem Statement
Let's consider an example RAG pipeline that embeds a query, uses a Retriever component to locate documents relevant to the query, and then leverages an LLM to generate an answer based on the retrieved documents.
The `OpenAIGenerator` component used in the pipeline below expects an API key to authenticate with OpenAI's servers and perform the generation. Let's assume that the component accepts a `str` value for it:
```python
generator = OpenAIGenerator(model="gpt-4", api_key="sk-xxxxxxxxxxxxxxxxxx")
pipeline.add_component("generator", generator)
```
This works in a pinch, but it is bad practice: we shouldn't hard-code such secrets in the codebase. An alternative is to store the key in an environment variable, read it in Python, and pass the value to the component:
```python
import os
api_key = os.environ.get("OPENAI_API_KEY")
generator = OpenAIGenerator(model="gpt-4", api_key=api_key)
pipeline.add_component("generator", generator)
```
This is better: the pipeline works as intended, and we aren't hard-coding any secrets in the code.
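One caveat with `os.environ.get` is that it returns `None` when the variable is unset, so a missing key may only surface later, when the backend is first called. A minimal sketch of a fail-fast lookup follows; `read_required_env` is a hypothetical helper name used for illustration and is not part of Haystack:

```python
import os


def read_required_env(name: str) -> str:
    """Return the value of an environment variable or fail immediately.

    Hypothetical helper for illustration; not part of Haystack.
    """
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demo only: set a placeholder value so the lookup succeeds.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"
api_key = read_required_env("OPENAI_API_KEY")
```

Failing at lookup time keeps the error close to its cause instead of deferring it to the first request.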
Remember that pipelines are serializable. Since the API key is a secret, we should definitely avoid saving it to disk. Let's modify the component's `to_dict` method to exclude the key:
```python
def to_dict(self) -> Dict[str, Any]:
    # Do not pass the `api_key` init parameter.
    return default_to_dict(self, model=self.model)
```
But what happens when the pipeline is loaded from disk? In the best-case scenario, the component's backend automatically reads the key from a hard-coded environment variable, and that key is the same as the one that was passed to the component before it was serialized. In a worse case, the backend doesn't look up the key in a hard-coded environment variable and fails when it gets called inside a `pipeline.run()` invocation.
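This failure mode can be reproduced without any real backend. The sketch below uses a hypothetical `ToyGenerator` class (a stand-in written for this illustration, not a Haystack component) whose `to_dict` omits the key. After a round trip, the restored object has no key at all:

```python
from typing import Any, Dict, Optional


class ToyGenerator:
    """Hypothetical stand-in for a component; not a Haystack class."""

    def __init__(self, model: str, api_key: Optional[str] = None):
        self.model = model
        self.api_key = api_key

    def to_dict(self) -> Dict[str, Any]:
        # The key is deliberately left out so it never reaches disk.
        return {"init_parameters": {"model": self.model}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyGenerator":
        return cls(**data["init_parameters"])


original = ToyGenerator(model="gpt-4", api_key="sk-placeholder")
restored = ToyGenerator.from_dict(original.to_dict())

# The model survives the round trip, but the key does not.
print(restored.model, restored.api_key)
```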
</details>
### Import
To use Haystack secrets in your code, first import the `Secret` class:
```python
from haystack.utils import Secret
```
### Token-Based Secrets
You can paste tokens directly as a string using the `from_token` method:
```python
llm = OpenAIGenerator(api_key=Secret.from_token("sk-randomAPIkeyasdsa32ekasd32e"))
```
Note that a component initialized with a token-based secret cannot be serialized, meaning you can't convert the above component to a dictionary or save a pipeline containing it to a YAML file. This is a security feature to prevent accidental exposure of sensitive data.
### Environment Variable-Based Secrets
Environment variable-based secrets are more flexible. They allow you to specify one or more environment variables that may contain your secret.
Existing Haystack components that require an API key (like `OpenAIGenerator`) use `Secret.from_env_var` with a default environment variable name (in this case, `OPENAI_API_KEY`). This means that `OpenAIGenerator` looks for the environment variable `OPENAI_API_KEY` and, if it exists, uses its value for authentication. When pipelines are serialized to YAML, only the name of the environment variable is saved to the YAML file, never its value. This keeps the secret out of serialized pipelines, which is why this method is strongly recommended.
```bash
## First, export an environment variable named `OPENAI_API_KEY` with its value
export OPENAI_API_KEY=sk-randomAPIkeyasdsa32ekasd32e
## or alternatively, using Python
## import os
## os.environ["OPENAI_API_KEY"] = "sk-randomAPIkeyasdsa32ekasd32e"
```
```python
llm_generator = OpenAIGenerator() # Uses the default value from the env var for the component
```
Alternatively, in components where a Secret is expected, you can customize the name of the environment variable from which the API Key is to be read.
```python
## Assumes an environment variable with a custom name (`YOUR_ENV_VAR`) has been exported with the key as its value
llm_generator = OpenAIGenerator(api_key=Secret.from_env_var("YOUR_ENV_VAR"))
```
When `OpenAIGenerator` is serialized within a pipeline, this is what the YAML code will look like, using the custom variable name:
```yaml
components:
  llm:
    init_parameters:
      api_base_url: null
      api_key:
        env_vars:
        - YOUR_ENV_VAR
        strict: true
        type: env_var
      generation_kwargs: {}
      model: gpt-4o-mini
      organization: null
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.OpenAIGenerator
...
```
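The serialized form stores a list of variable names together with a `strict` flag. The sketch below is a rough approximation of how such a reference could be resolved. It assumes that the names are tried in order and that strict mode raises an error when none of them is set; both are assumptions about the behavior, and `resolve_env_vars` is a hypothetical function, not Haystack's implementation:

```python
import os
from typing import List, Optional


def resolve_env_vars(env_vars: List[str], strict: bool = True) -> Optional[str]:
    """Return the value of the first variable that is set.

    Illustrative approximation only; not Haystack's implementation.
    """
    for name in env_vars:
        value = os.environ.get(name)
        if value is not None:
            return value
    if strict:
        raise ValueError(f"None of these environment variables are set: {env_vars}")
    return None


# Demo only: a placeholder value under the custom name.
os.environ["YOUR_ENV_VAR"] = "sk-placeholder"
os.environ.pop("SOME_OTHER_VAR", None)

resolved = resolve_env_vars(["SOME_OTHER_VAR", "YOUR_ENV_VAR"])
missing = resolve_env_vars(["SOME_OTHER_VAR"], strict=False)
```

Under these assumptions, a non-strict reference simply yields `None` when nothing is set, which leaves it to the backend to decide how to proceed.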
### Serialization
While token-based secrets cannot be serialized, environment variable-based secrets can be converted to and from dictionaries:
```python
## Create an environment variable-based secret
env_secret = Secret.from_env_var("OPENAI_API_KEY")
## Convert to dictionary
env_secret_dict = env_secret.to_dict()
## Create from dictionary
new_env_secret = Secret.from_dict(env_secret_dict)
```
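The asymmetry comes down to what each kind of secret would have to write out: a token-based secret holds the sensitive value itself, while an environment variable-based secret holds only variable names. The following simplified sketch of that design uses hypothetical `ToyTokenSecret` and `ToyEnvVarSecret` classes. They mimic the dictionary shape shown in the YAML above but are not Haystack's classes:

```python
import json
from typing import Any, Dict, List


class ToyTokenSecret:
    """Hypothetical stand-in holding the sensitive value itself."""

    def __init__(self, token: str):
        self._token = token

    def to_dict(self) -> Dict[str, Any]:
        # Refuse to serialize: the only thing to write would be the token.
        raise ValueError("Cannot serialize a token-based secret")


class ToyEnvVarSecret:
    """Hypothetical stand-in holding only environment variable names."""

    def __init__(self, env_vars: List[str], strict: bool = True):
        self.env_vars = env_vars
        self.strict = strict

    def to_dict(self) -> Dict[str, Any]:
        # Only names are written out, never the values they point to.
        return {"type": "env_var", "env_vars": self.env_vars, "strict": self.strict}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyEnvVarSecret":
        return cls(env_vars=list(data["env_vars"]), strict=data["strict"])


toy_env_secret = ToyEnvVarSecret(["OPENAI_API_KEY"])
payload = json.dumps(toy_env_secret.to_dict())
toy_restored = ToyEnvVarSecret.from_dict(json.loads(payload))
```

Because the payload contains only names, it can be committed or shared without exposing the key; the value is looked up again wherever the pipeline is loaded.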
### Resolving Secrets
Both types of secrets can be resolved to their actual values using the `resolve_value` method. This method returns the token or the value of the environment variable.
```python
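## Create one secret of each type
api_key_secret = Secret.from_token("sk-randomAPIkeyasdsa32ekasd32e")
env_secret = Secret.from_env_var("OPENAI_API_KEY")
## Resolve the token-based secret
token_value = api_key_secret.resolve_value()
## Resolve the environment variable-based secret
env_value = env_secret.resolve_value()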
```
### Custom Component Example
Here is a complete example that shows how to create a component that uses the `Secret` class in Haystack, highlighting the differences between token-based and environment variable-based authentication, and showing that token-based secrets cannot be serialized:
```python
from typing import Optional

from haystack import component, default_from_dict, default_to_dict
from haystack.utils import Secret, deserialize_secrets_inplace


@component
class MyComponent:
    def __init__(self, api_key: Optional[Secret] = None, **kwargs):
        self.api_key = api_key
        self.backend = None

    def warm_up(self):
        # Call resolve_value to yield a single result. The semantics of the result are policy-dependent.
        # Currently, all supported policies will return a single string token.
        # `SomeBackend` and the trailing `...` are placeholders for your own backend and its arguments.
        self.backend = SomeBackend(api_key=self.api_key.resolve_value() if self.api_key else None, ...)

    def to_dict(self):
        # Serialize the policy like any other (custom) data. If the policy is token-based, it will
        # raise an error.
        return default_to_dict(self, api_key=self.api_key.to_dict() if self.api_key else None, ...)

    @classmethod
    def from_dict(cls, data):
        # Deserialize the policy data before passing it to the generic from_dict function.
        api_key_data = data["init_parameters"]["api_key"]
        api_key = Secret.from_dict(api_key_data) if api_key_data is not None else None
        data["init_parameters"]["api_key"] = api_key
        # Alternatively, use the helper function.
        # deserialize_secrets_inplace(data["init_parameters"], keys=["api_key"])
        return default_from_dict(cls, data)


## No authentication.
my_component = MyComponent(api_key=None)

## Token based authentication
my_component = MyComponent(api_key=Secret.from_token("sk-randomAPIkeyasdsa32ekasd32e"))
my_component.to_dict()  # Error! Can't serialize authentication tokens

## Environment variable based authentication
my_component = MyComponent(api_key=Secret.from_env_var("OPENAI_API_KEY"))
my_component.to_dict()  # This is fine
```