---
title: "Llama.cpp"
id: integrations-llama-cpp
description: "Llama.cpp integration for Haystack"
slug: "/integrations-llama-cpp"
---

<a id="haystack_integrations.components.generators.llama_cpp.generator"></a>

## Module haystack\_integrations.components.generators.llama\_cpp.generator

<a id="haystack_integrations.components.generators.llama_cpp.generator.LlamaCppGenerator"></a>

### LlamaCppGenerator

Provides an interface to generate text using an LLM via llama.cpp.

[llama.cpp](https://github.com/ggml-org/llama.cpp) is a project written in C/C++ for efficient inference of LLMs.
It employs the quantized GGUF format, suitable for running these models on standard machines (even without GPUs).

Usage example:

```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=512)

print(generator.run("Who is the best American actor?", generation_kwargs={"max_tokens": 128}))
# {'replies': ['John Cusack'], 'meta': [{"object": "text_completion", ...}]}
```
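
The generator also drops into a Haystack pipeline. Below is a minimal sketch, assuming Haystack's `PromptBuilder` and a local `zephyr-7b-beta.Q4_0.gguf` file; the template and question are illustrative:

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder

from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

# Jinja2 template; "question" becomes a pipeline input (illustrative).
template = "Answer the question briefly.\nQuestion: {{ question }}\nAnswer:"

pipeline = Pipeline()
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048))
# PromptBuilder's "prompt" output feeds the generator's "prompt" input.
pipeline.connect("prompt_builder.prompt", "llm.prompt")

result = pipeline.run({"prompt_builder": {"question": "Who was Ada Lovelace?"}})
print(result["llm"]["replies"])
```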

<a id="haystack_integrations.components.generators.llama_cpp.generator.LlamaCppGenerator.__init__"></a>

#### LlamaCppGenerator.\_\_init\_\_

```python
def __init__(model: str,
             n_ctx: Optional[int] = 0,
             n_batch: Optional[int] = 512,
             model_kwargs: Optional[Dict[str, Any]] = None,
             generation_kwargs: Optional[Dict[str, Any]] = None)
```

**Arguments**:

- `model`: The path of a quantized model for text generation, for example, "zephyr-7b-beta.Q4_0.gguf".
If the model path is also specified in `model_kwargs`, this parameter is ignored.
- `n_ctx`: The number of tokens in the context. When set to 0, the context is taken from the model.
- `n_batch`: The maximum batch size for prompt processing.
- `model_kwargs`: Dictionary containing keyword arguments used to initialize the LLM for text generation.
These keyword arguments provide fine-grained control over model loading.
In case of duplication, these kwargs override the `model`, `n_ctx`, and `n_batch` init parameters.
For more information on the available kwargs, see the
[llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.__init__).
- `generation_kwargs`: A dictionary containing keyword arguments to customize text generation.
For more information on the available kwargs, see the
[llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_completion).
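
For example, a sketch of fine-grained loading and sampling control. Here `n_gpu_layers` and `seed` are `llama_cpp.Llama.__init__` kwargs, while `max_tokens` and `temperature` belong to `llama_cpp.Llama.create_completion`; the values are illustrative, not recommendations:

```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(
    model="zephyr-7b-beta.Q4_0.gguf",
    n_ctx=2048,
    # Forwarded to llama_cpp.Llama.__init__: offload all layers to the GPU
    # and fix the sampling seed (illustrative values).
    model_kwargs={"n_gpu_layers": -1, "seed": 42},
    # Defaults forwarded to llama_cpp.Llama.create_completion on each run.
    generation_kwargs={"max_tokens": 128, "temperature": 0.7},
)
```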

<a id="haystack_integrations.components.generators.llama_cpp.generator.LlamaCppGenerator.run"></a>

#### LlamaCppGenerator.run

```python
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(
    prompt: str,
    generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Union[List[str], List[Dict[str, Any]]]]
```

Run the text generation model on the given prompt.

**Arguments**:

- `prompt`: The prompt to be sent to the generative model.
- `generation_kwargs`: A dictionary containing keyword arguments to customize text generation.
For more information on the available kwargs, see the
[llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_completion).

**Returns**:

A dictionary with the following keys:
- `replies`: The list of replies generated by the model.
- `meta`: Metadata about the request.
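
Per-call `generation_kwargs` make it possible to vary sampling between runs without reloading the model. A short sketch, assuming the per-call values take precedence over the init-time defaults:

```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(
    model="zephyr-7b-beta.Q4_0.gguf",
    generation_kwargs={"max_tokens": 128},  # default for every run
)

# Greedy, shorter completion for this call only (illustrative values).
result = generator.run(
    "Briefly explain GGUF quantization.",
    generation_kwargs={"max_tokens": 64, "temperature": 0.0},
)
print(result["replies"][0])
print(result["meta"][0])
```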