
---
title: "Llama.cpp"
id: integrations-llama-cpp
description: "Llama.cpp integration for Haystack"
slug: "/integrations-llama-cpp"
---
<a id="haystack_integrations.components.generators.llama_cpp.generator"></a>
## Module haystack\_integrations.components.generators.llama\_cpp.generator
<a id="haystack_integrations.components.generators.llama_cpp.generator.LlamaCppGenerator"></a>
### LlamaCppGenerator
Provides an interface for generating text with an LLM via llama.cpp.
[llama.cpp](https://github.com/ggml-org/llama.cpp) is a project written in C/C++ for efficient inference of LLMs.
It uses the quantized GGUF format, which allows these models to run on standard machines, even without GPUs.
Usage example:
```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator
generator = LlamaCppGenerator(model="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=512)
print(generator.run("Who is the best American actor?", generation_kwargs={"max_tokens": 128}))
# {'replies': ['John Cusack'], 'meta': [{"object": "text_completion", ...}]}
```
<a id="haystack_integrations.components.generators.llama_cpp.generator.LlamaCppGenerator.__init__"></a>
#### LlamaCppGenerator.\_\_init\_\_
```python
def __init__(model: str,
             n_ctx: Optional[int] = 0,
             n_batch: Optional[int] = 512,
             model_kwargs: Optional[Dict[str, Any]] = None,
             generation_kwargs: Optional[Dict[str, Any]] = None)
```
**Arguments**:
- `model`: The path of a quantized model for text generation, for example, "zephyr-7b-beta.Q4_0.gguf".
If the model path is also specified in the `model_kwargs`, this parameter will be ignored.
- `n_ctx`: The number of tokens in the context. When set to 0, the context will be taken from the model.
- `n_batch`: The maximum batch size for prompt processing.
- `model_kwargs`: Dictionary containing keyword arguments used to initialize the LLM for text generation.
These keyword arguments provide fine-grained control over the model loading.
In case of duplication, these kwargs override the `model`, `n_ctx`, and `n_batch` init parameters.
For more information on the available kwargs, see the
[llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.__init__).
- `generation_kwargs`: A dictionary containing keyword arguments to customize text generation.
For more information on the available kwargs, see the
[llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_completion).
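For example, the generator can be configured with loading-time options through `model_kwargs` and default sampling options through `generation_kwargs`. A minimal sketch, assuming a local GGUF file and a GPU-enabled llama.cpp build; the model path and option values are illustrative:
```python
from haystack_integrations.components.generators.llama_cpp import LlamaCppGenerator

generator = LlamaCppGenerator(
    model="zephyr-7b-beta.Q4_0.gguf",  # illustrative path to a local quantized model
    n_ctx=2048,
    n_batch=512,
    # Loading-time options forwarded to llama_cpp.Llama.__init__;
    # n_gpu_layers=-1 offloads all layers to the GPU (requires a GPU-enabled build).
    model_kwargs={"n_gpu_layers": -1},
    # Default sampling options forwarded to llama_cpp.Llama.create_completion;
    # these can be overridden per call via run(..., generation_kwargs=...).
    generation_kwargs={"max_tokens": 128, "temperature": 0.7},
)
```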
<a id="haystack_integrations.components.generators.llama_cpp.generator.LlamaCppGenerator.run"></a>
#### LlamaCppGenerator.run
```python
@component.output_types(replies=List[str], meta=List[Dict[str, Any]])
def run(
    prompt: str,
    generation_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Union[List[str], List[Dict[str, Any]]]]
```
Run the text generation model on the given prompt.
**Arguments**:
- `prompt`: The prompt to send to the generative model.
- `generation_kwargs`: A dictionary containing keyword arguments to customize text generation.
For more information on the available kwargs, see the
[llama.cpp documentation](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_completion).
**Returns**:
A dictionary with the following keys:
- `replies`: the list of replies generated by the model.
- `meta`: metadata about the request.
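A minimal sketch of calling `run` directly, continuing from the generator configured above; the prompt and generation options are illustrative, and depending on the integration version, `warm_up()` may need to be called first to load the model:
```python
result = generator.run(
    "Explain the GGUF format in one sentence.",
    generation_kwargs={"max_tokens": 64, "temperature": 0.2},
)

# `replies` and `meta` are parallel lists: one entry per generated completion.
for reply, meta in zip(result["replies"], result["meta"]):
    print(reply)  # the generated text
    print(meta)   # raw completion metadata returned by llama.cpp
```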